Tags > big data

Four short links: 20 August 2014

By Nat Torkington
August 20, 2014

Machine Learning for Plant Properties — startup building database of plant genomics, properties, research, etc. for mining. The more familiar you are with your data and its meaning, the better your machine learning will be at suggesting fruitful lines of …

Four short links: 13 August 2014

By Nat Torkington
August 13, 2014

Viv — another step in the cognition race. Wolfram Alpha was first out the gate, but Watson, Viv, and others are hot on its heels in being able to parse complex requests, then seek and use information to fulfil them. Universal …

Four short links: 7 August 2014

By Nat Torkington
August 7, 2014

Material Design in the Google I/O App (Medium) — steps through design thinking as they put Google’s new design metaphor in place. I’ve been chewing on material design. It brings an internal consistency and logic to the Android world that …

Four short links: 6 August 2014

By Nat Torkington
August 6, 2014

Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing (PDF) — paper by Googlers on the database holding G’s ad data. Trillions of rows, petabytes of data, point queries with 99th percentile latency in the hundreds of milliseconds and overall query throughput …

Four short links: 5 August 2014

By Nat Torkington
August 5, 2014

Discussion Graph Tool (Microsoft Research) — simplifies social media analysis by making it easy to extract high-level features and co-occurrence relationships from raw data. Superlinear Productivity in Collective Group Actions (PLoS ONE) — study of open source projects shows small …

Four short links: 1 August 2014

By Nat Torkington
August 1, 2014

Miso — Dataset, a JavaScript client-side data management and transformation library, Storyboard, a state and flow-control management library & d3.chart, a framework for creating reusable charts with d3.js. Open source designed to expedite the creation of high-quality interactive storytelling and …

Why local state is a fundamental primitive in stream processing

By Jay Kreps
July 31, 2014

One of the concepts that has proven the hardest to explain to people when I talk about Samza is the idea of fault-tolerant local state for stream processing. I think people are so used to the idea of keeping all …

Four short links: 21 July 2014

By Nat Torkington
July 21, 2014

nupic (github) - GPL v3-licensed code from Numenta, at last. See their patent position. Robocup — soccer robotics contest, condition of entry is that all code is open sourced after the contest. (via The Economist) Security Data Science Paper Collection — …

Four short links: 15 July 2014

By Nat Torkington
July 15, 2014

Inside Data Brokers — very readable explanation of the data brokers and how their information is used to track advertising effectiveness. Elon, I Want My Data! — Tesla doesn’t give you access to the data that your car collects. Bodes …

Four short links: 9 July 2014

By Nat Torkington
July 9, 2014

Developer Inequality (Jonathan Edwards) — The bigger injustice is that programming has become an elite: a vocation requiring rare talents, grueling training, and total dedication. The way things are today if you want to be a programmer you had best …

Four short links: 1 July 2014

By Nat Torkington
July 1, 2014

word2vec — This tool provides an efficient implementation of the continuous bag-of-words and skip-gram architectures for computing vector representations of words. These representations can be subsequently used in many natural language processing applications and for further research. From Google Research …

Four short links: 27 June 2014

By Nat Torkington
June 27, 2014

MillWheel: Fault-Tolerant Stream Processing at Internet Scale — Google Research paper on the tech underlying the new cloud DataFlow tool. Watch the video. Yow. The Integer Overflow Bug That Went to Mars — long-standing (20 year old!) bug in a …

Four short links: 24 June 2014

By Nat Torkington
June 24, 2014

Maximum Happy Imagination (Matt Jones) — questioning the true vision of Marc Andreessen’s recent Twitter discourse on the great future that awaits us. His analogies run out in the 20th century when it comes to the political, social and economic …

Four short links: 20 June 2014

By Nat Torkington
June 20, 2014

Dynamo and BigTable — good preso overview of two approaches to solving availability and consistency in the event of server failure or network partition. Goals Gone Wild (PDF) — In this article, we argue that the beneficial effects of goal …

Four short links: 9 June 2014

By Nat Torkington
June 9, 2014

textql — execute SQL against structured text like CSV or TSV. Social Network Structure of Fake Friends — author bought 4,000 Twitter followers and studied their relationships. Hidden Biases in Big Data — with every big data set, we need …

Four short links: 3 June 2014

By Nat Torkington
June 3, 2014

Machine Learning Done Wrong — [M]ost practitioners pick the modeling algorithm they are most familiar with rather than pick the one which best suits the data. In this post, I would like to share some common mistakes (the don’t-s). Bandits …

A growing number of applications are being built with Spark

By Ben Lorica
May 31, 2014

One of the trends we’re following closely at Strata is the emergence of vertical applications. As components for creating large-scale data infrastructures enter their early stages of maturation, companies are focusing on solving data problems in specific industries rather than …

How to be agile with your big data

By Mike Barlow
May 28, 2014

Data analysis, like other pursuits, is a balancing act. The rise of big data ratchets up the pressure on the traditional enterprise data warehouse (EDW) and associated software tools to handle rapidly evolving sets of new demands posed by the …

Four short links: 26 May 2014

By Nat Torkington
May 23, 2014

Car Alarms and Smoke Alarms (Slideshare) — how to think about and draw the line between sensitivity and specificity. 101 Uses for Content Mining — between the list in the post and the comments from readers, it’s a good introduction …

Four short links: 23 May 2014

By Nat Torkington
May 23, 2014

How to Educate Users (Luke Wroblewski) — help new users in your app, not in a video. Hardware By The Numbers (Renee DiResta) — slides from her keynote at the Solid conference. The mean success rate across all sectors is …

Four short links: 22 May 2014

By Nat Torkington
May 22, 2014

Ferry — helps you create big data clusters on your local machine. Define your big data stack using YAML and share your application with Dockerfiles. Ferry supports Hadoop, Cassandra, Spark, GlusterFS, and Open MPI. What Google Told SEC — For …

Four short links: 1 May 2014

By Nat Torkington
April 30, 2014

US Providers Must Divulge from Offshore Servers (Gigaom) — A U.S. magistrate judge ruled that U.S. cloud vendors must fork over customer data even if that data resides in data centers outside the country. (via Alistair Croll) Inside Google’s Self-Driving …

