Blogs

Four short links: 1 September 2014

By Nat Torkington
September 1, 2014

Sibyl: Google’s System for Large Scale Machine Learning (YouTube) — keynote at DSN2014 acting as an intro to Sibyl. (via KD Nuggets) Bitrot from 1997 — That’s 205 failures, an actual link rot figure of 91%, not 57%. That leaves …

Four short links: 27 August 2014

By Nat Torkington
August 27, 2014

Discourse turns 1.0 — community/forum software that doesn’t suck. Programmable Matter (IEEE Spectrum) — recap of where research is going in this area. Liquibase — source control for your database. Apache 2.0 licensed. A Few Useful Things to Know About …

How Flash changes the design of database storage engines

By Andy Oram
August 22, 2014

Over the past decade, SSDs (popularly known as Flash) have radically changed computing at both the consumer level, where USB sticks have effectively replaced CDs for transporting files, and the server level, where they offer a price/performance …

Building pipelines to facilitate data analysis

By Hadley Wickham
August 21, 2014

In every data analysis, you have to string together many tools. You need tools for data wrangling, visualisation, and modelling to understand what’s going on in your data. To use these tools effectively, you need to be able to easily …

Four short links: 21 August 2014

By Nat Torkington
August 21, 2014

Dat — an open source project that provides a streaming interface between every file format and data storage backend. See the Wired piece on it. Smithsonian Crowdsourcing Transcription (Smithsonian) — 49 volunteers transcribed 200 pages of correspondence between the Monuments …

Four short links: 20 August 2014

By Nat Torkington
August 20, 2014

Machine Learning for Plant Properties — startup building database of plant genomics, properties, research, etc. for mining. The more familiar you are with your data and its meaning, the better your machine learning will be at suggesting fruitful lines of …

Ten years of OpenStreetMap

By Tyler Bell
August 15, 2014

Next to GPS, the most significant development in the Open Geo Data movement is OpenStreetMap (OSM), a community-driven mapping project whose goal is to create the most detailed, correct, and current open map of the world. This week, OSM celebrates …

Four short links: 13 August 2014

By Nat Torkington
August 13, 2014

Viv: another step in the cognition race. Wolfram Alpha was first out of the gate, but Watson, Viv, and others are hot on its heels in being able to parse complex requests, then seek and use information to fulfil them. Universal …

Four short links: 7 August 2014

By Nat Torkington
August 7, 2014

Material Design in the Google I/O App (Medium) — steps through design thinking as they put Google’s new design metaphor in place. I’ve been chewing on material design. It brings an internal consistency and logic to the Android world that …

Scaling up data frames

By Ben Lorica
August 7, 2014

Long before the advent of “big data,” analysts were building models using tools like R (and its forerunners S/S-PLUS). Productivity hinged on tools that made data wrangling, data inspection, and data modeling convenient. Among R users, this meant proficiency with …

Four short links: 6 August 2014

By Nat Torkington
August 6, 2014

Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing (PDF) — paper by Googlers on the database holding G’s ad data. Trillions of rows, petabytes of data, point queries with 99th percentile latency in the hundreds of milliseconds and overall query throughput …

Four short links: 5 August 2014

By Nat Torkington
August 5, 2014

Discussion Graph Tool (Microsoft Research) — simplifies social media analysis by making it easy to extract high-level features and co-occurrence relationships from raw data. Superlinear Productivity in Collective Group Actions (PLoS ONE) — study of open source projects shows small …

Four short links: 4 August 2014

By Nat Torkington
August 4, 2014

EtherCalc — open source web-based spreadsheet. Dynamics of Correlated Novelties (Nature) — paper on “the adjacent possible”. Here we propose a simple mathematical model that mimics the process of exploring a physical, biological, or conceptual space that enlarges whenever a …

Health games platforms mature in preparation for mainstream adoption

By Andy Oram
August 1, 2014

For the past several years, researchers have strived to create compelling games that improve behavior, reduce stress, or teach healthy responses to difficult life situations. Such healthy games tend to arise in research settings because of the need to demonstrate …

Four short links: 1 August 2014

By Nat Torkington
August 1, 2014

Miso — Dataset, a JavaScript client-side data management and transformation library, Storyboard, a state and flow-control management library & d3.chart, a framework for creating reusable charts with d3.js. Open source designed to expedite the creation of high-quality interactive storytelling and …

Why local state is a fundamental primitive in stream processing

By Jay Kreps
July 31, 2014

One of the concepts that has proven the hardest to explain to people when I talk about Samza is the idea of fault-tolerant local state for stream processing. I think people are so used to the idea of keeping all …

New scalable solutions for data analysis with R

By Federico Castanedo
July 24, 2014

The R programming language is the most popular statistical software in use today by data scientists, according to the 2013 Rexer Analytics Data Miner survey. One of the main drawbacks of vanilla R is the inability to scale and handle …


