Blogs

Announcing Spark Certification

By Ben Lorica
September 18, 2014

Editor’s note: full disclosure — Ben is an advisor to Databricks. I am pleased to announce a joint program between O’Reilly and Databricks to certify Spark developers. O’Reilly has long been interested in certification, and with this inaugural program, we believe …

Four short links: 15 September 2014

By Nat Torkington
September 15, 2014

The Care and Feeding of Weird Machines Found in Executable Metadata (YouTube) — talk from the 29th Chaos Communication Congress on tricking the ELF linker/loader into performing arbitrary computation using the metadata supplied. Yes, there’s a brainfuck compiler that turns code …

One man willingly gave Google his data. See what happened next.

By Jonas Luster
September 10, 2014

Despite some misgivings about the company’s product course and service permanence (I was an early and fanatical user of Google Wave), my relationship with Google is one of mutual symbiosis. Its “better mousetrap” approach to products and services, the width …

Small brains, big data

By Jeremy Freeman
September 8, 2014

When we think about big data, we usually think about the web: the billions of users of social media, the sensors on millions of mobile phones, the thousands of contributions to Wikipedia, and so forth. Due to recent innovations, web-scale …

Four short links: 3 September 2014

By Nat Torkington
September 3, 2014

Distributed Systems Theory for the Distributed Systems Engineer — I tried to come up with a list of what I consider the basic concepts that are applicable to my every-day job as a distributed systems engineer; what I consider ‘table …

Four short links: 1 September 2014

By Nat Torkington
September 1, 2014

Sibyl: Google’s System for Large Scale Machine Learning (YouTube) — keynote at DSN2014 acting as an intro to Sibyl. (via KDnuggets) Bitrot from 1997 — That’s 205 failures, an actual link rot figure of 91%, not 57%. That leaves …

Four short links: 27 August 2014

By Nat Torkington
August 27, 2014

Discourse turns 1.0 — community/forum software that doesn’t suck. Programmable Matter (IEEE Spectrum) — recap of where research is going in this area. Liquibase — source control for your database. Apache 2.0 licensed. A Few Useful Things to Know About …

How Flash changes the design of database storage engines

By Andy Oram
August 22, 2014

Over the past decade, SSDs (popularly known as Flash) have radically changed computing at both the consumer level, where USB sticks have effectively replaced CDs for transporting files, and the server level, where they offer a price/performance …

Building pipelines to facilitate data analysis

By Hadley Wickham
August 21, 2014

In every data analysis, you have to string together many tools. You need tools for data wrangling, visualisation, and modelling to understand what’s going on in your data. To use these tools effectively, you need to be able to easily …

Four short links: 21 August 2014

By Nat Torkington
August 21, 2014

Dat — an open source project that provides a streaming interface between every file format and data storage backend. See the Wired piece on it. Smithsonian Crowdsourcing Transcription (Smithsonian) — 49 volunteers transcribed 200 pages of correspondence between the Monuments …

Four short links: 20 August 2014

By Nat Torkington
August 20, 2014

Machine Learning for Plant Properties — startup building a database of plant genomics, properties, research, etc. for mining. The more familiar you are with your data and its meaning, the better your machine learning will be at suggesting fruitful lines of …

Ten years of OpenStreetMap

By Tyler Bell
August 15, 2014

Next to GPS, the most significant development in the Open Geo Data movement is OpenStreetMap (OSM), a community-driven mapping project whose goal is to create the most detailed, correct, and current open map of the world. This week, OSM celebrates …

Four short links: 13 August 2014

By Nat Torkington
August 13, 2014

Viv — another step in the cognition race. Wolfram Alpha was first out of the gate, but Watson, Viv, and others are hot on its heels in being able to parse complex requests, then seek and use information to fulfil them. Universal …

Four short links: 7 August 2014

By Nat Torkington
August 7, 2014

Material Design in the Google I/O App (Medium) — steps through design thinking as they put Google’s new design metaphor in place. I’ve been chewing on material design. It brings an internal consistency and logic to the Android world that …

Scaling up data frames

By Ben Lorica
August 7, 2014

Long before the advent of “big data,” analysts were building models using tools like R (and its forerunners S/S-PLUS). Productivity hinged on tools that made data wrangling, data inspection, and data modeling convenient. Among R users, this meant proficiency with …

Four short links: 6 August 2014

By Nat Torkington
August 6, 2014

Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing (PDF) — paper by Googlers on the database holding G’s ad data. Trillions of rows, petabytes of data, point queries with 99th percentile latency in the hundreds of milliseconds and overall query throughput …

Four short links: 5 August 2014

By Nat Torkington
August 5, 2014

Discussion Graph Tool (Microsoft Research) — simplifies social media analysis by making it easy to extract high-level features and co-occurrence relationships from raw data. Superlinear Productivity in Collective Group Actions (PLoS ONE) — study of open source projects shows small …

Four short links: 4 August 2014

By Nat Torkington
August 4, 2014

EtherCalc — open source web-based spreadsheet. Dynamics of Correlated Novelties (Nature) — paper on “the adjacent possible”. Here we propose a simple mathematical model that mimics the process of exploring a physical, biological, or conceptual space that enlarges whenever a …
