Blogs



Improving options for unlocking your graph data

By Ben Lorica
May 19, 2013

The popular open source project GraphLab received a major boost early this week when a new company composed of its founding developers raised funding to develop analytic tools for graph data sets. GraphLab Inc. will continue to use the open …

Strata Week: Are customized Google maps a neutrality win or the next “filter bubble”?

By Jenn Webb
May 17, 2013

Google aims for a new level of map customization: Google introduced a new version of Google Maps at Google I/O this week that learns from each use to customize itself to individual users, adapting based on user clicks and searches. …

Google I/O, Big Data Adolescence, Visualization, and the Future of Open Source

By Adam Flaherty
May 17, 2013

Google I/O: O’Reilly Editor Rachel Roumeliotis reports from the conference floor. Big Data, Cool Kids: Fumbling toward the adolescence of big data tools. Code as Art: Interactive Data Visualization for the Web author Scott Murray on becoming a code artist. …

Six disruptive possibilities from big data

By Jeff Needham
May 15, 2013

My new book, Disruptive Possibilities: How Big Data Changes Everything, is derived directly from my experience as a performance and platform architect in the old enterprise world and the new, Internet-scale world. I pre-date the Hadoop crew at Yahoo!, but …

Visualization of the Week: Real-time Wikipedia edits

By Jenn Webb
May 15, 2013

Stephen LaPorte and Mahmoud Hashemi have put together an addictive visualization of real-time edits on Wikipedia, mapped across the world. Every time an edit is made, the user’s location and the entry they edited are listed along with a corresponding …

Big data, cool kids

By Edd Dumbill
May 14, 2013

The big data world is a confusing place. We’re no longer in a market dominated mostly by relational databases, and the alternatives have multiplied in a baby boom of diversity. These child prodigies of the data scene show great promise …

Four short links: 14 May 2013

By Nat Torkington
May 14, 2013

Behind the Banner — visualization of what happens in the 150ms when the cabal of data vultures decide which ad to show you. They pass around your data as enthusiastically as a pipe at a Grateful Dead concert, and you’ve …

Big data, cool kids

By Edd Dumbill
May 13, 2013

The big data world is a confusing place. We’re no longer in a market dominated mostly by relational databases, and the alternatives have multiplied in a baby boom of diversity. These child prodigies of the data scene show great promise …

Genomics and Privacy at the Crossroads

By James Turner
May 13, 2013

Two weeks ago, I had the privilege to attend the 2013 Genomes, Environments and Traits conference in Boston as a participant in Harvard Medical School’s Personal Genome Project. Several hundred of us attended the conference, eager to learn what new breakthroughs might …

Evaluating machine learning systems: Kaggle’s not enough

By Beau Cronin
May 11, 2013

There is a tremendous amount of commercial attention on machine learning (ML) methods and applications. This includes product and content recommender systems, predictive models for churn and lead scoring, systems to assist in medical diagnosis, social network sentiment analysis, and …

11 Essential Features that Visual Analysis Tools Should Have

By Ben Lorica
May 11, 2013

After recently playing with SAS Visual Analytics, I’ve been thinking about tools for visual analysis. By visual analysis I mean the type of analysis most recently popularized by Tableau, QlikView, and Spotfire: you encounter a data set for the first …

Strata Week: President Obama opens up U.S. government data

By Jenn Webb
May 10, 2013

U.S. government data to be machine-readable; Nicole Wong may fill new White House chief privacy officer role: The U.S. government took major steps this week to open up government data to the public. U.S. President Obama signed an executive order …

Four short links: 10 May 2013

By Nat Torkington
May 10, 2013

The Remixing Dilemma — summary of research on remixed projects, finding that (1) Projects with moderate amounts of code are remixed more often than either very simple or very complex projects. (2) Projects by more prominent creators are more generative. …

Genomics and Privacy at the Crossroads

By James Turner
May 9, 2013

Two weeks ago, I had the privilege to attend the 2013 Genomes, Environments and Traits conference in Boston as a participant in Harvard Medical School’s Personal Genome Project. Several hundred of us attended the conference, eager to learn what new breakthroughs might …

Steering the ship that is data science

By Q Ethan McCallum
May 8, 2013

Mike Loukides recently recapped a conversation we’d had about leading indicators for data science efforts in an organization. We also pondered where the role of data scientist is headed and realized we could treat software development as a prototype case. It’s easy (if …

Another serving of data skepticism

By Mike Loukides
May 8, 2013

I was thrilled to receive an invitation to a new meetup: the NYC Data Skeptics Meetup. If you’re in the New York area, and you’re interested in seeing data used honestly, stop by! That announcement pushed me to write another post …

Visualization of the Week: Building collapse rescue efforts

By Jenn Webb
May 8, 2013

In the wake of recent building collapses, the BBC addressed the question of what goes into the rescue efforts by creating an interactive guide outlining how rescuers approach a collapsed building. Using information from the International Rescue Corps, the BBC …

On becoming a code artist

By Ann Spencer
May 7, 2013

Scott Murray, a code artist, has written Interactive Data Visualization for the Web for nonprogrammers. In this interview, Scott provides some insights on what inspired him to write an introduction to D3 for artists, graphic designers, journalists, researchers, or anyone …

A different take on data skepticism

By Beau Cronin
May 7, 2013

Recently, the Mathbabe (aka Cathy O’Neil) vented some frustration about the pitfalls in applying even simple machine learning (ML) methods like k-nearest neighbors. As data science is democratized, she worries that naive practitioners will shoot themselves in the foot because these tools can …

Steering the ship that is data science

By Q Ethan McCallum
May 7, 2013

Mike Loukides recently recapped a conversation we’d had about leading indicators for data science efforts in an organization. We also pondered where the role of data scientist is headed and realized we could treat software development as a prototype case. …

Another Serving of Data Skepticism

By Mike Loukides
May 6, 2013

I was thrilled to receive an invitation to a new meetup: the NYC Data Skeptics Meetup. If you’re in the New York area, and you’re interested in seeing data used honestly, stop by! That announcement pushed me to write another …

Upward Mobility: Unit Testing Core Data

By James Turner
May 6, 2013

One of the more common issues that arises in creating OCUnit tests in iOS is how to test code that uses Core Data. There are several challenges, but with a little foresight, you can be sailing right along. The first …

Scalable streaming analytics using a single server

By Ben Lorica
May 5, 2013

For many organizations, real-time analytics entails complex event processing (CEP) systems or newer distributed stream processing frameworks like Storm, S4, or Spark Streaming. The latter have become more popular because they are able to scale (ingest) massive amounts of data, …

Strata Week: The power of the Internet, wielded by machines and things

By Jenn Webb
May 3, 2013

Soon, everything will be an Internet platform: Ben Schiller at Fast Company took a look this week at a recent report by Jon Bruner on the industrial Internet. “According to Jon Bruner [the industrial Internet] is ‘machines becoming nodes on …

Leading Indicators

By Mike Loukides
May 2, 2013

In a conversation with Q Ethan McCallum (who should be credited as co-author), we wondered how to evaluate data science groups. If you’re looking at an organization’s data science group from the outside, possibly as a potential employee, what can …

Conquering iOS Core Data

By Rachel Roumeliotis
May 2, 2013

Joshua Smith (@kognate) is a Lead Mobile Developer at iRx Reminder, a frequent Cocoa Conference speaker, and the author of an upcoming book with O’Reilly on Core Data. We recently sat down to talk about Core Data and its complexities. What exactly …

Visualization of the Week: A DDoS attack on VideoLAN downloads infrastructure

By Jenn Webb
May 1, 2013

In the wake of a recent DDoS attack on open source software distributor VideoLAN, developer Ludovic Fauvet created a video visualization to show what the attack looked like. As Ryan W. Neal notes in a post at International Business Times, …

Linking open data to augmented intelligence and the economy

By Alex Howard
April 30, 2013

After years of steady growth, open data is now entering public discourse, particularly in the public sector. If President Barack Obama decides to put the White House’s long-awaited new open data mandate before the nation this spring, it will …

Leading Indicators

By Mike Loukides
April 30, 2013

In a conversation with Q Ethan McCallum (who should be credited as co-author), we wondered how to evaluate data science groups. If you’re looking at an organization’s data science group from the outside, possibly as a potential employee, what can …

Data sharing drives diagnoses and cures, if we can get there (part 2)

By Andy Oram
April 29, 2013

Editor’s note: Earlier this week, Part 1 of this article described Sage Bionetworks, a recent Congress they held, and their way of promoting data sharing through a challenge. Data sharing is not an unfamiliar practice in genetics. Plenty of cell …

Data sharing drives diagnoses and cures, if we can get there (part 1)

By Andy Oram
April 29, 2013

The glowing reports we read of biotech advances almost cause one’s brain to ache. They leave us thinking that medical researchers must command the latest in all technological tools. But the engines of genetic and pharmaceutical innovation are stuttering for …

Pricing decisions are going to be made whether you have analytics behind it or not

By Janaya Williams
April 29, 2013

In his role as chief scientist at Atlanta-based consulting firm Revenue Analytics, Jon Higbie helps clients make sound pricing decisions for everything from hotel rooms, to movie theater popcorn, to that carton of OJ in the fridge. And in the …

Tachyon: An open source, distributed, fault-tolerant, in-memory file system

By Ben Lorica
April 28, 2013

In earlier posts I’ve written about how Spark and Shark run much faster than Hadoop and Hive by caching data sets in memory. But suppose one wants to share data sets across jobs/frameworks while retaining the speed gains of being in memory? An …

Strata Week: Revolutionizing human resource management with work-force science

By Jenn Webb
April 26, 2013

Big data replaces gut instinct in HR management: In a post at the New York Times, Steve Lohr took a look this week at a new data discipline: work-force science. The field pairs big data with human resources to help …

A Day at the 2013 Genomes, Environments and Traits Conference

By James Turner
April 26, 2013

The GET (Genomes, Environments and Traits) conference is a confluence of parties interested in the advances being made in human genomes, the measurement of how the environment impacts individuals, and how the two come together to produce traits. Sponsored by …

Every leader has their “how I got here” story

By O'Reilly Strata
April 25, 2013

On Goldstein, McCallum, and their upcoming book, Making Analytics Work: Case by Case (by Alex Howard). People have been crunching numbers to understand government since the first time an official used an abacus to compare one season’s grain harvest against …

Do publishers have the right people on the bus?

By Michael Foy
April 25, 2013

I know from talking to many of my clients that most have read Jim Collins’ book ‘Good to Great’. I have also been inspired by his research into what makes great companies great. Many of you will recall an article …

Visualization of the Week: Every recorded U.S. terror attack 1970-2011

By Jenn Webb
April 24, 2013

The recent terror attack at the Boston Marathon prompted the Guardian’s Simon Rogers (who will soon be Twitter’s Simon Rogers) to look into the history of attacks on U.S. soil. Using data from the START Global Terrorism Database, Rogers mapped …

Four short links: 23 April 2013

By Nat Torkington
April 23, 2013

Drawscript — Processing for Illustrator. (via BERG London) Archive Team Warrior — a virtual archiving appliance. You can run it to help with the ArchiveTeam archiving efforts. It will download sites and upload them to our archive. (via Ed Vielmetti) …

Simpler workflow tools enable the rapid deployment of models

By Ben Lorica
April 21, 2013

Data science often depends on data pipelines that involve acquiring, transforming, and loading data. (If you’re fortunate, most of the data you need is already in usable form.) Data needs to be assembled and wrangled before it can be visualized …

A different take on data skepticism

By Beau Cronin
April 19, 2013

Recently, the Mathbabe (aka Cathy O’Neil) vented some frustration about the pitfalls in applying even simple machine learning (ML) methods like k-nearest neighbors. As data science is democratized, she worries that naive practitioners will shoot themselves in the foot because …

Strata Week: Movers and shakers on the data journalism front

By Jenn Webb
April 19, 2013

Reuters launches Connected China, Pew instructs on downloading its data, and Twitter gets a data editor: Yue Qiu and Wenxiong Zhang took a look this week at a data journalism effort by Reuters, the Connected China visualization application. Qiu and …

Four short links: 19 April 2013

By Nat Torkington
April 19, 2013

Bruce Sterling on Disruption — If more computation, and more networking, was going to make the world prosperous, we’d be living in a prosperous world. And we’re not. Obviously we’re living in a Depression. Slow first 25% but then it …

Finding and telling data-driven stories in billions of tweets

By Alex Howard
April 18, 2013

Twitter has hired its first data editor. Simon Rogers, one of the leading practitioners of data journalism in the world, will join Twitter. He will be moving his family from London to San Francisco and applying his skills to telling data-driven …

What is probabilistic programming?

By Beau Cronin
April 18, 2013

Probabilistic programming languages are in the spotlight. This is due to the announcement of a new DARPA program to support their fundamental research. But what is probabilistic programming? What can we expect from this research? Will this effort pay off? How long …

Sprinting toward the future of Jamaica

By Alex Howard
April 18, 2013

Creating the conditions for startups to form is now a policy imperative for governments around the world, as Julian Jay Robinson, minister of state in Jamaica’s Ministry of Science, Technology, Energy and Mining, reminded the attendees at the “Developing the …

Four short links: 18 April 2013

By Nat Torkington
April 18, 2013

The Well Deserved Fortune of Satoshi Nakamoto — I can’t assure with 100% certainty that the all the black dots are owned by Satoshi, but almost all are owned by a single entity, and that entity began mining right from …

Visualization of the Week: Commuting Paris

By Jenn Webb
April 17, 2013

The team at Dataveyes has launched its latest project, Metropolitain.io, an interactive map visualizing the Paris metro system. Using data provided by Autonomous Operator of Parisian Transports (RATP) and from Isokron, the team visualized the metro system from both a …

Four short links: 16 April 2013

By Nat Torkington
April 16, 2013

Triage — iPhone app to quickly triage your email in your downtime. See also the backstory. Awesome UI. Webcam Pulse Detector — I was wondering how long it would take someone to do the Eulerian video magnification in real code. …

Single server systems can tackle big data

By Ben Lorica
April 13, 2013

About a year ago, a blog post from SAP posited that when it comes to analytics, most companies are in the multi-terabyte range: data sizes that are well within the scope of distributed in-memory solutions like Spark, SAP HANA, ScaleOut Software, …

