Moving beyond big data to smart data
William Plummer on the “closed loop of data.”
The growth of the consumer Internet has left many companies awash in analytics data. In this O’Reilly Podcast episode, I talk with William Plummer, chief strategy officer at TalkingData, about how to make sense of it.
Plummer’s key message is that companies need to embrace what he calls “smart data,” an evolution of the “big data” idea that enterprises have been pursuing for about a decade now. Plummer links smart data to the two developments that have transformed the practice of data science over the last couple of decades: the arrival of nearly universal connectivity and the advent of the consumer Internet.
Enterprises have rushed to collect the resulting data and use it for analytics. Smart data, says Plummer, is “the marriage between massive data sets and data processing advancements and tools in both machine learning and artificial intelligence.”
Plummer points to an analogy from Andrew Ng, the Baidu chief scientist and Stanford machine learning researcher: artificial intelligence is a rocket, data is the fuel, and the algorithm is the engine. To launch, you need both fuel and engine. Ng has said that the growth of available data outpaced the development of algorithms and the deployment of computing resources for a while, but that the algorithms are now catching up. “When we say ‘smart,’ what we really mean is ‘more useful,’” Plummer says.
So, how might a company embrace smart data? Plummer lists two critical attributes of “smart enterprises.” They systematically plan for and execute data collection, and they are “data-aware”—that is, they believe that data will actually be actionable. “Beautiful dashboards are great, but we try to deliver something that’s a step further,” says Plummer.
Smart enterprises also embrace what Plummer calls “the closed loop of data,” a practice that carefully tunes not only data collection and analytics systems, but also the way that companies act on the insights they draw from data. There are four steps to the loop: data acquisition, preparation, and enrichment; analytics; deployment in a specific use case; and, finally, assessment. The last step is essential; smart enterprises use it to improve their operations. “If you think of [these steps] as a loop, rather than a straight-line output, you really get the benefit of multiple iterations,” says Plummer.
Also in this episode: some insight into the state of data science in China. Plummer is based in Beijing, and he describes the “difference in capabilities” between enterprises in China and enterprises in the United States. “In China, companies are using the latest tools in their big-data strategy,” says Plummer, but they never invested in the data-warehousing tools that American companies built out over the last decade.
This post and podcast are a collaboration between O’Reilly and TalkingData. See our statement of editorial independence.