Tweets loud and quiet
Twitter’s long, long, long tail suggests the service is less democratic than it seems.
Our take on the ideas, information, and tools that make data work.
There's a lot of new ground to be explored in large-scale image processing.
Python and Scala are popular among members of several well-attended SF Bay Area Meetups
We are in the early days of productivity technology in data science
The inaugural Spark Summit will feature a wide variety of real-world applications
Myths and Realities
You're sitting on a pile of interesting data. How do you transform that into money?
A general-purpose stream processing framework from the team behind Kafka and new techniques for computing approximate quantiles.
A distributed, near real-time system simplifies the collection, storage, and mining of massive amounts of event data
Specialized tools run the risk of being replaced by others that have more coverage.
Tools simplify the application of advanced analytics and the interpretation of results
Increasingly available data spurs organizations to make analysis easier
As data sizes continue to grow, interactive query systems may start adopting the sampling approach central to BlinkDB.
A new crop of data science tools for deploying, monitoring, and maintaining models
Graph data is an area that has attracted many enthusiastic entrepreneurs and developers
Visual analysis tools are adding advanced analytics for big data
Tachyon enables data sharing across frameworks and performs operations at memory speed
Researchers begin to scale up pattern recognition, machine-learning, and data management tools.
Insights from a Strata Santa Clara 2013 session.
A variety of tools are making data science tasks easy to do in Python
Two open data items of note from readers.
Shark is 100X faster than Hive for SQL, and 100X faster than Hadoop for machine learning
Spark is becoming a key part of a big data toolkit.
DJ Patil details a new approach to solving data problems: use a problem's "weight" against itself.