The problem of managing schemas
Schemas inevitably will change — Apache Avro offers an elegant solution.
Our take on the ideas, information, and tools that make data work.
Solutions to a number of problems must be found to unlock PAPI value.
How NoSQL databases scale vertically and horizontally, and what you should consider when building a DB cluster.
High-performing memory throws many traditional decisions overboard.
A new operator from the magrittr package makes it easier to use R for data analysis.
Learn simple ways to improve data models by cleaning up and tweaking the distribution of training data.
OSM is moving out of its awkward adolescence and into its mature, young adult phase.
New frameworks for interactive business analysis and advanced analytics fuel the rise in tabular data objects.
What do you get if you cross a distributed database with a stream processing system?
Addressing in-memory limitations and scalability issues of R.
Business users are becoming more comfortable with graph analytics.
A practical example of how anomaly detection makes complex data problems easier to solve.
Researchers and startups are building tools that enable feature discovery.
Many more companies want to highlight how they're using Apache Spark in production.
There are many ways a system can be like the brain, but only a fraction of these will prove important.
Casting a critical eye on the exciting developments in the world of AI.
Commerce and censorship in cross-cultural social media.
An array of tools for tackling data visualizations.
It has roots in academic scientific computing, but has features that appeal to many data scientists.
Resources for getting started with Apache Mesos.
Mesos offers reliability, efficiency, and higher developer productivity.
Apache Hadoop 2.0 represents a generational shift in the architecture of Apache Hadoop.
4.6 million phone numbers: is one of them yours?
It's an extensive, well-documented, accessible, and curated library of machine-learning models.