In defense of the pie chart
Common mistakes that thwart a simple data visualization technique.
Ideas and resources related to data tools.
Moving data transformation into the hands of administrators, analysts, and other non-developers.
How to build, maintain, and derive value from your Hadoop data lake.
Exploring the intersections and compatibility of data science and procurement.
Query languages, like BQL, offer a bridge between domain experts and software experts.
Dive into creating your own databases and learn how to design them efficiently.
Guaranteeing data availability in distributed systems.
Jesse Anderson walks viewers through the path data can take from publishers through a Kafka cluster and on to consumers.
Over-allocated but underutilized clusters require more than best-practice solutions.
Metadata, governance, and other considerations for building ground-to-cloud.
Comparing how storage format, modeling/filtering, caching, and other factors affect analytical query speed and storage cost.
Knowing the architectures is key to thinking strategically and delivering value.
How to group users’ events using machine learning and distributed computing.
Choosing the right tools for the job.
Ranking algorithms bolster intrusion detection systems.
Globalize your data with Apache Cassandra.
Data management is an important step in deriving business value from your Hadoop data lake.
Streaming analytics are only worthwhile if the data leads to action.
Identification of data sources is the first step in warehouse development. In this video training segment, Michael Blaha provides a framework by reviewing data modeling constructs and terminology, including dependent and independent entity types. Using IE (information engineering) notation and the ERwin tool, Blaha walks through a sample operational data model.
Everyone loves data, so it's no surprise that data storage has advanced by orders of magnitude. But has analytics innovation kept pace?
What it looks like to analyze, visualize, and even forecast human society using global news coverage.
Consolidating data across silos improves business insight.
How Baidu combined Tachyon with Spark SQL to increase query speed 30-fold.
In this O'Reilly training video, the "Hadoop Application Architectures" authors present an end-to-end case study of a clickstream analytics engine to provide a concrete example of how to architect and implement a complete solution with Hadoop. In this segment, they provide an overview of the complete architecture. Presenters: Mark Grover, Gwen Shapira, Jonathan Seidman, Ted Malaska