Running a word count application using Spark
How to use Apache Spark’s Resilient Distributed Dataset (RDD) API.
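As a rough illustration of the RDD word count pattern the title refers to, here is a minimal PySpark sketch. It is not the article's own code: the names `count_words` and `main`, the lowercase-letters-and-apostrophes tokenization rule, and the local master setting are all assumptions made for this example.

```python
import re
import sys
from operator import add


def count_words(lines):
    """Take an RDD of text lines and return an RDD of (word, count) pairs."""
    return (
        lines
        # Split each line into lowercase words (assumed tokenization rule).
        .flatMap(lambda line: re.findall(r"[a-z']+", line.lower()))
        # Pair every word with an initial count of 1.
        .map(lambda word: (word, 1))
        # Sum the counts for each distinct word.
        .reduceByKey(add)
    )


def main(path):
    # Imported here so count_words can be read and reused on its own.
    from pyspark import SparkContext

    # "local[*]" runs Spark on this machine; a cluster would use another master.
    sc = SparkContext("local[*]", "WordCount")
    try:
        for word, count in count_words(sc.textFile(path)).collect():
            print(word, count)
    finally:
        sc.stop()


# Runs only when a file path is supplied on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Saved as a script (say, a hypothetical `wordcount.py`), this could be run with `spark-submit wordcount.py input.txt`, where `input.txt` stands in for any text file. Calling `collect()` pulls every pair back to the driver, which is fine for a small demo but not for large outputs.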
A look at the tools and patterns for accessing and processing data in Hadoop.
Mark Grover and Ted Malaska offer an overview of projects for streaming applications, including Kafka, Flume, and Spark Streaming, and discuss the available architectural patterns, such as Lambda and Kappa.
How decoupling, optimization, and specialization resemble connective systems in our bodies.
Ted Malaska explains how long hours of training, blisters, and shin splints relate to life-changing lessons in software architecture.
Good code comes from motivation and fresh minds.
In this O'Reilly training video, the "Hadoop Application Architectures" authors present an end-to-end case study of a clickstream analytics engine to provide a concrete example of how to architect and implement a complete solution with Hadoop. In this segment, they provide an overview of the complete architecture. Presenters: Mark Grover, Gwen Shapira, Jonathan Seidman, Ted Malaska