Part 1. Batch layer

Part 1 focuses on the batch layer of the Lambda Architecture. Chapters alternate between theory and illustration.

Chapter 2 discusses how you model and schematize the data in your master dataset. Chapter 3 illustrates these concepts using Apache Thrift.

Chapter 4 discusses the storage requirements for your master dataset. You’ll see that many features typically provided by databases are not needed for the master dataset, and in fact get in the way of optimizing its storage. A simpler storage solution with fewer features meets the requirements better. Chapter 5 illustrates practical storage of a master dataset using the Hadoop Distributed File System (HDFS).

Chapter 6 discusses computing arbitrary ...
