Part 3. Big data patterns

Part 1 introduced you to Hadoop, and part 2 covered how best to move and store your data in it. You’re now ready for part 3, which examines the techniques you need to streamline your work with big data.

In chapter 4 we’ll examine techniques to optimize MapReduce operations, such as joining and sorting on large datasets. These techniques make jobs run faster and allow for more efficient use of computational resources.

Chapter 5 applies the same principles to HDFS and looks at how to work with small files, as well as how compression can save you from many storage and computational headaches.

Finally, chapter 6 looks at how to measure, collect, and ...

