Book description
Until now, design patterns for the MapReduce framework have been scattered among various research papers, blogs, and books. This handy guide brings together a unique collection of valuable MapReduce patterns that will save you time and effort regardless of the domain, language, or development framework you’re using.
Each pattern is explained in context, with pitfalls and caveats clearly identified to help you avoid common design mistakes when modeling your big data architecture. This book also provides a complete overview of MapReduce that explains its origins and implementations, and why design patterns are so important. All code examples are written for Hadoop.
- Summarization patterns: get a top-level view by summarizing and grouping data
- Filtering patterns: view data subsets such as records generated from one user
- Data organization patterns: reorganize data to work with other systems, or to make MapReduce analysis easier
- Join patterns: analyze different datasets together to discover interesting relationships
- Metapatterns: piece together several patterns to solve multi-stage problems, or to perform several analytics in the same job
- Input and output patterns: customize the way you use Hadoop to load or store data
"A clear exposition of MapReduce programs for common data processing patterns—this book is indispensable for anyone using Hadoop."
--Tom White, author of Hadoop: The Definitive Guide
Table of contents
- Dedication
- Preface
- 1. Design Patterns and MapReduce
- 2. Summarization Patterns
- 3. Filtering Patterns
- 4. Data Organization Patterns
- 5. Join Patterns
- 6. Metapatterns
- 7. Input and Output Patterns
- 8. Final Thoughts and the Future of Design Patterns
- A. Bloom Filters
- Index
- About the Authors
- Colophon
- Copyright
Product information
- Title: MapReduce Design Patterns
- Author(s):
- Release date: December 2012
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781449327170