MapReduce

MapReduce is a programming pattern used by Apache Hadoop. Hadoop MapReduce works in providing the systems that can store, process, and mine huge data with parallel multi node clusters in a scalable, reliable, and error-absorbing inexpensive distributed system. In MapReduce, the data analysis and data processing are split into individual phases called the Map phase and Reduce phase as represented in the following figure:

In the preceding diagram, the word count process is handled by MapReduce with multiple phases. The set of words in Input are first split into three nodes in the (K1, V1) process. These three split nodes communicate ...

Get Distributed Computing in Java 9 now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.