Appendix D. The Old and New Java MapReduce APIs
The Java MapReduce API used throughout this book is called the “new API,” and it replaces the older,
functionally equivalent API. Although Hadoop ships with both the old and new MapReduce APIs,
they are not compatible with each other. Should you wish to use the old API, you can, since the
code for all the MapReduce examples in this book is available for the old API on the book’s
website (in the oldapi package).
There are several notable differences between the two APIs:
- The new API is in the org.apache.hadoop.mapreduce package (and subpackages). The old API can still be found in org.apache.hadoop.mapred.
- The new API favors abstract classes over interfaces, since these are easier to evolve. This means that you can add a method (with a default implementation) to an abstract class without breaking old implementations of the class.[168] For example, the Mapper and Reducer interfaces in the old API are abstract classes in the new API.
- The new API makes extensive use of context objects that allow the user code to communicate with the MapReduce system. The new Context, for example, essentially unifies the role of the JobConf, the OutputCollector, and the Reporter from the old API.
- In both APIs, key-value record pairs are pushed to the mapper and reducer, but in addition, the new API allows both mappers and reducers to control the execution flow by overriding the run() method. For example, records can be processed in batches, or the execution can be terminated before all the records have been processed; a sketch illustrating this appears after the list.
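To make these differences concrete, here is a minimal sketch of a new-API mapper. It is not one of the book's examples: the class name, the counter names, and the record limit are illustrative assumptions. The mapper writes its output and updates a counter through the single Context object (the roles played by OutputCollector and Reporter in the old API), and it overrides run() to stop before the whole input has been consumed.

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Illustrative new-API mapper; the record limit is an arbitrary example value.
    public class TruncatingMapper
        extends Mapper<LongWritable, Text, Text, LongWritable> {

      private static final long MAX_RECORDS = 1000; // illustrative limit
      private long processed = 0;

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        // The single Context object both emits output (old OutputCollector role)
        // and updates counters (old Reporter role).
        context.write(new Text(value.toString()), new LongWritable(1));
        context.getCounter("TruncatingMapper", "RecordsProcessed").increment(1);
      }

      @Override
      public void run(Context context) throws IOException, InterruptedException {
        // Overriding run() lets the mapper control the execution flow itself;
        // here it simply stops after MAX_RECORDS input records.
        setup(context);
        try {
          while (processed < MAX_RECORDS && context.nextKeyValue()) {
            map(context.getCurrentKey(), context.getCurrentValue(), context);
            processed++;
          }
        } finally {
          cleanup(context);
        }
      }
    }

In the old API, the equivalent mapper would implement the org.apache.hadoop.mapred.Mapper interface and receive an OutputCollector and a Reporter as separate arguments to map(), rather than a single Context.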