Errata for Hadoop: The Definitive Guide

The errata list records errors, and their corrections, found after the product was released. If an error was corrected in a later version or reprint, the date of the correction is displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.

Version Location Description Submitted By Date submitted Date corrected
Page 37
3rd paragraph (box note)

The box note says:

"In this case, you need to implement the close() method so that
you know when the last record has been read, so you can finish processing
the last group of lines."

org.apache.hadoop.mapreduce.Mapper doesn't have a close() method to override, so implementing a close() method will not help, since it is never called.

However, it has an empty cleanup() method that can be overridden for exactly this purpose.
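
A minimal sketch of the suggested approach (class and field names invented for illustration; the logic for emitting completed groups in map() is elided):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LineGroupMapper
        extends Mapper<LongWritable, Text, Text, NullWritable> {

    private final StringBuilder group = new StringBuilder();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Accumulate lines; emit and reset 'group' whenever a complete
        // group of lines has been seen (logic omitted).
        group.append(value.toString()).append('\n');
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        // The framework calls cleanup() once after the last record has
        // been read, so the final group can be finished here.
        if (group.length() > 0) {
            context.write(new Text(group.toString()), NullWritable.get());
        }
    }
}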

Note from the Author or Editor:
Change "In this case, you need to implement the close() method" to "In this case, you need to implement the cleanup() method".

This is on p38 in the 4th ed print edition.

Anonymous  Mar 04, 2015  Apr 17, 2015
Printed, PDF
Page 118
6th paragraph, first line (two lines before "The specific API" subsection).

The text says, "the objects returned by result.get("left") and result.get("left") are of type Utf8, so we can convert them into Java String objects by calling their toString() methods."

We are discussing the objects returned by result.get on "left" and "right", not "left" and "left".

The text should read, "the objects returned by result.get("left") and result.get("right") are of type Utf8, so we can convert them into Java String objects by calling their toString() methods."
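
For reference, a minimal sketch of the access pattern the sentence describes, using Avro's GenericRecord (the Pair schema here is invented for illustration):

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.util.Utf8;

public class Utf8FieldExample {
    public static void main(String[] args) {
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Pair\",\"fields\":["
            + "{\"name\":\"left\",\"type\":\"string\"},"
            + "{\"name\":\"right\",\"type\":\"string\"}]}");
        GenericRecord result = new GenericData.Record(schema);
        result.put("left", new Utf8("L"));
        result.put("right", new Utf8("R"));
        // Both fields come back as Avro Utf8 objects, so convert them
        // to java.lang.String by calling toString().
        String left = result.get("left").toString();
        String right = result.get("right").toString();
        System.out.println(left + ", " + right);
    }
}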

Note from the Author or Editor:
Change the second result.get("left") to result.get("right").

This appears on p351 in the 4th edition.

Myles Baker  Jan 21, 2015  Apr 17, 2015
ePub
Page N/A (ebook)
N/A (ebook)

From Chapter 1, "A Brief History of Hadoop"

“Early in 2005, the Nutch developers had a working MapReduce implementation in Nutch, and by the middle of that year. All the major Nutch algorithms had been ported to run using MapReduce and NDFS.”

I think this was intended to say (note punctuation after "middle of that year"):

“Early in 2005, the Nutch developers had a working MapReduce implementation in Nutch, and by the middle of that year, all the major Nutch algorithms had been ported to run using MapReduce and NDFS.”

Trevor Harmon  Oct 21, 2014 
PDF
Page 299
Last paragraph

In describing network topology, reference is made to switches and routers connecting machines on a rack. "GB" is used to refer to gigabit when it should probably be written as "Gb."

Note from the Author or Editor:
Change "with a 1 GB switch" to "with a 1 Gb switch"; and "normally 1 GB or better" to "normally 1 Gb or better".

Dima Spivak  Apr 14, 2014 
PDF
Page 65
1st paragraph

A tiny typo: the last sentence of the first paragraph (under the heading "File patterns") reads: "Hadoop provides two FileSystem method for processing globs..." The word "method" should be pluralized (i.e. "methods").
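
For reference, the two methods in question are the two overloads of FileSystem.globStatus: globStatus(Path) and globStatus(Path, PathFilter). A minimal sketch with an invented glob pattern:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GlobExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Matches, e.g., /data/1901 and /data/1902 (pattern invented).
        FileStatus[] matches = fs.globStatus(new Path("/data/19*"));
        for (FileStatus status : matches) {
            System.out.println(status.getPath());
        }
    }
}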

Dima Spivak  Apr 11, 2014 
Printed
Page 134

Reading about reader.sync and reader.getPosition led me to believe that the example output shown on page 134 would work regardless of the compression used: the positions increment in an orderly fashion as you iterate through the rows. With compression, however, the read position apparently stays the same for all the records in the decompressed block. In other words, for two adjacent records, reader.getPosition yields the same value.

Of course this makes sense when you think about how the compressed formats work, but the book is obviously for folks who haven't fully wrapped their minds around Hadoop.

Just a suggestion to note the different behavior when compression is used.
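
A minimal sketch that reproduces the observation, assuming a block-compressed SequenceFile of IntWritable/Text pairs like the book's numbers.seq example: with block compression, getPosition() reports the same value for every record read from the same compressed block.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PositionDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("numbers.seq"); // hypothetical file
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        try {
            IntWritable key = new IntWritable();
            Text value = new Text();
            while (reader.next(key, value)) {
                // With block compression, adjacent records often report
                // the same position, unlike the uncompressed output on p134.
                System.out.printf("[%d]\t%s\t%s%n",
                        reader.getPosition(), key, value);
            }
        } finally {
            reader.close();
        }
    }
}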

David Larsen  Dec 06, 2013 
PDF
Page 207
1st paragraph

The paragraph says

"However, with the FIFO scheduler, priorities do not support preemption, so a high-priority job can still be blocked by a long-running, low-priority job that started before the high-priority job was scheduled."

This is not true, as confirmed by Tom himself. A high-priority job will actually block any lower-prioritized jobs when submitted onto a busy cluster.

Eric Sammer's book describes the correct behaviour:

"The FIFO scheduler supports five levels of job prioritization, from lowest to highest: very low, low, normal, high, very high. Each priority is actually implemented as a sep- arate FIFO queue. All tasks from higher priority queues are processed before lower priority queues and as described earlier, tasks are scheduled in the order of their jobs� submission. The easiest way to visualize prioritized FIFO scheduling is to think of it as five FIFO queues ordered top to bottom by priority. Tasks are then scheduled left to right, top to bottom. This means that all very high priority tasks are processed before any high priority tasks, which are processed before any normal priority tasks, and so on."

Note from the Author or Editor:
This is indeed incorrect. I have reworked and added to the material on schedulers for the fourth edition to cover scheduling in YARN. The FIFO scheduler in YARN doesn't support priorities, so the statement I wrote is actually correct for MR2. In MR1 however, the FIFO scheduler does support priorities, and the behaviour described by Eric is correct.
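
For context, a minimal sketch of setting a job's priority with the MR1-era API the note refers to; in MR1's FIFO scheduler, tasks of higher-priority jobs are scheduled before those of lower-priority jobs:

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobPriority;

public class PriorityExample {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        // One of five levels: VERY_LOW, LOW, NORMAL, HIGH, VERY_HIGH.
        conf.setJobPriority(JobPriority.VERY_HIGH);
        System.out.println(conf.getJobPriority());
    }
}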

Kai Voigt  Oct 16, 2013 
Printed
Page 168
paragraph

"Farther down the page" should say "Further down the page"

Note from the Author or Editor:
BTW this section has been re-written for the fourth edition to cover the new web flow for YARN, and the phrase "Further down the page" no longer appears.

Tulio Domingos  May 22, 2013 
Printed
Page 57
paragraph

"Sometimes it is possible to set a URLStreamHandlerFactory". It should say "Sometimes it is NOT possible"

Note from the Author or Editor:
Changed to "Sometimes it is impossible to set a URLStreamHandlerFactory"

Tulio Domingos  May 22, 2013 
ePub
Page 113
Bottom paragraph, first sentence

"Hadoop cannot divine" should be "Hadoop cannot define"

Note from the Author or Editor:
Changed to "Hadoop cannot magically discover" in the fourth edition.

Robert A. Wlodarczyk  Feb 18, 2013 
Printed
Page 304
Footnote (3)

Within footnote 3 on page 304 the sentence:
"See its main page for instructions on how to start ssh-agent"
should probably say "man" instead of "main", e.g.:
"See its man page for instructions on how to start ssh-agent"

Note from the Author or Editor:
Changed to "See its man page for instructions on how to start ssh-agent."

Vijay  Dec 24, 2012 
Printed
Page 309
3rd paragraph

Missed the y on directory.

The log director ---> The log directory

Ryan Tabora  Dec 04, 2012 
Page 625
3rd paragraph

All filenames in the sample listing in Appendix C pg 625 are the same.

% ls -l 1901 | head
011990-99999-1950.gz
011990-99999-1950.gz
...
011990-99999-1950.gz

The sample listing on pg 18 appears correct.

Note from the Author or Editor:
Changed to
% ls 1901 | head
029070-99999-1901.gz
029500-99999-1901.gz
029600-99999-1901.gz
029720-99999-1901.gz
029810-99999-1901.gz
227070-99999-1901.gz

Anonymous  Nov 13, 2012 
Printed
Page 38
2nd and 3rd % command

The glob pattern in
$HADOOP_INSTALL/contrib/streaming/hadoop-*-streaming.jar

doesn't match the 1.0.3 file naming
hadoop-streaming-1.0.3.jar

perhaps
hadoop*streaming*.jar
to match both old and new file naming

Note from the Author or Editor:
For Hadoop 2 the correct form is "hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar". The fourth edition has been updated to reflect this change.

Mark Anderson  Aug 17, 2012 
PDF
Page 5
3rd paragraph, first sentence

"In many ways, MapReduce can be seen as a complement to a Rational Database Management System (RDBMS)."

I believe that the word "Rational" should be "Relational."

Kathleen

Note from the Author or Editor:
I think this was introduced during copy-editing. Fixed in the next edition (4th).

lenni  Jul 23, 2012 
PDF
Page 79
3rd Paragraph

Minor sentence construction problem:

The tool runs a MapReduce job to process the input files in parallel, so to run it, you need a MapReduce cluster running to use it.

Note from the Author or Editor:
This section has been removed in the fourth edition.

Anonymous  May 15, 2012 
PDF
Page 39
1st paragraph

In this paragraph, discussing the Ruby reducer in Hadoop Streaming, we have;

"In this case, the keys are weather station identifiers, ...".

However as in the Java example the keys are the years.

At least that is how I read the example...

SAStanley  May 04, 2012 
PDF
Page 616
5th paragraph

You may need to run ssh-add for this to work properly, after cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys and before ssh localhost.
Otherwise you may receive an "Agent admitted failure to sign using the key" error.

Note from the Author or Editor:
Add the following paragraph after the "cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys" line:

"You may also need to run ssh-add if you are running ssh-agent."

Anonymous  Apr 30, 2012 
PDF
Page 86
footer

1. For a comprehensive set of compression benchmarks, https://github.com/ning/jvm-compressor-benchmark is a good reference for JMV-compatible libraries (includes some native libraries). For command-line tools, see Jeff Gilchrist's Archive Comparison Test at http://compression.ca/act/act-summary.html.

It says "JMV-compatible", should be "JVM"

Emīls Šolmanis  Mar 13, 2012