The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction will be displayed in the column titled "Date Corrected".
The following errata were submitted by our customers and approved as valid errors by the author or editor.
Version |
Location |
Description |
Submitted By |
Date submitted |
Date corrected |
ePub |
Page N/A
N/A (ebook) |
From Chapter 1, "A Brief History of Hadoop"
“Early in 2005, the Nutch developers had a working MapReduce implementation in Nutch, and by the middle of that year. All the major Nutch algorithms had been ported to run using MapReduce and NDFS.”
I think this was intended to say (note punctuation after "middle of that year"):
“Early in 2005, the Nutch developers had a working MapReduce implementation in Nutch, and by the middle of that year, all the major Nutch algorithms had been ported to run using MapReduce and NDFS.”
|
Trevor Harmon |
Oct 21, 2014 |
|
PDF |
Page 5
3rd paragraph, first sentence |
"In many ways, MapReduce can be seen as a complement to a Rational Database Management System (RDBMS)."
I believe that the word "Rational" should be "Relational."
Kathleen
Note from the Author or Editor: I think this was introduced during copy-editing. Fixed in the next edition (4th).
|
lenni |
Jul 23, 2012 |
|
|
Page 37
3rd paragraph (box note) |
boxnote says:
"In this case, you need to implement the close() method so that
you know when the last record has been read, so you can finish processing
the last group of lines."
org.apache.hadoop.mapreduce.Mapper doesn't have a close() method to override, so implementing close() will not help, since it is never called.
However, it does have an empty cleanup() method that can be overridden for exactly this kind of purpose.
Note from the Author or Editor: Change "In this case, you need to implement the close() method" to "In this case, you need to implement the cleanup() method".
This is on p38 in the 4th ed print edition.
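A minimal sketch of the suggested fix (the mapper name and the grouping logic here are hypothetical): accumulate state in map() and flush it in cleanup(), which the framework calls once after the last record.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class LastGroupMapper extends Mapper<LongWritable, Text, Text, Text> {
      private final StringBuilder group = new StringBuilder();

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        group.append(value.toString()).append('\n'); // accumulate the current group
      }

      @Override
      protected void cleanup(Context context)
          throws IOException, InterruptedException {
        // Called once after the last record: finish processing the last group here.
        if (group.length() > 0) {
          context.write(new Text("last-group"), new Text(group.toString()));
        }
      }
    }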
|
Anonymous |
Mar 04, 2015 |
Apr 17, 2015 |
Printed |
Page 38
2nd and 3rd % command |
regular expression in
$HADOOP_INSTALL/contrib/streaming/hadoop-*-streaming.jar
doesn't match the 1.0.3 file naming
hadoop-streaming-1.0.3.jar
perhaps
hadoop*streaming*.jar
would match both the old and new file naming.
Note from the Author or Editor: For Hadoop 2 the correct form is "hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar". The fourth edition has been updated to reflect this change.
|
Mark Anderson |
Aug 17, 2012 |
|
PDF |
Page 39
1st paragraph |
In this paragraph, discussing the Ruby reducer in Hadoop Streaming, we have:
"In this case, the keys are weather station identifiers, ...".
However, as in the Java example, the keys are the years.
At least, that is how I read the example...
|
SAStanley |
May 04, 2012 |
|
Printed |
Page 57
paragraph |
"Sometimes it is possible to set a URLStreamHandlerFactory". It should say "Sometimes it is NOT possible"
Note from the Author or Editor: Changed to "Sometimes it is impossible to set a URLStreamHandlerFactory"
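For context, a sketch along the lines of the book's surrounding URLCat example: URL.setURLStreamHandlerFactory() can be called at most once per JVM, which is why it is sometimes impossible to set it (for instance, when a third-party component has already done so).

    import java.io.InputStream;
    import java.net.URL;
    import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
    import org.apache.hadoop.io.IOUtils;

    public class URLCat {
      static {
        // Can only be called once per JVM; impossible if another
        // component has already set a factory
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
      }

      public static void main(String[] args) throws Exception {
        InputStream in = null;
        try {
          in = new URL(args[0]).openStream();
          IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
          IOUtils.closeStream(in);
        }
      }
    }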
|
Tulio Domingos |
May 22, 2013 |
|
PDF |
Page 65
1st paragraph |
A tiny typo: the last sentence of the first paragraph (under the heading "File patterns") reads: "Hadoop provides two FileSystem method for processing globs..." The word "method" should be pluralized (i.e. "methods").
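For reference, the two methods in question are globStatus(Path) and globStatus(Path, PathFilter); a minimal sketch (the /2007/*/* pattern is a hypothetical date-partitioned layout):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class GlobDemo {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Method 1: glob pattern only
        FileStatus[] all = fs.globStatus(new Path("/2007/*/*"));

        // Method 2: glob pattern plus a PathFilter to refine the match
        FileStatus[] filtered = fs.globStatus(new Path("/2007/*/*"),
            path -> !path.getName().endsWith("31"));

        // globStatus returns null when nothing matches, so guard before use
        System.out.println((all == null ? 0 : all.length) + " matches, "
            + (filtered == null ? 0 : filtered.length) + " after filtering");
      }
    }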
|
Dima Spivak |
Apr 11, 2014 |
|
PDF |
Page 79
3rd Paragraph |
Minor sentence construction problem:
"The tool runs a MapReduce job to process the input files in parallel, so to run it, you need a MapReduce cluster running to use it."
Note from the Author or Editor: This section has been removed in the fourth edition.
|
Anonymous |
May 15, 2012 |
|
PDF |
Page 86
footer |
1. For a comprehensive set of compression benchmarks, https://github.com/ning/jvm-compressor-benchmark is a good reference for JMV-compatible libraries (includes some native libraries). For command line tools, see Jeff Gilchrist's Archive Comparison Test at http://compression.ca/act/act-summary.html.
It says "JMV-compatible", should be "JVM"
|
Emīls Šolmanis |
Mar 13, 2012 |
|
ePub |
Page 113
Bottom paragraph, first sentence |
"Hadoop cannot divine" should be "Hadoop cannot define"
Note from the Author or Editor: Changed to "Hadoop cannot magically discover" in the fourth edition.
|
Robert A. Wlodarczyk |
Feb 18, 2013 |
|
Printed, PDF |
Page 118
6th paragraph, first line (two lines before "The specific API" subsection). |
The text says, "the objects returned by result.get("left") and result.get("left") are of type Utf-8, so we can convert them into Java String objects by calling their toString() methods."
We are discussing the objects returned by result.get on "left" and "right", not "left" and "left".
The text should read, "the objects returned by result.get("left") and result.get("right") are of type Utf-8, so we can convert them into Java String objects by calling their toString() methods."
Note from the Author or Editor: Change the second result.get("left") to result.get("right").
This appears on p351 in the 4th edition.
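For context, a sketch of the surrounding example (the StringPair schema is as in the book's Avro chapter): after the serialization round trip, get("left") and get("right") return Utf8 instances, so toString() converts them to Java Strings.

    import java.io.ByteArrayOutputStream;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.*;

    public class StringPairDemo {
      public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"StringPair\",\"fields\":["
          + "{\"name\":\"left\",\"type\":\"string\"},"
          + "{\"name\":\"right\",\"type\":\"string\"}]}");

        GenericRecord datum = new GenericData.Record(schema);
        datum.put("left", "L");
        datum.put("right", "R");

        // Serialize...
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema);
        Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        writer.write(datum, encoder);
        encoder.flush();

        // ...and deserialize: string fields come back as Utf8 instances
        DatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
        Decoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        GenericRecord result = reader.read(null, decoder);
        String left = result.get("left").toString();
        String right = result.get("right").toString(); // "right", not "left"
        System.out.println(left + ", " + right);
      }
    }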
|
Myles Baker |
Jan 21, 2015 |
Apr 17, 2015 |
Printed |
Page 134
|
Reading about reader.sync and reader.getPosition led me to believe that the example output shown on page 134 would work regardless of the compression used: the positions increment in an orderly fashion as you iterate through the rows. With compression, however, the read position apparently stays the same for all the records in the decompressed block. In other words, for two adjacent records, reader.getPosition yields the same value.
Of course this makes sense when you think about how the compressed formats work, but the book is obviously for folks who haven't fully wrapped their minds around Hadoop.
Just a suggestion to note the different behavior when compression is used.
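A minimal sketch of the behavior being described, modeled on the book's read example (the file path is supplied as an argument; a block-compressed SequenceFile is assumed for the repeated-position effect):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.util.ReflectionUtils;

    public class SequenceFilePositionDemo {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path(args[0]);
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        try {
          Writable key = (Writable)
              ReflectionUtils.newInstance(reader.getKeyClass(), conf);
          Writable value = (Writable)
              ReflectionUtils.newInstance(reader.getValueClass(), conf);
          long position = reader.getPosition();
          while (reader.next(key, value)) {
            // With block compression this position repeats for every record
            // in the same compressed block instead of incrementing per record.
            System.out.printf("[%s]\t%s\t%s%n", position, key, value);
            position = reader.getPosition(); // position of the next record
          }
        } finally {
          reader.close();
        }
      }
    }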
|
David Larsen |
Dec 06, 2013 |
|
Printed |
Page 168
paragraph |
"Farther down the page" should say "Further down the page"
Note from the Author or Editor: BTW this section has been re-written for the fourth edition to cover the new web flow for YARN, and the phrase "Further down the page" no longer appears.
|
Tulio Domingos |
May 22, 2013 |
|
PDF |
Page 207
1st paragraph |
The paragraph says
"However, with the FIFO scheduler, priorities do not support preemption, so a high-priority job can still be blocked by a long-running, low-priority job that started before the high-priority job was scheduled."
This is not true, as confirmed by Tom himself. A high-priority job will actually block any lower-priority jobs when submitted onto a busy cluster.
Eric Sammer's book describes the correct behaviour:
"The FIFO scheduler supports five levels of job prioritization, from lowest to highest: very low, low, normal, high, very high. Each priority is actually implemented as a sep- arate FIFO queue. All tasks from higher priority queues are processed before lower priority queues and as described earlier, tasks are scheduled in the order of their jobs� submission. The easiest way to visualize prioritized FIFO scheduling is to think of it as five FIFO queues ordered top to bottom by priority. Tasks are then scheduled left to right, top to bottom. This means that all very high priority tasks are processed before any high priority tasks, which are processed before any normal priority tasks, and so on."
Note from the Author or Editor: This is indeed incorrect. I have reworked and added to the material on schedulers for the fourth edition to cover scheduling in YARN. The FIFO scheduler in YARN doesn't support priorities, so the statement I wrote is actually correct for MR2. In MR1 however, the FIFO scheduler does support priorities, and the behaviour described by Eric is correct.
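For context, a minimal sketch of how a job's priority is set with the MR1 (old) API, which is where, per the author's note, FIFO priorities actually apply:

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobPriority;

    public class PriorityDemo {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        // In MR1 the FIFO scheduler honours this; the YARN (MR2)
        // FIFO scheduler ignores it.
        conf.setJobPriority(JobPriority.VERY_HIGH);
        System.out.println(conf.getJobPriority()); // VERY_HIGH
      }
    }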
|
Kai Voigt |
Oct 16, 2013 |
|
PDF |
Page 299
Last paragraph |
In describing network topology, reference is made to switches and routers connecting machines on a rack. "GB" is used to refer to gigabit when it should probably be written as "Gb."
Note from the Author or Editor: Change "with a 1 GB switch" to "with a 1 Gb switch"; and "normally 1 GB or better" to "normally 1 Gb or better".
|
Dima Spivak |
Apr 14, 2014 |
|
Printed |
Page 304
Footnote (3) |
Within footnote 3 on page 304 the sentence:
"See its main page for instructions on how to start ssh-agent"
should probably say man instead of main, e.g.:
"See its man page for instructions on how to start ssh-agent"
Note from the Author or Editor: Changed to "See its man page for instructions on how to start ssh-agent."
|
Vijay |
Dec 24, 2012 |
|
Printed |
Page 309
3rd paragraph |
Missing the "y" in "directory":
The log director ---> The log directory
|
Ryan Tabora |
Dec 04, 2012 |
|
PDF |
Page 616
5th paragraph |
You may need to run ssh-add for this to work properly, after "cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys"
and before "ssh localhost".
Otherwise you may receive an "Agent admitted failure to sign using the key" error.
Note from the Author or Editor: Add the following paragraph after the "cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys" line:
"You may also need to run ssh-add if you are running ssh-agent."
|
Anonymous |
Apr 30, 2012 |
|
|
Page 625
3rd paragraph |
All filenames in the sample listing in Appendix C pg 625 are the same.
% ls -l 1901 | head
011990-99999-1950.gz
011990-99999-1950.gz
...
011990-99999-1950.gz
The sample listing on pg 18 appears correct.
Note from the Author or Editor: Changed to
% ls 1901 | head
029070-99999-1901.gz
029500-99999-1901.gz
029600-99999-1901.gz
029720-99999-1901.gz
029810-99999-1901.gz
227070-99999-1901.gz
|
Anonymous |
Nov 13, 2012 |
|