Errata for Hadoop: The Definitive Guide
The errata list contains errors, and their corrections, that were found after the product was released. If an error was corrected in a later version or reprint, the date of the correction is shown in the "Date Corrected" column.
The following errata were submitted by our customers and approved as valid errors by the author or editor.
Color Key: Serious Technical Mistake, Minor Technical Mistake, Language or formatting error, Typo, Question, Note, Update
| Version |
Location |
Description |
Submitted By |
Date Submitted |
Date Corrected |
| Printed |
Page 27
Example 2-6 main() function |
The instance "job" changed from the example on page 22 where the instance was "conf". There are 6 usages of job that should be highlighted in bold.
|
John Martin |
Jan 05, 2012 |
|
| PDF |
Page 214
2nd paragraph |
"Combined with SequenceFile.Reader's appendRaw() method" should be "nextRaw()".
Note from the Author or Editor: The replacement should be: "Combined with a process that creates sequence files with SequenceFile.Writer’s appendRaw() method or SequenceFileAsBinaryOutputFormat"
|
Davey Yan |
Dec 19, 2011 |
|
| Printed |
Page 32
1st paragraph |
In first sentence after the parentheses, the phrase "between the maps and reduces" should read "between the maps and reducers".
|
Johnny Tolliver |
Nov 21, 2011 |
|
| Printed |
Page 136
Box: Which Properties Can I Set? |
The box points to the book's website for a configuration property reference. I looked but could not find the reference there.
Note from the Author or Editor: You're right, there is no property reference - I had planned to write one, but have never done so, I'm afraid. The best bet is to look at the *-default.xml files, which list all the default settings, with some documentation for each one. You can also view them online - I'm adding the following sentence to the third edition to make this clear (and removing the sentence that is the subject of this erratum):
The default settings documentation files can be found online at URLs of the form
http://hadoop.apache.org/common/docs/r<version>/<component>-default.html;
for example the HDFS defaults for release 1.0.0 are at http://hadoop.apache.org/common/docs/r1.0.0/hdfs-default.html.
|
Anonymous |
Oct 13, 2011 |
|
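The URL form quoted in the author's note for page 136 can be sketched as a small helper. This is only an illustration: the class and method names are hypothetical, and it assumes the URL pattern given in the note applies to the release being looked up.

```java
// Hypothetical helper, not part of Hadoop: builds the documentation URL
// described in the author's note from a release version and a component name.
class DefaultsUrl {
    static String defaultsUrl(String version, String component) {
        // Pattern from the note: .../docs/r<version>/<component>-default.html
        return "http://hadoop.apache.org/common/docs/r" + version
                + "/" + component + "-default.html";
    }

    public static void main(String[] args) {
        // Reproduces the example given in the note (HDFS defaults, release 1.0.0).
        System.out.println(defaultsUrl("1.0.0", "hdfs"));
        // prints http://hadoop.apache.org/common/docs/r1.0.0/hdfs-default.html
    }
}
```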
| PDF |
Page 34
Example 2-9 code |
max_val should be initialized to the smallest negative integer possible instead of 0.
You could have the unlikely case that a year consists entirely of negative temperatures. If this happened, the code as currently written would then erroneously return 0 as the maximum temperature instead of the largest negative temperature from the actual data set.
This is also true for the Python code in example 2-10 on page 36.
|
David Egts |
Sep 24, 2011 |
|
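The initialization problem reported for Example 2-9 (page 34) can be reproduced in a few lines. The book's examples are in Ruby and Python; the sketch below is a plain Java illustration of the same logic, with hypothetical method names and made-up temperature values, not the book's code.

```java
import java.util.Arrays;
import java.util.List;

class MaxTemperatureSketch {
    // Mirrors the reported bug: starting from 0 hides an all-negative input.
    static int maxStartingAtZero(List<Integer> temps) {
        int max = 0;
        for (int t : temps) {
            max = Math.max(max, t);
        }
        return max;
    }

    // Suggested fix: start from the smallest representable integer.
    static int maxStartingAtMinValue(List<Integer> temps) {
        int max = Integer.MIN_VALUE;
        for (int t : temps) {
            max = Math.max(max, t);
        }
        return max;
    }

    public static void main(String[] args) {
        // Hypothetical readings for a year with only negative temperatures.
        List<Integer> allNegative = Arrays.asList(-30, -12, -45);
        System.out.println(maxStartingAtZero(allNegative));     // prints 0
        System.out.println(maxStartingAtMinValue(allNegative)); // prints -12
    }
}
```

Note that with the corrected initial value, an empty input yields Integer.MIN_VALUE, so a caller may want to treat "no readings" as a separate case.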
| PDF |
Page 35
1st paragraph |
"In this case, the keys are the weather station identifiers" should read "In this case, the keys are the years."
The key-value pair output from the map on page 33 is the year and temperature, not the weather station identifier and temperature.
|
David Egts |
Sep 24, 2011 |
|
| Printed, PDF, Safari Books Online |
Page 55
code snippet in middle of page |
The discussion in that and the previous page is about the PositionedReadable interface. Here's the confusing snippet:
All of these methods preserve the current offset in the file and are thread-safe. In fact, they are just implemented using the Seekable interface using the following pattern:
    long oldPos = getPos();
    try {
      seek(position);
      // read data
    } finally {
      seek(oldPos);
    }
Now, clearly, that is _not_ a thread-safe pattern. On checking the actual source code, I found that the pattern is in fact wrapped in a synchronized block (as it should be).
So the implementation is thread safe, but not exactly concurrent. And since I/O is many orders of magnitude slower than most any other operation, the window in which this mutex is held is indeed quite long.
It would be better to clear this all up, by either (i) omitting the implementation details altogether, (ii) explicitly wrapping the code snippet in a synchronized block, and/or (iii) noting that while the operation is thread safe, it's not designed for concurrent access.
(By contrast, if this were implemented via an nio FileChannel then it would be both thread safe _and_ concurrent--something many a reader knows.)
Note from the Author or Editor: Thanks for your analysis - I agree entirely. For the third edition I've opted to omit the implementation details (i) and mention that a single instance of FSDataInputStream is not designed for concurrent access (iii), and it's better to create multiple instances.
|
Babak |
Apr 16, 2011 |
|
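Option (ii) from the page 55 erratum, wrapping the seek/read/restore pattern in a synchronized block, might look roughly like the sketch below. It uses a hypothetical in-memory class, SeekableBytes, as a stand-in; it is not Hadoop's FSDataInputStream and does not claim to match the actual Hadoop source.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical in-memory stand-in for a seekable stream (an assumption for
// illustration only; not a Hadoop class).
class SeekableBytes {
    private final byte[] data;
    private long pos = 0;

    SeekableBytes(byte[] data) {
        this.data = data;
    }

    synchronized long getPos() {
        return pos;
    }

    synchronized void seek(long newPos) {
        if (newPos < 0 || newPos > data.length) {
            throw new IllegalArgumentException("bad offset " + newPos);
        }
        pos = newPos;
    }

    // Sequential read from the current offset; advances the offset.
    synchronized int read(byte[] buf, int off, int len) {
        int remaining = data.length - (int) pos;
        if (remaining <= 0) {
            return -1;
        }
        int n = Math.min(len, remaining);
        System.arraycopy(data, (int) pos, buf, off, n);
        pos += n;
        return n;
    }

    // Positioned read: the whole seek/read/restore sequence runs under one
    // lock, so it is thread-safe, but callers are serialized, not concurrent.
    int read(long position, byte[] buf, int off, int len) {
        synchronized (this) {
            long oldPos = getPos();
            try {
                seek(position);
                return read(buf, off, len);
            } finally {
                seek(oldPos);
            }
        }
    }

    public static void main(String[] args) {
        SeekableBytes in =
                new SeekableBytes("hello world".getBytes(StandardCharsets.US_ASCII));
        byte[] buf = new byte[5];
        int n = in.read(6, buf, 0, 5);
        System.out.println(new String(buf, 0, n, StandardCharsets.US_ASCII)); // prints world
        System.out.println(in.getPos()); // prints 0 (offset restored)
    }
}
```

Because the lock is held for the whole read, threads sharing one instance take turns, which is consistent with the author's advice to create multiple instances when concurrent access is needed.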
| Printed |
Page 473
First paragraph of Resilience and Performance section |
Third sentence of first paragraph of "Resilience and Performance" section: "ZooKeeper replies on having low-latency connections..." should be "ZooKeeper relies on having low-latency connections..."
|
Keith McDonald |
Apr 03, 2011 |
Apr 21, 2011 |
| Printed |
Page 352
Second paragraph of FOREACH...GENERATE section |
The third sentence of the second paragraph says, "B's second field is the third field of A ($1) with one added to it." The "$1" should be changed to a "$2".
|
Keith McDonald |
Apr 02, 2011 |
Apr 21, 2011 |
| Safari Books Online |
182
Description of last item |
"...the reduce begins, to give the reduces as much..." should be "the reduce begins, to give the reducers as much...", i.e. "reducers" not "reduces".
|
Lars George |
Dec 16, 2010 |
Apr 21, 2011 |
| Safari Books Online |
181
3rd paragraph |
"Hadoop’s uses a buffer size..." should be without the 's.
|
Lars George |
Dec 16, 2010 |
Apr 21, 2011 |
| PDF |
Page 300
5th paragraph |
In section "Audit Logging", there are two appearances of "org.apache.hadoop.fs.FSNamesystem.audit". They should be "org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit". http://wiki.apache.org/hadoop/HowToConfigure has the same mistake. I have updated it.
Note from the Author or Editor: Change "org.apache.hadoop.fs.FSNamesystem.audit" to "org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit" (I could only see one occurrence on the page.)
|
Jingguo Yao |
Nov 28, 2010 |
Apr 21, 2011 |
| PDF |
Page 160
6th paragraph |
"if set in the tasktracker’s mapred-site.html file" should be "if set in the tasktracker’s mapred-site.xml file"
Note from the Author or Editor: This error is on page 136 in the print version second edition.
|
Dave Brondsema |
Oct 25, 2010 |
Apr 21, 2011 |
| Printed |
Page 420
1st paragraph |
"We ask the org.apache.hadoop.hbase.HBaseConfigurationn class..." there aren't supposed to be two n's in that class. (Via Doug Meil.)
|
 Tom White
|
Oct 18, 2010 |
Apr 21, 2011 |
| Safari Books Online |
368
JP |
> % ls /user/hive/warehouse/record/
should be
> % ls /user/hive/warehouse/records/
The trailing "s" is missing.
|
lan |
Sep 29, 2010 |
Apr 21, 2011 |
|