Errata for Data Analytics with Hadoop

The errata list records errors, and their corrections, that were found after the product was released. If an error was corrected in a later version or reprint, the date of the correction is displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.

Version Location Description Submitted By Date submitted Date corrected
Printed
Page xii
end

There is no source code, only a README file, at https://github.com/oreillymedia/Data_Analytics_with_Hadoop .

Note from the Author or Editor:
The official repository for the book is at https://github.com/bbengfort/hadoop-fundamentals -- @oreillymedia should either link to this repository or fork it.

Jan Janka  Feb 04, 2018 
Printed
Page 33
2nd Paragraph

The text reads ' ' embarrassingly parallel" instead of "embarrassingly parallel". The quotation marks are not formatted properly.

Andres Mack  Jun 29, 2016 
Printed
Page 33
4th paragraph, code snippet area

The Python pseudocode reads:

def map(dockey, line):
    for word in value.split():
        emit(word, 1)

The variable "value", on which split() is called, does not exist.

The correct variable is "line", which is passed in as an argument:

def map(dockey, line):
    for word in line.split():
        emit(word, 1)

Andres Mack  Jul 01, 2016 
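
For readers who want to run the corrected mapper from page 33, the following is a minimal sketch rather than the book's code. It assumes that emit() can be modeled with a Python generator, and the names map_words, reduce_counts, and word_count are hypothetical helpers introduced here only for illustration. The grouping step stands in for the shuffle that a real MapReduce framework performs.

```python
from collections import defaultdict


def map_words(dockey, line):
    # Corrected mapper: split the "line" argument, not an undefined "value".
    for word in line.split():
        yield word, 1


def reduce_counts(word, counts):
    # Sum the ones emitted for a single word.
    return word, sum(counts)


def word_count(lines):
    # Stand-in for the shuffle phase: group mapper output by key.
    grouped = defaultdict(list)
    for dockey, line in enumerate(lines):
        for word, one in map_words(dockey, line):
            grouped[word].append(one)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())
```

Note that str.split() separates tokens on whitespace only, so a key such as "." appears only if the punctuation is already a separate token in the input.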
Printed
Page 34
Bottom paragraph, code snippet

The period used as a key is missing its opening quotation mark:

(.", [1, 1])

Should be

(".", [1, 1])

Andres Mack  Jul 01, 2016 
Printed
Page 34
Figure 2-7

The two blocks on the left side of the figure are identical. The text within the lower block should look something like this instead:

Block 2
The cat in the
hat ran fast.

Norbert Rump  May 02, 2017 
Printed
Page 35
code snippet at the top of the page

In the output by all WordCount reducers snippet, the first line is missing a quotation mark to start the string in the tuple.

(.", 2) should be changed to (".", 2)

Benjamin Bengfort  Apr 14, 2017 
Printed
Page 36
Code block

The input has five elements, but the output shows only four mappers. There should also be a mapper for Chris.

The results at the bottom of the page are correct. For example, the output (Betty, Chris) -> (A B D E) appears in the second block but it is not an output from a mapper.

Note from the Author or Editor:
Add to the text block:

# Mapper 5 output
(Allen, Chris) → (Allen, Chris, David, Ellen)
(Betty, Chris) → (Allen, Chris, David, Ellen)
(Chris, David) → (Allen, Chris, David, Ellen)
(Chris, Ellen) → (Allen, Chris, David, Ellen)

Anonymous  Feb 12, 2018 
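
How a mapper produces the pair keys discussed in the page 36 entry can be sketched as follows. This is an illustration under stated assumptions, not the book's implementation: the function name map_friends is hypothetical, and Chris's friend list is assumed to be Allen, Betty, David, and Ellen, following the (A B D E) value quoted by the submitter.

```python
def map_friends(person, friends):
    # Emit one (pair, friend list) record per friend. The pair is sorted so
    # that both people in a friendship produce the same key.
    for friend in friends:
        yield tuple(sorted((person, friend))), tuple(friends)


# Assumed input for the missing fifth mapper (see the lead-in above).
for key, value in map_friends("Chris", ["Allen", "Betty", "David", "Ellen"]):
    print(key, "->", value)
```

Sorting the pair is the design choice that lets a reducer receive both friend lists for a given pair under a single key and intersect them.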
Printed
Page 40
5th paragraph

The final two sentences of the Conclusion are redundant --

"To that end, in the next chapter we will look at writing simple distributed jobs with MapReduce in Python by using a framework called Hadoop Streaming. However, in the next chapter, we will take a specific look at how to write MapReduce jobs in Python using Hadoop Streaming."

Tim Hutchinson  Jul 12, 2017 
Printed
Page 68
2nd sentence of inset paragraph

The second sentence reads "... and acylic because ...", when it should read "... and acyclic because ..."

Note from the Author or Editor:
Change "acylic" to "acyclic".

Tim Hutchinson  Oct 04, 2017 
PDF
Page 92
Figure 5-1

The key Y in the second table of the first line should be replaced with X

王纯超  Mar 13, 2017 
PDF
Page 112
the 3rd paragraph

"The term frequency, tf i,k " should be "The term frequency, tf i,j"

王纯超  Mar 13, 2017 
PDF
Page 122
1st paragraph

The last sentence:
"If you’re willing to have some fuzziness, most bloom filters can be constructed with a threshold for the probability of a false negative, by increasing or decreasing the size of the bloom filter."

Should "false negative" be "false positive" since there are no false negatives in a bloom filter?

Note from the Author or Editor:
Change last sentence of p122 1st para from "If you're willing to have some fuzziness, most bloom filters can be constructed with a threshold for the probability of a false negative, by increasing or decreasing the size of the bloom filter."

To: "This tradeoff allows you to adjust the fuzziness which you are willing to accept, most bloom filters can be constructed with a threshold for the probability of a false positive, which increases or decreases the size of the bloom filter."

王纯超  Nov 13, 2017
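
The relationship between the false positive threshold and the filter size in the page 122 entry can be illustrated with a small sketch. It is not taken from the book: bloom_size and BloomFilter are hypothetical names, the sizing uses the standard textbook formulas m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, and salting SHA-256 with an index is just one assumed way to derive k hash positions.

```python
import hashlib
import math


def bloom_size(n_items, fp_rate):
    # Standard sizing: bits m and hash count k for n items at false positive rate p.
    m = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
    k = max(1, round(m / n_items * math.log(2)))
    return m, k


class BloomFilter:
    def __init__(self, n_items, fp_rate):
        self.m, self.k = bloom_size(n_items, fp_rate)
        self.bits = bytearray(self.m)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive k positions by salting a SHA-256 hash with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))
```

Because add() only ever sets bits, an item that was added always tests as present, which is why the threshold applies to false positives and not false negatives. Lowering fp_rate increases the number of bits that bloom_size returns.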