The errata list is a list of errors and their corrections that were found after the product was released.
The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.
Version | Location | Description | Submitted by | Date submitted |
PDF |
Page xxvi
end of first paragraph |
The line,
"For example, if DNA-Sequencing1 takes 60
hours with three servers, then by "scaling out" the solution might produce
the same DNA-Sequencing with 50 similar servers in less than 2 hours."
Should say
"For example, if DNA-Sequencing1 takes 60
hours with three servers, then by "scaling out" the solution might produce
the same DNA-Sequencing with 50 similar servers in less than 4 hours."
Reason: 60 hours on 3 servers is 180 server-hours. With ideal scaling, 50 servers can be expected to do the same amount of work in under 4 hours, or 100 servers in roughly 2 hours.
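The arithmetic behind the correction, assuming ideal linear scale-out with no coordination or data-transfer overhead:

```latex
\[
W = 60\,\text{h} \times 3\ \text{servers} = 180\ \text{server-hours},
\qquad
t_{50} \approx \frac{180}{50} = 3.6\,\text{h} < 4\,\text{h},
\qquad
t_{100} \approx \frac{180}{100} = 1.8\,\text{h} \approx 2\,\text{h}.
\]
```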
|
Manoj Agarwal |
Nov 23, 2014 |
Printed |
Page 3
2nd bullet point |
Page 3 (second bullet point) refers to Java Code Geeks for Secondary Sorting. I think this should be attributed to Tom White's "Hadoop: The Definitive Guide", where the technique was widely publicized. Even the Java Code Geeks article acknowledges this; see its Resources section.
|
Anonymous |
May 19, 2016 |
Printed |
Page 4
Example 1-1. DateTemperaturePair class |
Page 4: Example 1-1. The DateTemperaturePair class is defined as:
public class DateTemperaturePair implements Writable, WritableComparable<DateTemperaturePair> {
........................
}
There is no need to implement Writable separately, because WritableComparable already extends it. See the WritableComparable documentation at http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/WritableComparable.html for more details.
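To illustrate the point, the sketch below declares simplified stand-ins for Hadoop's Writable and WritableComparable interfaces (only so it compiles without Hadoop on the classpath) and a trimmed-down, hypothetical DateTemperaturePair with just yearMonth and temperature fields. Implementing WritableComparable alone is enough for the class to also be a Writable.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Simplified stand-ins for org.apache.hadoop.io.Writable and
// org.apache.hadoop.io.WritableComparable, declared here so the sketch is
// self-contained. In Hadoop, WritableComparable<T> extends Writable and Comparable<T>.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

interface WritableComparable<T> extends Writable, Comparable<T> {
}

// Hypothetical, trimmed-down pair: implementing only WritableComparable
// still makes the class a Writable.
class DateTemperaturePair implements WritableComparable<DateTemperaturePair> {
    private String yearMonth;
    private int temperature;

    DateTemperaturePair() { }  // no-arg constructor, needed for reflective instantiation

    DateTemperaturePair(String yearMonth, int temperature) {
        this.yearMonth = yearMonth;
        this.temperature = temperature;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(yearMonth);
        out.writeInt(temperature);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        yearMonth = in.readUTF();
        temperature = in.readInt();
    }

    // Natural key (yearMonth) first, then temperature ascending.
    @Override
    public int compareTo(DateTemperaturePair other) {
        int cmp = yearMonth.compareTo(other.yearMonth);
        return cmp != 0 ? cmp : Integer.compare(temperature, other.temperature);
    }
}
```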
|
Anonymous |
May 19, 2016 |
Printed |
Page 6
Code line immediately above Data Flow Using Plug-in Classes |
job.setGroupingComparatorClass(YearMonthGroupingComparator.class);
should instead read:
job.setGroupingComparatorClass(DateTemperatureGroupingComparator.class);
Note that there is no YearMonthGroupingComparator class. The code on GitHub shows this correctly:
https://github.com/mahmoudparsian/data-algorithms-book/blob/master/src/main/java/org/dataalgorithms/chap01/mapreduce/SecondarySortDriver.java
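In the usual secondary-sort design, the grouping comparator compares only the natural key (yearMonth), while the full composite key (yearMonth, temperature) drives the sort. The following plain-Java simulation shows the effect; it is not Hadoop code, and the Pair record and SecondarySortSimulation class are hypothetical. Each group handed to a "reducer" holds all temperatures for one month, already in ascending order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of secondary sort (hypothetical names, not the Hadoop API).
final class SecondarySortSimulation {
    record Pair(String yearMonth, int temperature) { }

    // Sort comparator: natural key first, then temperature ascending.
    static final Comparator<Pair> SORT =
        Comparator.comparing(Pair::yearMonth).thenComparingInt(Pair::temperature);

    // Grouping comparator: only the natural key matters.
    static final Comparator<Pair> GROUPING = Comparator.comparing(Pair::yearMonth);

    // Returns, per yearMonth, the temperature list a single reduce() call would see.
    static Map<String, List<Integer>> reduceInputs(List<Pair> records) {
        List<Pair> sorted = new ArrayList<>(records);
        sorted.sort(SORT);
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        Pair groupStart = null;
        for (Pair p : sorted) {
            // A new group starts whenever the grouping comparator sees a different key.
            if (groupStart == null || GROUPING.compare(groupStart, p) != 0) {
                groupStart = p;
                groups.put(p.yearMonth(), new ArrayList<>());
            }
            groups.get(groupStart.yearMonth()).add(p.temperature());
        }
        return groups;
    }
}
```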
|
Todd Farmer |
May 12, 2016 |
Printed |
Page 7
Figure 1-2. Secondary sorting data flow |
The output of partition() shows data for the YearMonth value 2000-11 appearing in both partitions. The DateTemperaturePartitioner class partitions by the YearMonth value, so all pairs with the same YearMonth value should be routed to the same partition.
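A minimal sketch of partitioning by the natural key alone, using a hypothetical helper YearMonthPartitioning.partitionFor rather than the book's DateTemperaturePartitioner class. Because temperature plays no part in the hash, every pair for 2000-11 lands in the same partition. Masking with Integer.MAX_VALUE keeps the result non-negative, the same trick Hadoop's HashPartitioner uses.

```java
// Hypothetical helper, not the book's DateTemperaturePartitioner.
final class YearMonthPartitioning {
    // temperature is deliberately ignored: only the natural key decides the partition.
    static int partitionFor(String yearMonth, int temperature, int numPartitions) {
        return (yearMonth.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```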
|
Todd Farmer |
May 12, 2016 |
PDF |
Page 47
Chapter 2, Top-10 List |
The described parallelisation approach has a fundamental flaw. Constructing a global top-N from a series of local top-N's might not result in the correct output when members of the global top-N are not present in some (or all) of the local top-N lists.
To illustrate, consider a very simple top-2 calculation over the following two local top-3 lists:
top-3 list 1:
A, 5
B, 4
C, 3
top-3 list 2:
D, 5
E, 4
C, 3
The global number-1 key is C, with a combined value of 6 (3 + 3), but if only the local top-2 lists were used, C would be left out entirely.
See also this discussion on stackoverflow: http://stackoverflow.com/questions/15613966/parallel-top-ten-algorithm-for-distributed-data
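The counterexample can be checked mechanically. This sketch (hypothetical class TopNCounterexample) compares the local-top-N shortcut with ranking after summing values per key across partitions, using the erratum's two partitions. It assumes, as in the example, that the same key can appear in more than one partition; ties are broken alphabetically only to make the output deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TopNCounterexample {
    // Keys with the n highest values; ties broken by key for determinism.
    static List<String> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byValue = Integer.compare(b.getValue(), a.getValue()); // descending value
            return byValue != 0 ? byValue : a.getKey().compareTo(b.getKey());
        });
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            keys.add(entries.get(i).getKey());
        }
        return keys;
    }

    // Sums values per key across partitions.
    static Map<String, Integer> merge(List<Map<String, Integer>> partitions) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> p : partitions) {
            p.forEach((k, v) -> total.merge(k, v, Integer::sum));
        }
        return total;
    }

    // Shortcut: keep each partition's local top-n, merge, then rank again.
    static List<String> topNFromLocalTopN(List<Map<String, Integer>> partitions, int n) {
        List<Map<String, Integer>> locals = new ArrayList<>();
        for (Map<String, Integer> p : partitions) {
            Map<String, Integer> local = new HashMap<>();
            for (String k : topN(p, n)) {
                local.put(k, p.get(k));
            }
            locals.add(local);
        }
        return topN(merge(locals), n);
    }

    // Aggregate every key first, then rank.
    static List<String> topNAfterAggregation(List<Map<String, Integer>> partitions, int n) {
        return topN(merge(partitions), n);
    }
}
```

With the erratum's data, the shortcut returns [A, D] while aggregating first returns [C, A]. The shortcut is only safe when each key occurs in exactly one partition; otherwise values have to be aggregated per key (for example, in a reduce step) before ranking.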
|
Robbert Zijp |
Aug 24, 2014 |
Printed |
Page 260
2nd paragraph |
the 1st bullet point
"Given that today is foggy, what is the probability that it will be rainy two days from now?"
The problem asks for S3 to be "Rainy", but the solution given in the text after that line computes it with S3 as "Foggy".
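For reference, the question asks for a two-step transition probability: with S1 = Foggy, P(S3 = Rainy | S1 = Foggy) is the sum over every possible S2 of P(S2 | Foggy) * P(Rainy | S2). The sketch below computes it; the transition matrix and the WeatherMarkov class are hypothetical (not the book's table), so only the structure of the calculation carries over.

```java
// Two-step transition probability for a 3-state weather Markov chain.
// The transition matrix is HYPOTHETICAL; substitute the book's table.
final class WeatherMarkov {
    static final int SUNNY = 0, RAINY = 1, FOGGY = 2;

    // P[i][j] = probability that tomorrow is j given today is i (rows sum to 1).
    static final double[][] P = {
        {0.80, 0.05, 0.15},   // from SUNNY
        {0.20, 0.60, 0.20},   // from RAINY
        {0.25, 0.25, 0.50},   // from FOGGY
    };

    // P(S3 = to | S1 = from) = sum over middle states k of P[from][k] * P[k][to]
    static double twoStep(int from, int to) {
        double sum = 0.0;
        for (int k = 0; k < P.length; k++) {
            sum += P[from][k] * P[k][to];
        }
        return sum;
    }
}
```

With this hypothetical matrix, twoStep(FOGGY, RAINY) is 0.2875 while twoStep(FOGGY, FOGGY) is 0.3375, so using the Foggy column in the second factor answers a different question.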
|
Sumit Pal |
Feb 28, 2016 |
PDF |
Page 687
3rd bullet item, starting with 'It does not allow false negative errors' |
There is an error in this sentence:
'This means that if x is /not/ in the set, then for sure it will indicate that x is not in the set.'
This should be:
'This means that if x is in the set, then for sure it will /not/ indicate that x is not in the set.'
The original sentence also contradicts the previous bullet about false positive errors, which are allowed: 'This means that for some x, which is not in the set, Bloom filter might indicate that x is in the set.'
Both the 2nd and the 3rd bullets describe the case where x is not in the set:
- according to the 2nd bullet, a Bloom filter might report that x is in the set,
- but according to the 3rd bullet, the Bloom filter will never report that x is in the set in that same case.
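A minimal Bloom filter sketch makes the two properties concrete. The class SimpleBloomFilter is hypothetical, and its double-hashing scheme and second hash function are illustrative assumptions rather than a production choice. Because add() and mightContain() compute the same bit positions, an inserted item can never be reported absent (no false negatives), while an item that was never inserted may still find all its bits set by other items (a false positive).

```java
import java.util.BitSet;

// Minimal Bloom filter sketch (hypothetical; hashing scheme is illustrative only).
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int m;   // number of bits
    private final int k;   // number of hash functions

    SimpleBloomFilter(int m, int k) {
        this.bits = new BitSet(m);
        this.m = m;
        this.k = k;
    }

    // i-th bit index via double hashing: h1 + i * h2, reduced into [0, m).
    private int index(String item, int i) {
        int h1 = item.hashCode();
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 16) | 1; // second hash, forced odd
        return ((h1 + i * h2) & Integer.MAX_VALUE) % m;
    }

    void add(String item) {
        for (int i = 0; i < k; i++) {
            bits.set(index(item, i));
        }
    }

    // false: item is definitely NOT in the set (no false negatives)
    // true:  item MAY be in the set (false positives are possible)
    boolean mightContain(String item) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(index(item, i))) {
                return false;
            }
        }
        return true;
    }
}
```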
|
Robbert Zijp |
Aug 24, 2014 |