Errata
The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction is displayed in the column titled "Date Corrected".
The following errata were submitted by our customers and approved as valid errors by the author or editor.
| Version | Location | Description | Submitted By | Date submitted | Date corrected |
|---|---|---|---|---|---|
| | Example 3-21 (http://techbus.safaribooksonline.com/9781449359034/subsec_passing_functions_html#example3-21) | In Example 3-21, the return types for the getMatches* methods are incorrect. getMatchesFunctionReference() should return RDD[Boolean], and getMatchesFieldReference() and getMatchesNoReference() should either return RDD[Array[String]] or change the implementation to use flatMap instead of map. | Anonymous | Feb 19, 2015 | Mar 27, 2015 |
| | Example 6-16, code | The gdist function from the R package Imap takes longitude, latitude, longitude, and latitude as its first four arguments. The calling code writes the values to stdout in the wrong order (latitude, longitude, ...), and the order is not corrected in the R code that passes them to gdist. | Waclaw Kusnierczyk | Jul 09, 2015 | |
| | Page vii, 2nd paragraph | Duplicated wording. | Ricardo Almeida | Oct 08, 2014 | Jan 26, 2015 |
| | Pages 9-10; p. 9 "Downloading Spark", 1st paragraph; p. 10, 1st paragraph after the notes | Page 9 has the following text: | Kevin D'Elia | Oct 20, 2014 | Jan 26, 2015 |
| | Page 20, fifth line from the bottom | "Example 2-13. Maven build file" is invalid because there is an extra `</plugin>` closing tag. | Michah Lerner | Feb 02, 2015 | Mar 27, 2015 |
| | Page 20, Example 2-13 (Maven build example) | Is there any reason why the Akka repo is needed to build the mini project? It seems like all dependencies of spark-core_2.10:1.1.0 are already available in Maven Central. | Uladzimir Makaranka | Sep 21, 2014 | Jan 26, 2015 |
| | Page 20, Example 2-13 | `<artifactId>learning-spark-mini-example/artifactId>` is missing the `<` of its closing tag (it should be `</artifactId>`). | Kevin D'Elia | Oct 19, 2014 | Jan 26, 2015 |
| | Page 21, Example 2-15 | The Maven command-line executable is called `mvn`. Please replace "maven clean && maven compile && maven package" with "mvn clean && mvn compile && mvn package". | Uladzimir Makaranka | Sep 21, 2014 | Jan 26, 2015 |
| PDF, ePub | Page 21, Example 2-14 and Example 2-15 | In order to match the code on GitHub: | Murali Raju | Dec 13, 2014 | Jan 26, 2015 |
| | Page 25, Example 3-4 | Small typo in Example 3-4. | Tatsuo Kawasaki | May 01, 2015 | |
| | Page 29, Example 3-15 | Python example: Example 3-15. | Tatsuo Kawasaki | May 01, 2015 | |
| | Page 29, Example 3-17 | The line does not end with a semicolon. | Tatsuo Kawasaki | May 01, 2015 | |
| Printed | Page 29, code example 3-15 | Greetings, I'm going through "Learning Spark" (3rd release, 1st edition). | Doug Meil | Jul 28, 2015 | |
| | Page 32, 1 | I was asked earlier by the author to return the book because I reported an issue. | Gourav Sengupta | Apr 15, 2015 | May 08, 2015 |
| | Page 33, Example 3-23 (Java function passing with named class) | There are extra () in the class, i.e.: | Guillermo Schiava | Aug 12, 2015 | |
| | Page 33, Figure 3-3 | Reads: `RDD2.subtract(RDD2)` | Tatsuo Kawasaki | Aug 18, 2014 | Jan 26, 2015 |
| Printed | Page 36, Figure 3-4 | In Figure 3-4, the RDD2 list should be: coffee, monkey, kitty. | Anonymous | Mar 02, 2015 | Mar 27, 2015 |
| | Page 37, Figure 3-2 (Map and filter on an RDD) | FilteredRDD | Tang Yong | Aug 18, 2014 | Jan 26, 2015 |
| | Page 37, Example 3-24 (Scala squaring the values in an RDD) | `println(result.collect())` | Tang Yong | Aug 18, 2014 | Jan 26, 2015 |
| | Page 40, Example 3-35 (Python) | Python aggregate example. | Anonymous | Jan 25, 2015 | Mar 27, 2015 |
| Printed | Page 45, Example 3-40 | In Example 3-40, `result.persist(StorageLevel.DISK_ONLY)` will not work, because StorageLevel is not imported in the example. | Tom Hubregtsen | Apr 19, 2015 | May 08, 2015 |
| | Page 50, Table 4-2 | The "Purpose" descriptions for right outer join and left outer join are reversed: in a right outer join, the key must be present in the "other" RDD, not in "this" RDD, and the left outer join description makes the mirror-image mistake. | Wayne M Adams | Feb 24, 2015 | Mar 27, 2015 |
| Printed | Page 53, Example 4-11, second line | Shouldn't `rdd.flatMap(...)` be `input.flatMap(...)`? | Jim Williams | Apr 06, 2015 | May 08, 2015 |
| | Page 54 | Example 4-12 does not print out its results as the others do. Also, Example 4-13 should arguably use a foreach to print, since it relies on side effects. | Justin Pihony | Jan 25, 2015 | Mar 27, 2015 |
| | Page 57, Example 4-16 | Apparent cut-and-paste mistake: the "Custom parallelism" example is the same as the default one, in that no parallelism Int was specified in the example call. | Wayne M Adams | Feb 24, 2015 | Mar 27, 2015 |
| ePub | Page 58, Example 4-12 | Example 4-12 (Python) is not equivalent to the others: the sum of the numbers must be divided by the count to yield the average. Having the Python example implement the same behavior as the Scala and Java examples will aid the reader. My version of the example is: | Andres Moreno | Dec 02, 2014 | Jan 26, 2015 |
| | Page 60, Table 4-3 | collectAsMap() doesn't return a multimap, so the result should be: | Tatsuo Kawasaki | May 08, 2015 | |
| | Page 64, 4th paragraph | In the section "Determining an RDD's Partitioner", the second line says "or partitioner() method in Java". | Anonymous | May 09, 2015 | |
| Printed | Page 65, Example 4-24 | In Example 4-24, `val partitioned = pairs.partitionBy(new spark.HashPartitioner(2))` will not work, as HashPartitioner is not imported in the example. Either an import or a change to `new org.apache.spark.HashPartitioner(2)` would work. | Tom Hubregtsen | Mar 10, 2015 | Mar 27, 2015 |
| | Page 66, JSON | It is mentioned that liftweb-json is used for JSON parsing; however, Play JSON is used for parsing and liftweb-json only for JSON output. This is a bit confusing. | Anonymous | Aug 05, 2014 | Jan 26, 2015 |
| | Page 67 | The code under "// Run 10 iterations" shadows the links variable. This might be confusing for new developers. | Justin Pihony | Apr 27, 2015 | May 08, 2015 |
| | Page 70 | "feildnames" (misspelling of "fieldnames"). | Anonymous | Aug 17, 2014 | Jan 26, 2015 |
| | Page 70, first paragraph | "In Python if an value isn't present None is used and if the value is present the regular value" | Mark Needham | Nov 30, 2014 | Jan 26, 2015 |
| | Page 72 | "The input formats that Spark wraps all | Justin Pihony | Apr 27, 2015 | May 08, 2015 |
| | Page 73, Example 5-4 | (p. 91 of the PDF; p. 73 of the book). This is a total nitpick, but the file URL is | Wayne M Adams | Feb 26, 2015 | Mar 27, 2015 |
| | Page 73 | "Sometimes it's important to know which file which piece of input came from" | Justin Pihony | Apr 27, 2015 | May 08, 2015 |
| Printed, PDF | Page 76, Example 5-10 | Reads: | Myles Baker | May 11, 2015 | |
| | Page 78 | `import Java.io.StringReader` should use a lowercase j. | Justin Pihony | Apr 27, 2015 | May 08, 2015 |
| | Page 79 | "If there are only a few input files, and you need to use the wholeFile() method," | Justin Pihony | Apr 27, 2015 | May 08, 2015 |
| Printed | Page 82, Example 5-20 | Example 5-20 (Loading a SequenceFile in Python) should drop the "val" in "val data = ..."; it works otherwise. | jonathan greenleaf | Apr 09, 2015 | May 08, 2015 |
| | Page 84 | "A similar function, hadoopFile(), exists for working with Hadoop input formats implemented with the older API." | Justin Pihony | Apr 27, 2015 | May 08, 2015 |
| | Page 85, Example 5-13/5-14 | Minor issue; there should be a | Timothy Elser | Oct 07, 2014 | Jan 26, 2015 |
| ePub | Page 87 | "We have looked at the fold, combine, and reduce actions on basic RDDs." There is no RDD.combine(); did you mean aggregate()? | Thomas Oldervoll | Jan 25, 2015 | Mar 27, 2015 |
| | Page 90 | "you can specify SPARK_HADOOP_VERSION= as a environment variable" should be "as an environment variable". | Justin Pihony | Apr 27, 2015 | May 08, 2015 |
| | Page 91, Example 5-31 | (p. 109 of the PDF; p. 91 of the book). Minor: with the import of the HiveContext class, there's no need to fully qualify the class name when invoking the HiveContext constructor. | Wayne M Adams | Feb 26, 2015 | Mar 27, 2015 |
| | Page 95 | Why do the SparkContext and JavaSparkContext in Examples 5-40 and 5-41 use different arguments? If there is no reason, they should be synchronized. | Justin Pihony | Apr 27, 2015 | May 08, 2015 |
| | Page 101 | Example 6-3 creates the SparkContext, while the other examples do not. | Justin Pihony | Apr 27, 2015 | May 08, 2015 |
| | Page 102, third paragraph | No comma is needed before the word "or" in: | Anonymous | Feb 04, 2015 | Mar 27, 2015 |
| | Page 103 | Example 6-5 outputs "Too many errors: # in #". | Justin Pihony | Apr 27, 2015 | May 08, 2015 |
| | Page 111 | The string interpolation in Example 6-17 needs to be wrapped in braces, as it uses a property of the y object. | Justin Pihony | Apr 27, 2015 | May 08, 2015 |
| ePub | Page 112, 3rd paragraph | Text reads: "Spark has many levels of persistence to chose from based on what our goals are." | Bruce Sanderson | Nov 15, 2014 | Jan 26, 2015 |
| | Page 114 | `std.stdev` in Example 6-19 should be `stats.stdev`. | Justin Pihony | Apr 27, 2015 | May 08, 2015 |
| Printed, PDF | Page 123, Example 7-4 | Incorrect variable name: | Myles Baker | May 13, 2015 | |
| ePub | Page 126, 1st paragraph | The text reads "...to how we used fold and map compute the entire RDD average" | Bruce Sanderson | Nov 18, 2014 | Jan 26, 2015 |
| | Page 127 | The comment in Example 7-7, "A special option to exclude Scala itself form our assembly JAR, since Spark", should read "...exclude Scala itself from our assembly JAR...". | Justin Pihony | Apr 27, 2015 | May 08, 2015 |
| PDF, ePub | Page 130, 5th step | It says "(...) run bin/stop-all.sh (...)" | Alejandro Ramon Lopez del Huerto | Mar 10, 2015 | Mar 27, 2015 |
| | Page 134 | "to elect a master when running in multimaster node" should be "to elect a master when running in multimaster mode". | Justin Pihony | Apr 27, 2015 | May 08, 2015 |
| | Page 141, Example 8-1 | Example 8-1 (Creating an application using a SparkConf in Python). | Tatsuo Kawasaki | Jun 16, 2015 | |
| | Page 142, Example 8-3 | Reads: | Tatsuo Kawasaki | Jun 16, 2015 | |
| | Page 145 | The spark.[X].port explanation is missing the ui value (spark.ui.port). | Justin Pihony | Apr 27, 2015 | May 08, 2015 |
| | Page 146, Example 8-7 | The code example: | Wayne M Adams | Mar 10, 2015 | Mar 27, 2015 |
| | Page 147 | "To trigger computation, let's call an action on the counts RDD and collect() it to the driver, as shown in Example 8-9" might read better as "To trigger computation, let's call an action on the counts RDD by collect()ing it to the driver, as shown in Example 8-9." | Justin Pihony | Apr 27, 2015 | May 08, 2015 |
| | Page 156 | Example 8-11 coalesces to 5 partitions, but the number of partitions is listed as 4. | Justin Pihony | Apr 27, 2015 | May 08, 2015 |
| Printed | Page 157, 6th line of text | Extra "this". | Jim Williams | Apr 07, 2015 | May 08, 2015 |
| | Page 157, 1st paragraph | Reads: | Tatsuo Kawasaki | Jul 20, 2015 | |
| | Page 162, 2nd paragraph of the section "Linking with Spark SQL" | Page 162 of the text (PDF page 180) in the April 1 edition contains a fragment with a duplicated reference to the Hive query language: | Wayne M Adams | Apr 02, 2015 | May 08, 2015 |
| | Page 162 | The Hive query language is parenthesized twice, once as (HiveQL) and once as (HQL). This should probably be made consistent. | Justin Pihony | Apr 27, 2015 | |
| | Page 163, table | Table 9-1 lists the Scala and Java types/imports for Timestamp. | Anirudh Koul | Feb 03, 2015 | Mar 27, 2015 |
| | Page 165 | Example 9-8 does not match 9-6 and 9-7, in that it does not show the creation of the SparkContext. | Justin Pihony | Apr 27, 2015 | May 08, 2015 |
| | Page 179 | Example 9-39 has a collect and println, whereas 9-36 and 9-37 do not. | Justin Pihony | Apr 27, 2015 | May 08, 2015 |
| | Page 181, Table 9-2 | The default 'spark.sql.parquet.compression.codec' property is gzip. | Tatsuo Kawasaki | Jun 27, 2015 | |
| | Page 185 | In Example 10-5, @Override is missing on the call method. | Justin Pihony | Apr 27, 2015 | May 08, 2015 |
| | Page 195 | In Figure 10-7, an arrow is missing on the inverse graph from {4,2} to 20. | Justin Pihony | Apr 27, 2015 | May 08, 2015 |
| | Page 196, third line from top | The example given is: | Anonymous | Jun 02, 2015 | |
| | Page 198, Example 10-22 | In the last two lines, "Durations" is misprinted as "Dirations". | Jongyoung Park | May 31, 2015 | |
| | Page 199 | "...it needs to have a consistent date format..." | Justin Pihony | Apr 28, 2015 | May 08, 2015 |
| | Page 201 | Example 10-32 uses a helper with a map for the final action, whereas 10-33 simply calls print. | Justin Pihony | Apr 27, 2015 | May 08, 2015 |
| Mobi | Page 10409, Example 10-34 | There are a couple of naming errors in the Scala version of the example for the newer (as of Spark 1.3) Spark Streaming and Apache Kafka createDirectStream() method (Example 10-34: Apache Kafka directly reading Panda's topic in Scala). | Ivanov Vladimir | Apr 22, 2015 | May 08, 2015 |
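The Table 4-2 erratum above (page 50) reports that the left and right outer join "Purpose" descriptions are swapped. A minimal plain-Python sketch of the corrected semantics (this is not PySpark and not the book's code; the function and variable names are illustrative only):

```python
# Sketch of pair-RDD outer-join semantics over dicts of key -> value.
# In this.leftOuterJoin(other), every key of `this` appears in the result;
# in this.rightOuterJoin(other), every key of `other` does.

def left_outer_join(this, other):
    # keys come from `this`; missing matches in `other` become None
    return {k: (v, other.get(k)) for k, v in this.items()}

def right_outer_join(this, other):
    # keys come from `other`; missing matches in `this` become None
    return {k: (this.get(k), v) for k, v in other.items()}

this_rdd = {"a": 1, "b": 2}
other_rdd = {"b": 20, "c": 30}
print(left_outer_join(this_rdd, other_rdd))   # {'a': (1, None), 'b': (2, 20)}
print(right_outer_join(this_rdd, other_rdd))  # {'b': (2, 20), 'c': (None, 30)}
```

The point of the correction: the side named in the join keeps all of its keys, and only the other side's values may be None.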
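The Example 4-12 erratum above (page 58, ePub) notes that the Python version must divide the summed values by the count to yield the average. A plain-Python sketch of that (sum, count) pair pattern, without PySpark (the names here are illustrative, not the book's code):

```python
# Reduce (value, 1) pairs element-wise, then divide sum by count.
from functools import reduce

nums = [1, 2, 3, 4]
sum_count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]),
                   ((x, 1) for x in nums))
average = sum_count[0] / sum_count[1]
print(average)  # 2.5
```

The same shape carries over to a distributed reduce: the (sum, count) pair is associative to combine, and the final division happens once at the end.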
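The Example 4-24 erratum above (page 65) is about referencing `org.apache.spark.HashPartitioner` correctly. As a rough plain-Python illustration of what a hash partitioner does, assuming nothing beyond hash(key) mod numPartitions (this is not Spark's implementation):

```python
# Illustrative only: assign each key-value pair to partition
# hash(key) % num_partitions, so equal keys always co-locate.
def partition_for(key, num_partitions=2):
    return hash(key) % num_partitions

pairs = [("a", 1), ("b", 2), ("c", 3), ("a", 4)]
partitions = [[] for _ in range(2)]
for k, v in pairs:
    partitions[partition_for(k)].append((k, v))

# Both ("a", 1) and ("a", 4) land in the same partition.
```

Co-locating equal keys is what lets operations like reduceByKey and join avoid shuffling already-partitioned data.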