Errata
The errata list records errors found after the product was released, along with their corrections. If an error was corrected in a later version or reprint, the date of the correction is shown in the "Date corrected" column.
The following errata were submitted by our customers and approved as valid errors by the author or editor.
Version | Location | Description | Submitted By | Date submitted | Date corrected |
---|---|---|---|---|---|
| Other Digital Version | 22%, Figure 5-1 | In this figure, a set of partitions is displayed across three different Spark transformations. | Pablo Rodriguez Bertorello | Jul 02, 2017 | Oct 20, 2017 |
| | Page 39, Table 3-3 | In Table 3-3, 'gt' in the last row should be 'geq'. | Jongyoung Park | Jul 28, 2017 | Oct 20, 2017 |
| | Page 116, 1st paragraph | In the sentence "checkpointing or off_heap persistence or checkpointing", one of the two occurrences of 'checkpointing' should be removed. | Jongyoung Park | Aug 19, 2017 | Oct 20, 2017 |
| | Page 121, 2nd line in 'LRU caching' | 'Intead' should be 'Instead'. | Jongyoung Park | Aug 20, 2017 | Oct 20, 2017 |
| | Page 130, 2nd paragraph from bottom | 'of of' should be 'of'. | Jongyoung Park | Aug 26, 2017 | Oct 20, 2017 |
| | Page 131, TIP | IMO, "an ordering an an object" should be "an ordering of an object". | Jongyoung Park | Aug 26, 2017 | Oct 20, 2017 |
| | Page 161, last paragraph | "(value, column index pairs)" should be "(value, column index) pairs". | Jongyoung Park | Sep 07, 2017 | Oct 20, 2017 |
| | Page 187, "Installing PySpark" section | 1. In the second paragraph, the last closing parenthesis appears to be superfluous. | Jongyoung Park | Sep 18, 2017 | Oct 20, 2017 |
| Other Digital Version | 2091, Example 4-4 | The author says "you can prevent the shuffle [...] and persisting the RDD before the join." However, in Example 4-4, the RDD is not persisted before the join. In addition, the author does not explain the difference between persisting and not persisting: does persisting really affect the performance of the join? | Yong-Siang Shih | Jul 15, 2017 | Oct 20, 2017 |
| Other Digital Version | 2141, Example 4-5 | Although a broadcast variable of smallRDDLocal is created, the original smallRDDLocal is used. This seems to be a mistake, as the official documentation points out: | Yong-Siang Shih | Jul 15, 2017 | Oct 20, 2017 |
| Other Digital Version | 2912, TIP of Example 5-14 | The tip says: "calling distinct will cause a shuffle if the partitioner is not known." However, since the distinct function is implemented by | Yong-Siang Shih | Jul 15, 2017 | Oct 20, 2017 |
| Other Digital Version | 3209, Example 5-23 | The author claims that by persisting rddA, the "sort stage" will occur only once. This is incorrect: the "sorted" RDD should be persisted instead. Also, it should be persisted before the count action rather than after it. | Yong-Siang Shih | Jul 15, 2017 | Oct 20, 2017 |