Errata
The errata list records errors found after the product was released, together with their corrections. If an error was corrected in a later version or reprint, the date of the correction is shown in the "Date corrected" column.
The following errata were submitted by our customers and approved as valid errors by the author or editor.
Color key: Serious technical mistake, Minor technical mistake, Language or formatting error, Typo, Question, Note, Update
| Version | Location | Description | Submitted By | Date submitted | Date corrected |
|---|---|---|---|---|---|
| PDF | chapter 1, 8th paragraph | Chapter 1: Paragraph 8: This is from ebook Note from the Author or Editor: | Saad Khawaja | Oct 17, 2017 | Feb 08, 2018 |
| | NA subsection Columns, 2nd Paragraph | There seems to be a typo in Chapter 3, subsection Columns, Paragraph 2: "this column may or may not exist in our of our DataFrames." probably should be Note from the Author or Editor: | Emmanuel Asimadi | Nov 18, 2017 | Feb 08, 2018 |
| | NA Chapter 3, Subsection "Creating Row" | The return type for below should be Int instead of string. Note from the Author or Editor: | Emmanuel Asimadi | Nov 18, 2017 | Feb 08, 2018 |
| | NA Chapter 3, Subsection "Creating Dataframes" | Probably should be "encounter" instead of "encourage". Note from the Author or Editor: | Emmanuel Asimadi | Nov 18, 2017 | Feb 08, 2018 |
| | na Chapter 5, Section: Aggregating to complex types | repeated. Note from the Author or Editor: | Emmanuel Asimadi | Nov 22, 2017 | Feb 08, 2018 |
| PDF | Page cover | The cover of the 1st edition still says it's an "Early Release". | Harald Gegenfurtner | Dec 31, 2017 | Feb 08, 2018 |
| | I "Who This Book is For" section | There is a typo of "efficienly". The correct word is "efficiently". Note from the Author or Editor: | Keiji Yoshida | Jan 12, 2018 | Feb 08, 2018 |
| | 1 Chapter 1, under the "Spark Applications" header, just before Figure 1-1 | Hi, Note from the Author or Editor: | Simon Bensoussan | Mar 27, 2017 | Feb 08, 2018 |
| | 1 First chapter (Safari Books Online), in the "A Basic Transformation Data Flow" section, under Figure-9. | Hi, Note from the Author or Editor: | Simon Bensoussan | Mar 27, 2017 | Feb 08, 2018 |
| | 1 Chapter 16 | Chapter 15 and Chapter 16 have the same content on the Safari Books Online early release https://www.safaribooksonline.com/library/view/spark-the-definitive/9781491912201/ Note from the Author or Editor: | Anonymous | Jul 16, 2017 | Feb 08, 2018 |
| PDF | Page 10, 4th paragraph | PDF has "ight" instead of "might" in the paragraph describing Lazy Evaluation. Note from the Author or Editor: | Pradeep Nalabalapu | Jun 07, 2017 | Feb 08, 2018 |
| PDF | Page 19, Chapter 1, paragraph 3 | Last word of the paragraph contains a typo: "langauge". It should be "language": Note from the Author or Editor: | Anonymous | Jan 18, 2018 | Feb 08, 2018 |
| Printed | Page 20, 3rd paragraph (Lazy Evaluation section) | At the start of the paragraph, in "Lazy evaulation", the word "evaluation" is misspelled. Note from the Author or Editor: | Sertan Şentürk | Apr 13, 2018 | |
| PDF | Page 30, Last Paragraph (scala version of code) | On Page-30, below is the original scala version of code - Note from the Author or Editor: | Manish Bahrani | Jul 05, 2017 | Feb 08, 2018 |
| Printed | Page 35, Scala code at the bottom | The code is missing a "sort descending". It is implied this was present at some point, both from the import and from the results on the next page (which you only get if you apply a sort), but it is no longer in either the Scala or the Python code. Note from the Author or Editor: | Tom Geudens | Apr 10, 2018 | |
| Printed | Page 37, middle code block | The code blocks for both Scala and Python define a purchaseByCustomerPerHour, which is very specific, but the window function used states window(col("InvoiceDate"), "1 day"). Now I'm not a specialist on the Spark function-set yet, but based on what I read there I would say it should be PerDay and not PerHour? Note from the Author or Editor: | Tom Geudens | Apr 10, 2018 | |
| Printed | Page 44, Last two lines | 'The only difference will by syntax' should read 'The only difference will be syntax' Note from the Author or Editor: | Elias Strehle | Mar 28, 2018 | |
| PDF | Page 61, Last Section | Hi, Note from the Author or Editor: | Manish Bahrani | Jul 05, 2017 | Feb 08, 2018 |
| Printed | Page 74, Changing a Column's Type (cast) | The count column is actually already of the LongType (which you show on page 60). So it may make more sense to cast("integer"). Note from the Author or Editor: | Tom Geudens | Apr 15, 2018 | |
| Printed | Page 90, second code block | The describe method will actually compute statistics on almost any column, not just numeric ones. The df.describe.show() also shows results for Country and Description (string), but not for the InvoiceDate (timestamp). This is also reflected if you select these columns: Note from the Author or Editor: | Tom Geudens | Apr 17, 2018 | |
| Printed | Page 97, 7th line in code block | The 7th line in the '# in python' code block at the top of the page contains an undefined variable 'c'. This should be 'color_string' instead: Note from the Author or Editor: | Elias Strehle | Mar 28, 2018 | |
| Printed | Page 98, 1st sentence after code block | 'Although Spark will do read dates or times on a best-effort basis' should read 'Spark will read dates or times on a best-effort basis' Note from the Author or Editor: | Elias Strehle | Mar 28, 2018 | |
| Printed | Page 102, Last paragraph, 5th sentence | 'When we declare [...] not having a null time [...]' should read 'When we declare [...] not having a null type [...]' Note from the Author or Editor: | Elias Strehle | Mar 28, 2018 | |
| Printed | Page 122, sumDistinct code block | The SQL statement for sumDistinct is not correct as the DISTINCT keyword is missing, it should be Note from the Author or Editor: | Tom Geudens | Apr 24, 2018 | |
| | 129 SQL query of sumDistinct | The sumDistinct example in SQL format requires correction. Note from the Author or Editor: | Amit Kumar | Nov 15, 2018 | |
| Printed | Page 131, first line of python code | The piece of code should clear nulls, but the .na has not been included. Note from the Author or Editor: | Jonathan Wharton | Jan 12, 2019 | |
| Printed | Page 155, Last paragraph, 3rd sentence | "format is optional because by default, Spark will use the arquet format." should read "format is optional because by default, Spark will use the parquet format.". Note from the Author or Editor: | Anonymous | Jan 19, 2019 | |
| Printed | Page 194, last set of SQL code on page | SELECT * FROM flights Note from the Author or Editor: | Jonathan Wharton | Jan 15, 2019 | |
| Printed | Page 212, Last sentence | 'You get the both of best worlds.' I think the incorrect is order. Note from the Author or Editor: | Elias Strehle | Mar 28, 2018 | |
| Printed | Page 229, First paragraph in section 'Understanding Aggregation Implementations' | 'We'll do these in the context of a key, but the same basic principles apply to the groupBy and reduce methods' should read 'We'll do these in the context of a key, but the same basic principles apply to the groupByValue and reduceValue methods' Note from the Author or Editor: | Elias Strehle | Mar 28, 2018 | |
| Printed | Page 245, 1st paragraph of subsection 'Custom Accumulators' | 'In this example, you we will add [...]' should contain either 'you' or 'we', not both Note from the Author or Editor: | Elias Strehle | Mar 28, 2018 | |
| Printed | Page 256, 2nd paragraph, last word | 'Appication' should read 'Application' Note from the Author or Editor: | Elias Strehle | Mar 29, 2018 | |
| Printed | Page 257, Info box, 6th sentence | 'communtiy' should read 'community' Note from the Author or Editor: | Elias Strehle | Mar 29, 2018 | |
| Printed | Page 272, 3rd paragraph, 1st sentence | 'When submitting applciations, [...]' should read 'When submitting applications, [...]' Note from the Author or Editor: | Elias Strehle | Mar 29, 2018 | |
| Printed | Page 276, 1st and 2nd paragraph | The code block should be below the 2nd paragraph, not above, so the last sentence 'The example that follows [...]' becomes correct Note from the Author or Editor: | Elias Strehle | Mar 29, 2018 | |
| Printed | Page 336, Subsection 'Real-time decision making', 2nd sentence | The last word 'fradulent' should read 'fraudulent' Note from the Author or Editor: | Elias Strehle | Apr 03, 2018 | |
| Printed | Page 339, 1st paragraph, 2nd sentence | '[...] require deep expertise to be develop and maintain.' should read '[...] require deep expertise to be developed and maintained.' Note from the Author or Editor: | Elias Strehle | Apr 03, 2018 | |
| Printed | Page 342, 1st paragraph, 4th sentence | '[...] (all of its the windowing operators [...]' should read '[...] (all of its windowing operators [...]' or '[...] (all of the windowing operators [...]' Note from the Author or Editor: | Elias Strehle | Apr 03, 2018 | |
| Printed | Page 372, 2nd paragraph, code block | The code block Note from the Author or Editor: | Elias Strehle | Apr 03, 2018 | |
| Printed | Page 378, Section 'Arbitrary Stateful Processing', 1st sentence | 'The first section if this chapter [...]' should read 'The first section of this chapter [...]' Note from the Author or Editor: | Elias Strehle | Apr 03, 2018 | |
| Printed | Page 381, General note | '[...] output of the dream [...]' is a lovely metaphor, but should probably read '[...] output of the stream [...]' Note from the Author or Editor: | Elias Strehle | Apr 03, 2018 | |
| Printed | Page 402, 3rd Paragraph | The sentence, "O'Reilly should we link to or mention any specific ones?" is left in the text. Note from the Author or Editor: | Anonymous | Mar 27, 2018 | |
| Printed | Page 437, Subsection 'Advanced bucketing techniques', 1st sentence | 'descriubed' should read 'described' Note from the Author or Editor: | Elias Strehle | Apr 03, 2018 | |
| Printed | Page 462, Subsection 'Multilabel Classification', 4th sentence | 'Another example of multilabel classification is identifying the number of objects that appear in an image.' Note from the Author or Editor: | Elias Strehle | Apr 03, 2018 | |
| Printed | Page 518, 1st paragraph, 3rd sentence | '[...] combine motif finding with DataFarme queries [...]' should read '[...] combine motif finding with DataFrame queries [...]' Note from the Author or Editor: | Elias Strehle | Apr 04, 2018 | |
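
The entries for page 122 and page 129 both report that the SQL form of sumDistinct lacks the DISTINCT keyword. The corrected statement itself is not reproduced in this list, so the following is only a minimal sketch of why the keyword matters. It assumes Python's built-in sqlite3 module as a stand-in for Spark SQL so that it runs without a cluster, and the `purchases` table and `quantity` column are hypothetical names, not the book's.

```python
import sqlite3

# Stand-in for Spark SQL (assumption): an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (quantity INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO purchases VALUES (?)",
    [(1,), (2,), (2,), (3,), (3,)],
)

# Without DISTINCT, every row contributes, duplicates included.
plain_sum = conn.execute("SELECT SUM(quantity) FROM purchases").fetchone()[0]

# With DISTINCT, each distinct value contributes exactly once.
distinct_sum = conn.execute(
    "SELECT SUM(DISTINCT quantity) FROM purchases"
).fetchone()[0]

print(plain_sum, distinct_sum)
conn.close()
```

On data with repeated values the two queries return different totals, which is why a sumDistinct example written without DISTINCT would not match the DataFrame version.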
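
The page 37 entry questions a variable named purchaseByCustomerPerHour that is built with window(col("InvoiceDate"), "1 day"). As a rough illustration of the difference between the two granularities, the sketch below buckets timestamps into fixed-width windows in plain Python. It is an analogue under stated assumptions, not Spark's implementation: `window_start` is a hypothetical helper, and the invoice timestamps are made-up sample values.

```python
from collections import Counter
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def window_start(ts, width):
    """Start of the fixed-width window containing ts, aligned to the epoch.

    Hypothetical helper for illustration only.
    """
    return EPOCH + ((ts - EPOCH) // width) * width

# Made-up invoice timestamps (assumption, not data from the book).
invoice_dates = [
    datetime(2010, 12, 1, 8, 26),
    datetime(2010, 12, 1, 8, 45),
    datetime(2010, 12, 1, 17, 10),
    datetime(2010, 12, 2, 9, 0),
]

# Count invoices per one-day window and per one-hour window.
per_day = Counter(window_start(ts, timedelta(days=1)) for ts in invoice_dates)
per_hour = Counter(window_start(ts, timedelta(hours=1)) for ts in invoice_dates)

print(len(per_day), len(per_hour))
```

With a one-day width, all invoices from the same calendar day fall into one bucket, which is the behaviour the submitter argues a "PerDay" name would describe more accurately.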