Errata
The errata list is a list of errors and their corrections that were found after the product was released.
The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.
Version | Location | Description | Submitted by | Date submitted |
---|---|---|---|---|
O'Reilly learning platform | Chapter 4, Categorical Features Revisited, first block of code | The UDF defined in the code block returns `StringType()`, which raises an error when the `VectorAssembler` in the pipeline transforms the unencoded training set. The UDF should be modified to return `IntegerType` instead, i.e. | Philipp Spengler | Feb 21, 2024 |
Printed | Page 44, Building a first model | The book includes possibly redundant code: | Ben Halicki | Aug 18, 2022 |
Printed | Page 49, first block of code | Line 2 refers to `top_prediction_pandas`; this should be `top_predictions_pandas`. | Ben Halicki | Aug 18, 2022 |
Printed | Page 51, Computing AUC | In the Computing AUC section on page 51, some source code appears to have been truncated (indicated by the `...` in `def area_under_curve`). I checked the source repository (https://github.com/sryza/aas) to see how it is implemented, but the code there is in Scala, not PySpark, so it is not much help. Is there a repository containing the PySpark code for this book? | Ben Halicki | Aug 18, 2022 |
| | Page 51, fourth-to-last line of code | Incorrect: `all_artist_ids = all_data.select("artist").distinct().count()` | Anonymous | Apr 13, 2023 |
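The Chapter 4 report depends on the UDF's declared return type. `pyspark.sql.functions.udf` defaults `returnType` to `StringType()`. `VectorAssembler` only accepts numeric, boolean, and vector input columns, so a string-typed category column fails when the pipeline assembles features. The following is a minimal plain-Python sketch of the unencoding logic. It assumes each row's one-hot values arrive as a list of floats, and `unhot` is a hypothetical name rather than the book's code. In PySpark it could be wrapped as `udf(lambda v: unhot(v.toArray().tolist()), IntegerType())`, with `IntegerType` imported from `pyspark.sql.types`. That wiring is also an assumption, because the corrected code in the report is missing.

```python
def unhot(one_hot):
    """Return the integer position of the 1.0 in a one-hot vector.

    Hypothetical helper for illustration: the name and the input type
    (a plain list of floats) are assumptions, not the book's code.
    """
    values = list(one_hot)
    # A valid one-hot encoding has exactly one active position.
    if values.count(1.0) != 1:
        raise ValueError("expected exactly one 1.0 in a one-hot vector")
    # list.index returns a Python int, which matches an IntegerType() column.
    return values.index(1.0)


print(unhot([0.0, 1.0, 0.0, 0.0]))  # prints 1
```

Returning an `int` rather than a string keeps the resulting column numeric, so it can be fed to `VectorAssembler` without further casting.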
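The Computing AUC report notes that the body of `area_under_curve` is elided in print. The following sketch gives background only and does not reconstruct the book's PySpark function. It computes AUC in its pairwise form: the fraction of (positive, negative) pairs in which the positive item gets the higher score, with ties counted as half, then averaged over users. The names `pairwise_auc` and `mean_auc` are hypothetical, and so is the input layout (a dict mapping each user to a pair of score lists).

```python
from statistics import mean


def pairwise_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win. Hypothetical helper, not the book's code.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def mean_auc(per_user):
    """Average pairwise AUC over users; maps user -> (pos scores, neg scores)."""
    return mean(pairwise_auc(pos, neg) for pos, neg in per_user.values())


scores = {"u1": ([0.9, 0.4], [0.5, 0.1]), "u2": ([0.3], [0.3])}
print(mean_auc(scores))  # prints 0.625
```

Here user `u1` wins 3 of 4 pairs (0.75) and user `u2` ties its only pair (0.5), so the mean is 0.625. The nested loop is quadratic in the number of scores. A rank-based computation after sorting gives the same value in O(n log n) for large lists.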