The errata list records errors found after the product was released, along with their corrections. If an error was corrected in a later version or reprint, the date of the correction is displayed in the column titled "Date corrected".
The following errata were submitted by our customers and approved as valid errors by the author or editor.
Version |
Location |
Description |
Submitted By |
Date submitted |
Date corrected |
PDF |
Page X
Penultimate paragraph |
“If you already work as a data scientist or ML engineer, this book will add new techniques to your ML development tool.” This appears to be a typo; possibly "tool" should be "toolkit"?
Note from the Author or Editor: This is a typo indeed, good catch!
"If you already work as a data scientist or ML engineer, this book will add new techniques to your ML development tool."
should be
If you already work as a data scientist or ML engineer, this book will add new techniques to your ML development toolkit.
|
Richard Morton |
Feb 06, 2020 |
Feb 14, 2020 |
Printed, PDF, ePub, Mobi, Other Digital Version |
Page ***
Code Example |
In Chapter 4, "Acquire an Initial Dataset", there is an error in the code example.
********************************************************************************************
questions_with_accepted_answers = df[
    df["is_question"] & ~(df["AcceptedAnswerId"].isna())
]
q_and_a = questions_with_accepted_answers.join(
    df[["Text"]], on="AcceptedAnswerId", how="left", rsuffix="_answer"
)
*******************************************************************************************
df[["Text"]] should have been df[["body_text"]]. There is no 'Text' column in the output of df.info().
Note from the Author or Editor: df[["Text"]] should be changed to df[["body_text"]]
|
Chris Chen |
Mar 21, 2020 |
|
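A minimal sketch of the corrected join from the entry above, run on a four-row toy DataFrame. The toy rows and their text are invented for illustration; only the column names body_text, is_question, and AcceptedAnswerId and the join call itself come from the erratum, and indexing the frame by post Id is an assumption.

```python
import pandas as pd

# Hypothetical toy data: posts 1 and 2 are questions, posts 3 and 4 are answers.
# The frame is assumed to be indexed by post Id.
df = pd.DataFrame(
    {
        "body_text": ["How do I outline?", "Is this clear?", "Start small.", "Yes."],
        "is_question": [True, True, False, False],
        "AcceptedAnswerId": [3.0, None, None, None],
    },
    index=pd.Index([1, 2, 3, 4], name="Id"),
)

# Keep only questions that have an accepted answer.
questions_with_accepted_answers = df[
    df["is_question"] & ~(df["AcceptedAnswerId"].isna())
]

# Look up each accepted answer's text by matching AcceptedAnswerId to the index.
# The overlapping column from the right side gets the "_answer" suffix.
q_and_a = questions_with_accepted_answers.join(
    df[["body_text"]], on="AcceptedAnswerId", how="left", rsuffix="_answer"
)

print(q_and_a[["body_text", "body_text_answer"]])
```

With the original df[["Text"]], the column selection would raise a KeyError on a frame that has no Text column, which is consistent with the report.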
|
Page 63
In the code block |
Second release:
On the 5th line of the code block, df[["Text"]] should be df[["body_text"]].
On the 8th line of the code block, q_and_a[["Text", "Text_answer"]] should be q_and_a[["body_text", "body_text_answer"]].
Thank you.
Note from the Author or Editor: Errata confirmed, this should be changed to match the code that is provided with the book
|
Haesun Park |
Jul 13, 2021 |
|
Printed, PDF, ePub, Mobi, Other Digital Version |
Page 71
Second to last paragraph |
"five" in "five most popular ones" should be "seven".
Because we have more than three hundred tags in our dataset, here we chose to only create a column for the five most popular ones
|
Emmanuel Ameisen |
Sep 30, 2020 |
|
Printed, PDF, ePub, Mobi, Other Digital Version |
Page 87
Table 4-5 |
The text in the last paragraph of page 86 describes crossing features as multiplying them, so in the table on page 87 I expect DoW x DoM = Cross. However, DoW in the table is 7, 7, ..., 1 down the rows. It should be 6, 6, ..., 7. Only then does multiplying by DoM of 29, 29, ..., 30 give 174, 174, ..., 210.
Note from the Author or Editor: This is correct; there is a slight inaccuracy in the table. It currently reads
7 | 29 | 174
7 | 29 | 174
...
1 | 30 | 210
It should be
6 | 29 | 174
6 | 29 | 174
...
7 | 30 | 210
|
Han Qi |
Mar 19, 2021 |
|
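The corrected rows of Table 4-5 can be checked with a short sketch. It uses only the three rows shown in the author's note (the rows elided by "..." are left out), and it assumes, as the reporter describes, that the cross is the product of the two features.

```python
import pandas as pd

# Only the rows visible in the author's note; elided rows are not reconstructed.
rows = pd.DataFrame({"DoW": [6, 6, 7], "DoM": [29, 29, 30]})

# Feature cross as described in the report: multiply the two features.
rows["Cross"] = rows["DoW"] * rows["DoM"]

print(rows)
```

With the originally printed DoW values of 7 and 1, the same products would be 203 and 30, which do not match the Cross column.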
|
Page 88
list items |
Second release:
In the 2nd bullet, has_question should be question_mark.
In the 3rd bullet, is_language_question should be language_question.
Thank you
Note from the Author or Editor: Errata confirmed, the field names should be changed
|
Haesun Park |
Jul 14, 2021 |
|
|
Page 101
2nd paragraph from the bottom |
Second release:
On the 9th line from the bottom, writers.stackoverflow.com should be writers.stackexchange.com
Thank you
Note from the Author or Editor: Errata is confirmed, site URL should be changed
|
Haesun Park |
Jul 14, 2021 |
|
|
Page 105
12th line from the top |
Second release:
12th line from the top, "Stack Overflow" should be "Stack Exchange"
Thank you
Note from the Author or Editor: Correct, we should make the suggested change
|
Haesun Park |
Jul 14, 2021 |
|
Printed, PDF, ePub, Mobi, Other Digital Version |
Page 114
2nd paragraph |
The text explains the calibration curve as "gives a probability of being classified as positive that is higher than 80%".
This makes students think the upper bound is 100%. The intuitive understanding, however, is a bucket of size 10%: the y-axis is calculated from observations within a bucket of predicted probability of 80-90%, not 80-100% as the wording ("higher than 80%") in the book implies.
The above error would be easier to identify if a 10% example were given: it would then be obviously wrong that only 10% of the observations with predicted probability of 10-100% are actually correct. Anyway, this is how I reasoned that something was wrong before researching further to find a better definition. I'm not sure what the upper bound or bucket size should be; maybe 80-90% is still not narrow enough for a bucket.
Note from the Author or Editor: The language here is imprecise. The paragraph currently reads:
"For example, out of all the data points our classifier gives a probability of being classified as positive that is higher than 80%, how many of those data points are actually positive?"
It should read:
"For example, out of all the data points our classifier gives a probability of being classified as positive that is close to 80%, how many of those data points are actually positive?"
|
Han Qi |
Mar 18, 2021 |
|
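The difference between "higher than 80%" and "close to 80%" can be illustrated with a small sketch. The helper name, the scores, the labels, and the 10% bucket width are all hypothetical choices for illustration; the erratum itself does not fix a bucket size.

```python
def fraction_positive(probs, labels, low, high):
    """Share of positive labels among points with low <= predicted prob < high."""
    in_bucket = [label for p, label in zip(probs, labels) if low <= p < high]
    if not in_bucket:
        return None  # no predictions fall in this range
    return sum(in_bucket) / len(in_bucket)

# Hypothetical predicted probabilities and true labels (1 = positive).
probs = [0.81, 0.83, 0.85, 0.88, 0.89, 0.95, 0.97, 0.99]
labels = [1, 1, 1, 0, 1, 1, 1, 1]

# One calibration-curve point: only predictions in the 80-90% bucket.
near_80 = fraction_positive(probs, labels, 0.8, 0.9)

# The reading the original wording suggests: everything above 80%.
above_80 = fraction_positive(probs, labels, 0.8, 1.01)

print(near_80, above_80)
```

In this toy data the bucket around 80% contains 4 positives out of 5 points, while pooling everything above 80% mixes in the higher-confidence predictions and gives 7 out of 8.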
|
Page 129
Figure 6-2 |
Second release:
In the 3rd plot of Figure 6-2, 'Model can fit unseen data' should be 'Model can predict unseen data'
Thank you
Note from the Author or Editor: The Errata is correct, this should be changed as suggested
|
Haesun Park |
Jul 14, 2021 |
|
|
Page 132
Figure 6-3 |
Second release:
In Figure 6-3, 'Format to model specification', 'Cleaning', 'Feature Generation' should be rearranged as 'Cleaning'-->'Feature Generation'-->'Format to model specification'
Thank you
Note from the Author or Editor: This report is correct, and the figure should be corrected (I am happy to help).
The italics above the arrows in figure 6-3 should be reorganized. More specifically, the 2nd, 3rd and 4th are in the wrong order. Copying from the Errata:
'Format to model specification', 'Cleaning', 'Feature Generation' should be rearranged to
'Cleaning'-->'Feature Generation'-->'Format to model specification'
|
Haesun Park |
Jul 14, 2021 |
|
|
Page 151
8th line from the bottom |
Second release:
On the 8th line from the bottom, 'perform well on a training test' should be 'perform well on a training set'.
Thank you
Note from the Author or Editor: Errata is correct, we should change as requested
|
Haesun Park |
Jul 14, 2021 |
|
|
Page 157
First code block |
Second release:
In the last line of the first code block, 'positive_probs = clf[:, 1]' should be 'positive_probs = probabilities[:, 1]'
Thank you
Note from the Author or Editor: This is a nice catch. While the code in notebooks is correct, this reproduction is wrong.
We should change:
# probabilities is an array containing one probability per class
probabilities = clf.predict_proba(features)
# Positive probas contains only the score of the positive class
positive_probs = clf[:,1]
to:
# probabilities is an array containing one probability per class
probabilities = clf.predict_proba(features)
# Positive probas contains only the score of the positive class
positive_probs = probabilities[:,1]
|
Haesun Park |
Jul 14, 2021 |
|
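A runnable sketch of the corrected snippet from the entry above. The tiny dataset and the choice of LogisticRegression are assumptions made only so the example can run; the erratum does not say which classifier clf is.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical one-feature dataset with two classes.
features = np.array([[0.0], [1.0], [2.0], [3.0]])
labels = np.array([0, 0, 1, 1])

clf = LogisticRegression().fit(features, labels)

# probabilities is an array containing one probability per class
probabilities = clf.predict_proba(features)

# Positive probas contains only the score of the positive class
positive_probs = probabilities[:, 1]

print(probabilities.shape, positive_probs.shape)
```

The slice has to be taken from the array returned by predict_proba. Indexing the fitted LogisticRegression object itself, as in clf[:, 1], fails because the estimator is not subscriptable.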
|
Page 164
2nd paragraph from the bottom |
Second release:
In the first line of the 2nd paragraph, 'the body of the question' should be 'the body of the function'
Thank you
Note from the Author or Editor: This errata is correct
|
Haesun Park |
Jul 14, 2021 |
|
Printed, PDF, ePub, Mobi, Other Digital Version |
Page 197
Top of the page |
"outputs" should be "inputs"
To prevent a model from running on incorrect outputs, we need to detect that these
|
Emmanuel Ameisen |
Sep 30, 2020 |
|
|
Page 223
2nd line from the bottom |
Second release:
On the 2nd line from the bottom, 'evaluating a mode' should be 'evaluating a model'
Thank you
Note from the Author or Editor: The errata is correct
|
Haesun Park |
Jul 14, 2021 |
|
Printed, PDF, ePub, Mobi, Other Digital Version |
Page 232
left side of index |
"continuous improvement" should be "continuous integration"
CI/CD (continuous improvement/continuous delivery)
|
Emmanuel Ameisen |
Sep 30, 2020 |
|