Errata for Building Machine Learning Powered Applications

The errata list contains errors, and their corrections, that were found after the product was released. If an error was corrected in a later version or reprint, the date of the correction is displayed in the column titled "Date corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.

Version	Location	Description	Submitted By	Date submitted	Date corrected
PDF	Page X Penultimate paragraph	“If you already work as a data scientist or ML engineer, this book will add new techniques to your ML development tool.” Appears to be a typo; possibly "tool" should be "toolkit"? Note from the Author or Editor: This is a typo indeed, good catch! "If you already work as a data scientist or ML engineer, this book will add new techniques to your ML development tool." should be "If you already work as a data scientist or ML engineer, this book will add new techniques to your ML development toolkit."	Richard Morton	Feb 06, 2020	Feb 14, 2020
Printed, PDF, ePub, Mobi, Other Digital Version	Page *** Code Example	In Chapter 4, "Acquire an Initial Dataset", there is an error in the code example: questions_with_accepted_answers = df[ df["is_question"] & ~(df["AcceptedAnswerId"].isna()) ] q_and_a = questions_with_accepted_answers.join( df[["Text"]], on="AcceptedAnswerId", how="left", rsuffix="_answer" ) Here df[["Text"]] should have been df[["body_text"]]; there is no "Text" column in df.info(). Note from the Author or Editor: df[["Text"]] should be changed to df[["body_text"]]	Chris Chen	Mar 21, 2020
	Page 63 In the code block	Second release: At 5th line in the code block, df[["Text"]] should be df[["body_text"]]. At 8th line in the code block, q_and_a[["Text", "Text_answer"]] should be q_and_a[["body_text", "body_text_answer"]]. Thank you. Note from the Author or Editor: Errata confirmed, this should be changed to match the code that is provided with the book	Haesun Park	Jul 13, 2021
Printed, PDF, ePub, Mobi, Other Digital Version	Page 71 Second to last paragraph	"five" in "five most popular ones" should be "seven". The sentence currently reads: "Because we have more than three hundred tags in our dataset, here we chose to only create a column for the five most popular ones"	Emmanuel Ameisen	Sep 30, 2020
Printed, PDF, ePub, Mobi, Other Digital Version	Page 87 Table 4-5	The last paragraph of page 86 describes crossing features as multiplying them, so in the table on page 87 I expect DoW x DoM = Cross. However, DoW in the table is 7,7,...,1 down the rows. It should be 6,6,...,7. Only then does multiplying by DoM of 29,29,...,30 give 174,174,...,210. Note from the Author or Editor: This is correct, there is a slight inaccuracy in the table. It currently reads 7 | 29 | 174, 7 | 29 | 174, ..., 1 | 30 | 210. It should be 6 | 29 | 174, 6 | 29 | 174, ..., 7 | 30 | 210.	Han Qi	Mar 19, 2021
	Page 88 list items	Second release: In 2nd bullet, has_question should be question_mark. In 3rd bullet, is_language_question should be language_question. Thank you. Note from the Author or Editor: Errata confirmed, the field names should be changed	Haesun Park	Jul 14, 2021
	Page 101 2nd paragraph from the bottom	Second release: In 9th line from the bottom, writers.stackoverflow.com should be writers.stackexchange.com Thank you Note from the Author or Editor: Errata is confirmed, site URL should be changed	Haesun Park	Jul 14, 2021
	Page 105 12th line from the top	Second release: 12th line from the top, "Stack Overflow" should be "Stack Exchange" Thank you Note from the Author or Editor: Correct, we should make the suggested change	Haesun Park	Jul 14, 2021
Printed, PDF, ePub, Mobi, Other Digital Version	Page 114 2nd paragraph	The text explains the calibration curve as "gives a probability of being classified as positive that is higher than 80%". This suggests that the upper bound is 100%, whereas the intuitive understanding is a bucket of size 10%: the y-axis is calculated from observations within a bucket of predicted probability of 80-90%, not 80-100% as the language ("higher than 80%") in the book implies. The error would be easier to identify if a 10% example were given: it would be obviously wrong that only 10% of the observations with predicted probability of 10-100% are actually positive. This was how I reasoned that something was wrong before researching further to find a better definition. I'm not sure what the upper bound or bucket size should be; maybe 80-90% is still not narrow enough for a bucket. Note from the Author or Editor: This is an imprecise use of language. The paragraph currently reads: "For example, out of all the data points our classifier gives a probability of being classified as positive that is higher than 80%, how many of those data points are actually positive?" It should read: "For example, out of all the data points our classifier gives a probability of being classified as positive that is close to 80%, how many of those data points are actually positive?"	Han Qi	Mar 18, 2021
	Page 129 Figure 6-2	Second release: In 3rd plot of Figure 6-2, 'Model can fit unseen data' should be 'Model can predict unseen data'. Thank you. Note from the Author or Editor: The errata is correct, this should be changed as suggested	Haesun Park	Jul 14, 2021
	Page 132 Figure 6-3	Second release: In Figure 6-3, 'Format to model specification', 'Cleaning', 'Feature Generation' should be rearranged to 'Cleaning'-->'Feature Generation'-->'Format to model specification'. Thank you. Note from the Author or Editor: This report is correct, and the figure should be corrected (I am happy to help). The italics above the arrows in Figure 6-3 should be reorganized. More specifically, the 2nd, 3rd and 4th are in the wrong order. Copying from the errata: 'Format to model specification', 'Cleaning', 'Feature Generation' should be rearranged to 'Cleaning'-->'Feature Generation'-->'Format to model specification'	Haesun Park	Jul 14, 2021
	Page 151 8th line from the bottom	Second release: In 8th line from the bottom, 'perform well on a training test' should be 'perform well on a training set'. Thank you Note from the Author or Editor: Errata is correct, we should change as requested	Haesun Park	Jul 14, 2021
	Page 157 First code block	Second release: In the last line of the first code block, 'positive_probs = clf[:, 1]' should be 'positive_probs = probabilities[:, 1]'. Thank you. Note from the Author or Editor: This is a nice catch. While the code in the notebooks is correct, this reproduction is wrong. We should change: # probabilities is an array containing one probability per class probabilities = clf.predict_proba(features) # Positive probas contains only the score of the positive class positive_probs = clf[:,1] to: # probabilities is an array containing one probability per class probabilities = clf.predict_proba(features) # Positive probas contains only the score of the positive class positive_probs = probabilities[:,1]	Haesun Park	Jul 14, 2021
	Page 164 2nd paragraph from the bottom	Second release: In first line of 2nd paragraph, 'the body of the question' should be 'the body of the function' Thank you Note from the Author or Editor: This errata is correct	Haesun Park	Jul 14, 2021
Printed, PDF, ePub, Mobi, Other Digital Version	Page 197 Top of the page	"outputs" should be "inputs" in: "To prevent a model from running on incorrect outputs, we need to detect that these"	Emmanuel Ameisen	Sep 30, 2020
	Page 223 2nd line from the bottom	Second release: In 2nd line from the bottom, 'evaluating a mode' should be 'evaluating a model' Thank you Note from the Author or Editor: The errata is correct	Haesun Park	Jul 14, 2021
Printed, PDF, ePub, Mobi, Other Digital Version	Page 232 left side of index	"continuous improvement" should be "continuous integration" in: "CI/CD (continuous improvement/continuous delivery)"	Emmanuel Ameisen	Sep 30, 2020
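
The page 63 correction above can be checked with a small sketch. The column names (body_text, is_question, AcceptedAnswerId) and the join call come from the errata entries; the three-row DataFrame, its Id index, and its text values are made up for illustration and are not the book's dataset.

```python
import pandas as pd

# Toy stand-in for the book's DataFrame (hypothetical values).
# Rows are indexed by post Id; a question points at its accepted
# answer through AcceptedAnswerId, which is NaN when there is none.
df = pd.DataFrame(
    {
        "body_text": [
            "How do I outline a novel?",
            "Start with a one-page summary.",
            "Is passive voice bad?",
        ],
        "is_question": [True, False, True],
        "AcceptedAnswerId": [2, None, None],
    },
    index=pd.Index([1, 2, 3], name="Id"),
)

# Keep only questions that have an accepted answer.
questions_with_accepted_answers = df[
    df["is_question"] & ~(df["AcceptedAnswerId"].isna())
]

# Corrected join: select "body_text" (not "Text") from the answers side.
# The answer's text arrives as "body_text_answer" because of rsuffix.
q_and_a = questions_with_accepted_answers.join(
    df[["body_text"]], on="AcceptedAnswerId", how="left", rsuffix="_answer"
)

pairs = q_and_a[["body_text", "body_text_answer"]]
print(pairs)
```

With the original df[["Text"]], the selection fails because the DataFrame has no "Text" column, which is the error both reports describe.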
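
The page 114 clarification ("close to 80%" rather than "higher than 80%") can be illustrated with a minimal sketch of bucketed calibration. The helper name calibration_by_bucket, the 10% bucket width, and the sample scores and labels are all assumptions made for illustration; the errata entry itself leaves the bucket size open.

```python
from collections import defaultdict


def calibration_by_bucket(probs, labels, n_buckets=10):
    """Group predictions into equal-width probability buckets.

    Returns {bucket_index: (mean_predicted_prob, positive_fraction, count)}.
    Hypothetical helper, not code from the book.
    """
    buckets = defaultdict(list)
    for p, y in zip(probs, labels):
        # A score of exactly 1.0 goes into the last bucket.
        idx = min(int(p * n_buckets), n_buckets - 1)
        buckets[idx].append((p, y))

    result = {}
    for idx in sorted(buckets):
        items = buckets[idx]
        mean_p = sum(p for p, _ in items) / len(items)
        positive_fraction = sum(y for _, y in items) / len(items)
        result[idx] = (mean_p, positive_fraction, len(items))
    return result


# Made-up predicted probabilities and true labels (1 = positive).
probs = [0.81, 0.84, 0.86, 0.88, 0.89, 0.95, 0.12, 0.15]
labels = [1, 1, 1, 1, 0, 1, 0, 0]

curve = calibration_by_bucket(probs, labels)
for idx, (mean_p, positive_fraction, count) in curve.items():
    print(f"bucket {idx}: mean predicted {mean_p:.2f}, "
          f"observed positive {positive_fraction:.2f}, n={count}")
```

Each point compares the scores inside one bucket (for example 80-90%) with the fraction of those data points that are actually positive. The 0.95 score falls in its own bucket and does not affect the 80-90% point, which is the distinction the report draws.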