Errata for Data Science from Scratch

The errata list records errors that were found after the product was released, along with their corrections. If an error was corrected in a later version or reprint, the date of the correction appears in the "Date corrected" column.

The following errata were submitted by our customers and approved as valid errors by the author or editor.

Version	Location	Description	Submitted By	Date submitted	Date corrected
Printed	Page 4 Block of code below second paragraph	The hashed comments are incorrect. They read: `# add i as a friend of j` followed by `# add j as a friend of i`. They should read: `# add j as a friend of i` followed by `# add i as a friend of j`. Note from the Author or Editor: agreed, those two comments should be switched	James Whitehead	Jan 15, 2016	Mar 10, 2017
Printed	Page 7 3rd paragraph	The statement "For example, Thor (id 4) has no friends in common with Devin (id 7) , . . ." is incorrect. They share the friend Clive (id 5). Note from the Author or Editor: good point, change that sentence to For example, Hero (id 0) has no friends in common with Klein (id 9), but they share interests in Java and big data.	Stephen N. Cole	May 17, 2015	Mar 10, 2017
Printed	Page 40 Code	In the histogram code, please add the following to resolve Counter(): from collections import Counter Note from the Author or Editor: I don't care that much either way, I sort of assumed importing Counter was implied, but I don't mind adding a from collections import Counter to the start of the example	_j_j	Jun 01, 2016	Mar 10, 2017
Printed	Page 52 Top of page	The following sentence at the top of the page: "The dot product measures how far the vector v extends in the w direction." is usually false, but can be true if w is a unit vector. Alternatively, the following correction would make the statement true: "The dot product of v and w, divided by the magnitude of w, measures how far the vector v extends in the w direction." Or alternatively: "Given two vectors w and v, if w is a unit vector, then the dot product measures how far the vector v extends in the w direction." Note from the Author or Editor: yeah, this is a fair criticism. I would simply change the first sentence on the page to If _w_ has magnitude 1, the dot product measures how far the vector _v_ extends in the _w_ direction.	Matt Goldwasser	Feb 10, 2016	Mar 10, 2017
Printed	Page 67 2nd paragraph	The list x contains second value 1, whereas it should contain second value -1. Note from the Author or Editor: yes, this is a mistake, x should be [-2, -1, 0, 1, 2]	Zach Landes	Jul 05, 2015	Mar 10, 2017
Printed, PDF	Page 69 Paragraph #2	In the first sentence of paragraph #2, it says: "For our purposes you should think of probability as a way of quantifying the uncertainty associated with events chosen from a some universe of events." ('universe' is italicized) Does 'a some universe' include an extra word, or does it have a special meaning in this context? Note from the Author or Editor: the "a" should not be there, it should just say "chosen from some universe"	Anonymous	Dec 14, 2016	Mar 10, 2017
Printed	Page 75 second paragraph	The sentence "It has the distribution function:" would be improved by substituting "probability density" in place of "distribution". In the preceding section the author introduced the "probability density function" and the "cumulative distribution function". Given that context the reader might incorrectly infer that the equation following the second paragraph is the Normal cumulative distribution function. Note from the Author or Editor: agree, should change to It has the probability density function:	Stephen N. Cole	Jan 01, 2016	Mar 10, 2017
Printed	Page 78 the function inverse_normal_cdf at the top of the page	Values are assigned to low_p and hi_p, but these are never used. Statements that refer to low_p and hi_p should be simplified. Note from the Author or Editor: agree with this, revised version at https://gist.github.com/joelgrus/71c1ba8f96b6422a12adf10d04783512	Stephen N. Cole	Mar 27, 2016	Mar 10, 2017
PDF	Page 83 1st paragraph	In: " X should be distributed approximately normally with mean 50 and standard deviation 15.8:" mean should be 500 not 50 Note from the Author or Editor: confirmed, the mean should be 500	Luis Miguel Soares	Jun 24, 2015	Mar 10, 2017
Printed	Page 83 last line	both 50 should be 500 Note from the Author or Editor: agreed, change both 50 to 500	Dong Zhou	Apr 16, 2016	Mar 10, 2017
Printed	Page 84 2nd/3rd paragraph	The title of a section has disappeared between the 2nd and 3rd paragraphs. It should be p-values. This title appears at the end of the 2nd paragraph with its markup before: === Note from the Author or Editor: yes, looks like the markup wasn't quite right.	Pierre Nugues	Aug 12, 2015	Mar 10, 2017
Printed	Page 89 1st statement in function beta_pdf	If beta_pdf is called with [x=0 and alpha<1] or with [x=1 and beta<1], the function crashes, because Python does not permit 0 to be raised to a negative power. An easy fix is to change the 1st statement to `if x <= 0 or x >= 1:` Note from the Author or Editor: I agree, change the first line of the beta_pdf function to `if x <= 0 or x >= 1:`	Stephen N. Cole	Jun 15, 2016	Mar 10, 2017
Printed	Page 106 2nd code block	`for line in file:` should be `for line in f:` Note from the Author or Editor: agree, should be `for line in f: # look at each line in the file`	Dong Zhou	Apr 16, 2016	Mar 10, 2017
Printed	Page 108 lines 1 through 14	The script in lines 1 through 9 on page 108 is incorrect, because it does not produce the results printed on lines 11 through 14. Instead, the lines of text in bad_csv.txt get merged into a single line of text, as if the f.write("\n") were missing. One way to correct the error is to change the 6th line to "with open('bad_csv.txt', 'w') as f:" (omitting 'b' from open's 2nd argument). This script does not need 'wb', because it does not use the CSV module. Another (less elegant) resolution is to replace line 9 with f.write("\r\n"). Note from the Author or Editor: I cannot reproduce this (the code as written works for me); however, I agree that the 'b' parameter does not need to be there, and I am ok with getting rid of the b and changing the line of code to `with open('bad_csv.txt', 'w') as f:`	Stephen N. Cole	Nov 27, 2016	Mar 10, 2017
Printed	Page 147 3rd paragraph from bottom	"figure out what do" should be "figure out what to do" Note from the Author or Editor: just like the errata says	Dong Zhou	Apr 16, 2016	Mar 10, 2017
Printed	Page 167 2nd paragraph from bottom	all 'spam' in this paragraph should be 'non-spam' Note from the Author or Editor: agree, in that paragraph both "additional spams" should instead be "additional non-spams" (the spam in "spam probabilities" can stay as is)	Dong Zhou	Apr 17, 2016	Mar 10, 2017
Printed	Page 181 Paragraph after "we will underestimate beta(1)"	The book says: "The predictions would tend to be too small for users who work many hours and too large for users who work few hours, because Beta(2) > 0 and we "forgot" to include it." In the actual model, Beta(2) < 0 (the example on page 182 confirms this), since in the example "people who work more hours spend less time on the site". Note from the Author or Editor: yes, that whole paragraph is wrong, it should be: Think about what would happen if we made predictions using the single variable model with the "actual" value of beta_1. (That is, the value that arises from minimizing the errors of what we called the "actual" model.) The predictions would tend to be way too large for users who work many hours and a little too large for users who work few hours, because beta_2 < 0 and we "forgot" to include it. Because work hours is positively correlated with number of friends, this means the predictions tend to be way too large for users with many friends, and only slightly too large for users with few friends.	J.R. Scally	Jan 21, 2016	Mar 10, 2017
Printed	Page 185 2nd code block	'unemployed' should be 'work_hours' Note from the Author or Editor: agree, the comment on the third line should say work hours # 0.131, # work hours, actual error = 0.127	Dong Zhou	Apr 18, 2016	Mar 10, 2017
PDF	Page 219 backpropagate code listing	``` # back-propagate errors to hidden layer hidden_deltas = [hidden_output * (1 - hidden_output) * dot(output_deltas, [n[i] for n in output_layer]) for i, hidden_output in enumerate(hidden_outputs)] ``` `output_layer` is not defined in the `backpropagate` function, and not passed in, hence running the example produces `NameError: global name 'output_layer' is not defined` Note from the Author or Editor: the line of code dot(output_deltas, [n[i] for n in output_layer]) should be replaced with dot(output_deltas, [n[i] for n in network[-1]])	Anonymous	Feb 26, 2016	Mar 10, 2017
Printed	Page 295 code block for matrix_multiply_mapper and matrix_multiply_reducer	For the function matrix_multiply_mapper, two matrix indexes should be passed: the row number of A and the column number of B. For any nonzero A_ij, all C_ik may be affected, with k being any column index of B. Similarly, for any nonzero B_ij, all C_kj may be affected, with k being any row index of A. In the text, the common dimension was used, which is wrong. Also, in the function matrix_multiply_reducer, m is not used. Note from the Author or Editor: yes, the code is wrong. Here is a fixed version of the functions: https://gist.github.com/joelgrus/cd0558f2fc6eeaea22ba8d286775e6a1 Then, at the very bottom of the page, change the definition of mapper to `mapper = partial(matrix_multiply_mapper, 2, 3)` and, at the top of the next page, change the definition of reducer to `reducer = matrix_multiply_reducer`	Dong Zhou	Apr 23, 2016	Mar 10, 2017
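
As a rough illustration of the Page 295 entry, the sketch below shows one way a mapper and reducer along the lines the submitter describes might be written in Python 3. It is not the content of the author's gist, which is not reproduced here. Only the names `matrix_multiply_mapper` and `matrix_multiply_reducer` and the arguments `2, 3` come from the erratum; the function bodies, the `map_reduce` helper, the `(name, i, j, value)` element format, and the example matrices are assumptions made for illustration.

```python
from collections import defaultdict
from functools import partial


def map_reduce(inputs, mapper, reducer):
    """Hypothetical helper: group mapper outputs by key, then reduce each group."""
    collector = defaultdict(list)
    for item in inputs:
        for key, value in mapper(item):
            collector[key].append(value)
    return [output
            for key, values in collector.items()
            for output in reducer(key, values)]


def matrix_multiply_mapper(num_rows_a, num_cols_b, element):
    """element is assumed to be (matrix_name, i, j, value).

    A nonzero A_ij can contribute to every C_ik (k over the columns of B);
    a nonzero B_ij can contribute to every C_kj (k over the rows of A).
    """
    name, i, j, value = element
    if name == "A":
        for k in range(num_cols_b):
            # Key is the cell of C; j is the index along the shared dimension.
            yield (i, k), (j, value)
    else:
        for k in range(num_rows_a):
            # Key is the cell of C; i is the index along the shared dimension.
            yield (k, j), (i, value)


def matrix_multiply_reducer(key, indexed_values):
    """Pair up values that share an index, multiply each pair, and sum."""
    values_by_index = defaultdict(list)
    for index, value in indexed_values:
        values_by_index[index].append(value)

    # Only indexes that received a value from both A and B contribute.
    total = sum(pair[0] * pair[1]
                for pair in values_by_index.values()
                if len(pair) == 2)

    if total != 0:
        yield key, total


# Illustrative sparse entries (assumed): A is 2 x 3, B is 3 x 3.
# A = [[3, 2, 0], [0, 0, 0]], B = [[4, -1, 0], [10, 0, 0], [0, 0, 0]]
entries = [("A", 0, 0, 3), ("A", 0, 1, 2),
           ("B", 0, 0, 4), ("B", 0, 1, -1), ("B", 1, 0, 10)]

mapper = partial(matrix_multiply_mapper, 2, 3)  # rows of A, columns of B
reducer = matrix_multiply_reducer

results = map_reduce(entries, mapper, reducer)
```

In this sketch the mapper takes the two outer dimensions (rows of A, columns of B) instead of the shared dimension, which matches the submitter's point, and the reducer no longer needs an `m` argument.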