Errata
The errata list is a list of errors and their corrections that were found after the product was released.
The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.
Color Key: Serious technical mistake, Minor technical mistake, Language or formatting error, Typo, Question, Note, Update
Version | Location | Description | Submitted by | Date submitted |
---|---|---|---|---|
Other Digital Version | 5. Statistics | For instance, if you don’t mind being angrily accused of https://www.nytimes.com/2014/06/30/technology/facebook-tinkers-with-users-emotions-in-news-feed-experiment-stirring-outcry.html?r=0[experimenting on your users], you could randomly choose a subset of your users and show them content from only a fraction of their friends. If this subset subsequently spent less time on the site, this would give you some confidence that having more friends _causes more time to be spent on the site. |
Anonymous | Feb 18, 2021 |
Page Chapter 7. Hypothesis and Inference - Example: Flipping a Coin the first paragraph before the next section; p-Values |
"Imagine instead that our null hypothesis was that the coin is not biased |
Milad N Rahbar | Nov 18, 2022 | |
Other Digital Version | Section 11 Code for normal_pdf function |
While the code for the return statement is accurate, it is slightly difficult to follow because it is structured differently from the equation given above it. It could be made easier to comprehend by utilizing parentheses like |
Neelakantan | Nov 24, 2022 |
Printed | Page p. 107 both at top and bottom |
When running the code at the bottom of the page, trying to find the slope and intercept of the line, the code does as asked for [-14, 14] input range, but I get numeric overflow for [-15, 15] (and larger). |
Dave Cooke | Mar 01, 2024 |
Printed | Page p. 107 both at top and bottom |
I submitted a "numerical overflow" issue, and said I fixed it by unitizing the gradient average. Well...never mind. Much later I found that I had a typo in vector_mean. Fix that. All good. Sorry. |
David Barton Cooke | Mar 05, 2024 |
Printed | Page 7 penultimate paragraph |
While the phrase "...they share interests in Java and big data" is correct, it would be more complete to also include Hadoop in this summary of shared interests. |
Matt S | Jan 15, 2020 |
Printed | Page 16 Middle of the page |
The text says: "Whitespace is ignored inside parentheses and brackets...", which is true, but what is meant is that line breaks are ignored inside parentheses. |
Markus Gottwald | Feb 17, 2021 |
Printed | Page 17 2nd and 3rd paragraphs |
"source activate" should be replaced with "conda activate". (on 2 lines) |
Jonathan | Oct 31, 2021 |
Printed | Page 22 4th paragraph |
"If you leave off the start of the slice, you'll slice from the beginning of the list, and if you leave of the end of the slice,..." |
Bill Ward | Sep 22, 2020 |
Printed | Page 26 2.9" from top |
# {"Joel": {"City": Seattle"}} |
ColinGT | Apr 16, 2020 |
Page 33 4, 5, 6 |
Personal opinion about optimizing part of the Data Science from Scratch 2nd edition book |
RZM | Apr 04, 2022 | |
Printed | Page 35 last lines |
To choose elements with replacement (i.e., allowing duplicates), you can just make multiple calls to random.choice: |
Gregory Sherman | Nov 22, 2020 |
ePub | Page 35 sort function at bottom of page |
The lambda expression in this function should be |
John Kilbourne | Dec 06, 2020 |
ePub | Page 36 middle of page |
the code |
Anonymous | Oct 04, 2021 |
Printed | Page 41 1st paragraph |
It's the same as already listed for ePub, return sum(xs) instead of sum(total), just asking to note it's present in the print version too, and different page number, 41 vs 87. |
Charles Shoopak | Dec 23, 2019 |
Printed | Page 41 first 2 code samples |
Both of the sample functions are returning sum(total) when they should return sum(xs). |
Dylan Kaufman | Jan 25, 2022 |
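A minimal sketch of the correction reported for page 41. It assumes, as the `sum(total)` typo suggests, that the function is named `total` and takes a list `xs`; the book's exact signatures may differ.

```python
from typing import List


def total(xs: List[float]) -> float:
    # Sum the argument xs. Writing sum(total) instead would pass the
    # function object itself to sum() and raise a TypeError.
    return sum(xs)
```

With this version, `total([1.0, 2.0, 3.5])` adds the list elements rather than failing at call time.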
Printed | Page 50 plt.annotate() call |
plt.annotate() call results in |
Gregory Sherman | Dec 12, 2019 |
Printed | Page 62 assertions |
assert friend_matrix[0][2] == 1, "0 and 2 are friends" |
Gregory Sherman | Dec 12, 2019 |
Printed | Page 63 list |
This ambiguity makes Figure 5-1 and the code that produced it difficult to understand |
Gregory Sherman | Nov 24, 2020 |
Printed | Page 63 The whole chapter |
I’m working through the examples in the statistics chapter of “Data Science from Scratch, 2nd edition”, by Joel Grus, and I am getting the following error: |
Anonymous | May 10, 2021 |
Printed | Page 64 code at top of page |
comment below should be "#height is just # of people" |
Gregory Sherman | Dec 12, 2019 |
Page 68 2nd paragraph |
Paragraph states that East coast data scientists skew more towards PhD types, but it makes no sense with the explanation, and the table above shows the opposite. |
Anonymous | Oct 09, 2020 | |
Printed | Page 70 first 3 Python statements |
The 3 statements can be written more succinctly as: |
Gregory Sherman | Nov 24, 2020 |
Printed | Page 84 paragraph at top |
"The mean of a Bernoulli variable is p, and its standard deviation |
Gregory Sherman | Dec 12, 2019 |
Printed | Page 84 last paragraph |
"A Binomial(n,p) random variable is simply the sum of |
Gregory Sherman | Dec 12, 2019 |
Printed | Page 85 last sentence |
"make_hist" should be "binomial_histogram" |
Gregory Sherman | Dec 12, 2019 |
Printed | Page 85 3 |
When using a line chart to show the normal approximation, you create the heights by taking the difference of two CDF calls. In the first CDF call, you add 0.5 and in the second CDF call you subtract 0.5. I suspect this is because you assume the integer value of x covers the range from x + 0.5 to x - 0.5 and you map this full probability to x. It would be ideal if you clarified the reason behind this. Thank you. |
Mateusz Rakowski | Apr 30, 2022 |
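The reader's guess about the two CDF calls matches what is usually called a continuity correction: the probability of the integer k is approximated by the normal probability of the interval from k - 0.5 to k + 0.5. The sketch below illustrates that idea only; the function names and the values of `n` and `p` are assumptions chosen for illustration and are not taken from the book.

```python
import math


def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    # Normal CDF written in terms of the error function.
    return (1 + math.erf((x - mu) / (sigma * math.sqrt(2)))) / 2


def binomial_pmf(k: int, n: int, p: float) -> float:
    # Exact probability of k successes in n independent trials.
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def normal_approximation(k: int, n: int, p: float) -> float:
    # Approximate P(X == k) by the normal probability of [k - 0.5, k + 0.5].
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return normal_cdf(k + 0.5, mu, sigma) - normal_cdf(k - 0.5, mu, sigma)


n, p = 100, 0.75  # illustrative parameters
exact = binomial_pmf(75, n, p)
approx = normal_approximation(75, n, p)
```

Because each integer receives the full width-one interval around it, the approximate heights over all k from 0 to n add up to almost exactly 1.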
Printed | Page 86 first sentence |
"...if you want to know the probability that (say) a fair coin |
Gregory Sherman | Dec 12, 2019 |
Printed | Page 86 For Further Information second bullet point |
Link to Introduction to Probability is broken. New link is ~prob/prob/prob.pdf |
Jamie Mellway | Aug 19, 2023 |
Printed | Page 89 3 |
In your normal_two_sided_bounds function, defining the tail_probability as (1-probability)/2 makes the upper_bound < lower_bound in your return result which then feeds into an incorrect answer on page 90 regarding the result of power = 1 - type_2_probability. To produce the correct answer, you should subtract the tail_probability from 1, and use this value instead of tail_probability inside the calls to normal_lower_bound and normal_upper_bound. The use of an assert statement would have been perfect to validate your answer on page 90 which would have caught the bug on page 89. |
Mateusz Rakowski | May 01, 2022 |
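This report is unconfirmed. One way to check the ordering of the bounds, as the suggested assert would, is sketched below. It uses the standard library's `statistics.NormalDist` rather than the book's `normal_lower_bound` and `normal_upper_bound` helpers, and `two_sided_bounds` is a hypothetical name.

```python
from statistics import NormalDist
from typing import Tuple


def two_sided_bounds(probability: float,
                     mu: float = 0.0,
                     sigma: float = 1.0) -> Tuple[float, float]:
    # Split the probability left outside the interval evenly between tails.
    tail_probability = (1 - probability) / 2
    dist = NormalDist(mu, sigma)
    # The lower bound has tail_probability below it ...
    lower = dist.inv_cdf(tail_probability)
    # ... and the upper bound has tail_probability above it.
    upper = dist.inv_cdf(1 - tail_probability)
    assert lower < upper, "bounds are in the wrong order"
    return lower, upper


lower, upper = two_sided_bounds(0.95)
```

For a standard normal and a probability of 0.95, the bounds come out symmetric around zero at roughly plus and minus 1.96.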
Printed | Page 91 simulation code |
import random |
Gregory Sherman | Nov 25, 2020 |
Printed | Page 95 4th paragraph |
For the first example of the A/B testing code ["tastes great" 200 clicks, "less bias" 180 clicks] the book says: "The probability of seeing such a large difference if the means were actually equal..." |
Steffen | Jun 04, 2022 |
Printed | Page 96 beta_pdf definition code-block |
The text defines the beta_pdf like so: |
Brendan King | Mar 06, 2020 |
Printed | Page 104 First paragraph, second to last sentence |
Book states "...we can estimate derivatives by evaluating the difference quotient for a very small e". I believe the text should be "for a very small h". |
Andrew Mathena | Jan 21, 2020 |
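For context, a difference quotient with step `h` can be sketched as follows. This is a generic illustration of the technique the sentence describes, not necessarily the book's exact code, and `square` is an example function chosen here.

```python
from typing import Callable


def difference_quotient(f: Callable[[float], float], x: float, h: float) -> float:
    # Slope of the secant line through (x, f(x)) and (x + h, f(x + h)).
    return (f(x + h) - f(x)) / h


def square(x: float) -> float:
    return x * x


# For a very small h, the quotient approaches the derivative 2 * x.
estimate = difference_quotient(square, 3.0, h=1e-6)
```

The small quantity in the estimate is the step `h`, which is why the report suggests the text should read "for a very small h".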
Printed | Page 105 Code Sample |
Instead of using the specific sum_of_squares_gradient we could have used the generic estimate_gradient method as |
Michael Shearer | Feb 07, 2021 |
Printed | Page 107 first line |
squared_error = error * 2 |
Gregory Sherman | Dec 13, 2019 |
Printed | Page 112 last sentence of note |
"... use chmod x egrep.py++ to make the file executable" |
Gregory Sherman | Dec 13, 2019 |
Printed | Page 127 code and following paragraph |
if len(tweets) >= 100: |
Gregory Sherman | Dec 16, 2019 |
Printed | Page 137 Figure 10-5 |
"if stock_price.clo" |
Gregory Sherman | Dec 16, 2019 |
Printed | Page 142 2nd line of the 3rd code snippet |
The news.cnet.com site is not available. |
Ryoko | Mar 09, 2020 |
Printed | Page 144 1st |
The standard deviation calculated is the sample standard deviation, not the population standard deviation. In this example, you never mention that the vectors used in the calculation are part of a sample and not an entire population. In the text, you also don't specify which you intend to calculate. |
Mateusz Rakowski | May 22, 2022 |
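The distinction the reader raises can be seen with the standard library, which provides both versions. The data below is made up for illustration and does not come from the book.

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative values

# Population standard deviation: squared deviations are divided by n.
population_sd = statistics.pstdev(data)

# Sample standard deviation: squared deviations are divided by n - 1.
sample_sd = statistics.stdev(data)
```

The sample version is always the larger of the two for the same data, and the gap shrinks as the number of observations grows.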
Printed | Page 156 both code blocks |
data = [n for n in range(1000)] |
Gregory Sherman | Nov 28, 2020 |
Page 157 Last sentence, end of 9th paragraph |
I think that in the sentence "(Of course the model that performed best on the test set is going to perform well on the test set)", the second "test set" should be "training set". The sentence makes no sense with "test set" twice. |
Anonymous | May 02, 2022 | |
Printed | Page 167 code at bottom of page |
"with open ('iris.dat', 'w') as f:" |
Gregory Sherman | Dec 21, 2019 |
Printed | Page 168 parse_iris_row() |
Upon running the code on iris.data (both downloaded from github), |
Gregory Sherman | Dec 21, 2019 |
Printed | Page 168 parse_iris_row() |
The previously reported crash of the program can be avoided by deleting the blank lines at the end of the downloaded data file. |
Gregory Sherman | Nov 29, 2020 |
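An alternative to editing the data file is to skip empty rows while reading it, since `csv.reader` yields an empty list for a blank line. The sketch below assumes the file has four numeric columns followed by a label; `parse_row` is a hypothetical stand-in for the book's `parse_iris_row`, and the two sample rows are in the `iris.data` format.

```python
import csv
import io
from typing import List, Tuple


def parse_row(row: List[str]) -> Tuple[List[float], str]:
    """Hypothetical stand-in for parse_iris_row: measurements plus label."""
    measurements = [float(value) for value in row[:-1]]
    label = row[-1]
    return measurements, label


# Two data rows followed by a trailing blank line, as in the downloaded file.
raw = "5.1,3.5,1.4,0.2,Iris-setosa\n7.0,3.2,4.7,1.4,Iris-versicolor\n\n"

reader = csv.reader(io.StringIO(raw))
# "if row" drops the empty lists produced by blank lines.
rows = [parse_row(row) for row in reader if row]
```

Filtering in the comprehension leaves the downloaded file untouched, so the same code works whether or not the trailing blank lines are present.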
Printed | Page 168 iris_data = [parse_iris_row(row) for row in reader] |
iris_data = [parse_iris_row(row) for row in reader] |
Karl Wilson | Apr 20, 2024 |
Printed | Page 182 next-to-last paragraph |
This could be due to the SpamAssassin files on the site changing. |
Gregory Sherman | Dec 21, 2019 |
Printed | Page 198 last paragraph |
The beginning of this paragraph talks about testing the null hypothesis "beta_i = 0". However, the subsequent formula and example code all use "beta_j" / "beta_hat_j". Is this difference in subscript letter deliberate? Do beta_i and beta_j actually mean slightly different things, or is this just a typo? Thanks very much! |
Anji Z | Feb 04, 2021 |
Page 202 code comment of _negative_log_partial_j function |
This comment is not necessary. |
Anonymous | Jan 10, 2020 | |
Printed | Page 245 Linear code |
In the comments below is it the o-th neuron or the o-th layer of neurons? |
Michael Shearer | Feb 28, 2021 |
Printed | Page 247 assignment in middle |
xor_net = Sequential([ |
Gregory Sherman | Dec 04, 2020 |
Printed | Page 247 middle |
On looking more closely at the github code, there is another similarly named variable. |
Gregory Sherman | Dec 08, 2020 |
Printed | Page 282 1st code block |
content = soup.find('div', 'post-radar-content') |
Michael Shearer | Mar 06, 2021 |
Printed | Page 287 3rd code block, else comment |
‘If the total is 8 or more’ not 7. 7 is dealt with in <=7 case. |
Michael Shearer | Mar 06, 2021 |
Printed | Page 299 Penultimate code block |
Should the model description be ‘as a word_id’ rather than ‘as a vector of word_ids’ in this particular example? |
Michael Shearer | Mar 06, 2021 |
Printed | Page 305 Code block |
The tags have changed and the page currently lists 137 companies. |
Michael Shearer | Mar 07, 2021 |
Printed | Page 319, 320 code |
for iter in tqdm.trange(num_iters): |
Gregory Sherman | Dec 08, 2020 |
Printed | Page 319 code block at bottom of page |
the first line in the `page_rank` function says |
Anji Z | Mar 05, 2021 |
Printed | Page 328 Penultimate para. |
Link to file download should be: |
Michael Shearer | Mar 12, 2021 |