The errata list is a list of errors and their corrections that were found after the product was released.
The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.
Version | Location | Description | Submitted by | Date Submitted |
PDF |
Page 10
6th paragraph |
It seems that "65%" is not correct in the following sentence and must be changed to "79166.67/(61500+48000)=72.3%"!
Data scientists with more than five years experience
earn 65% more than data scientists with little or no experience!
|
A. R. Nematollahi |
Sep 29, 2023 |
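A quick arithmetic check may help when weighing the page 10 report above. The sketch below assumes that 79166.67, 61500 and 48000 (the figures quoted in the report) are the average salaries of the most, middle and least experienced groups; the variable names and that labeling are illustrative assumptions, not taken from the book.

```python
# Average salaries quoted in the report; the group labels are an assumption.
most_experienced = 79166.67
mid_experienced = 61500
least_experienced = 48000

# Relative increase of the top group over the bottom group.
increase_over_lowest = most_experienced / least_experienced - 1

# Ratio proposed in the report: top group divided by the sum of the other two.
ratio_to_sum = most_experienced / (mid_experienced + least_experienced)

print(f"{increase_over_lowest:.1%}")
print(f"{ratio_to_sum:.1%}")
```

Under that assumption the first figure rounds to 65% and the second to 72.3%, so the two numbers answer different questions.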
PDF |
Page 99
last paragraph |
On this page I see the following sentence:
And changing one of our data points by a small amount e might increase the median by e, by some number less than e, or not at all (depending on the rest of the data).
I'm confused. Can changing a single value really change the median by e?
I think the median does not change unless the modification alters which values fall before and after the median in the sorted data.
|
Sina Saeednia |
Apr 22, 2021 |
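The three cases in the quoted sentence can be tried on a tiny data set. This is only a sketch using the standard library's `statistics.median`; the helper name `median_shift` and the sample values are made up for illustration.

```python
from statistics import median

def median_shift(data, index, e):
    """Return how much the median moves when data[index] grows by e."""
    changed = list(data)
    changed[index] += e
    return median(changed) - median(data)

odd = [10, 20, 30]       # the median is the single middle value
even = [10, 20, 30, 40]  # the median is the average of the two middle values

print(median_shift(odd, 1, 1))   # the middle value itself changes
print(median_shift(even, 1, 1))  # one of the two middle values changes
print(median_shift(even, 0, 1))  # a value away from the middle changes
```

With e = 1, the median moves by e in the first case, by e/2 in the second, and not at all in the third.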
PDF |
Page 5
12th total line of the page. Inside sorted(), 2nd line. |
sorted(num_friends_by_id, key=lambda (user_id, num_friends): num_friends, reverse=True)
The same call in Spyder (Python 3.7) fails with an error saying that the lambda is missing 1 required positional argument.
I had to sort the list using a different key. I would like to know whether this is due to the Python version (the book says it is built on Python 2.7) or something else.
Note: no value named "num_friends" was assigned in any earlier example, which may be relevant.
|
Raul Dias Barboza |
Jul 21, 2019 |
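The behavior described in the page 5 report above is consistent with Python 3 no longer accepting tuple parameters in a function or lambda signature. A minimal sketch of two Python 3 spellings follows; the sample pairs are invented for illustration and are not the book's data.

```python
from operator import itemgetter

# Hypothetical (user_id, num_friends) pairs, made up for illustration.
num_friends_by_id = [(0, 2), (1, 3), (2, 3), (3, 1)]

# Python 3: accept the whole pair as one argument and index into it.
by_lambda = sorted(num_friends_by_id, key=lambda pair: pair[1], reverse=True)

# Equivalent key built with operator.itemgetter.
by_itemgetter = sorted(num_friends_by_id, key=itemgetter(1), reverse=True)

print(by_lambda)
```

Both calls sort by the second element of each pair, largest first.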
Printed |
Page 5
Code block above figure |
sorted(num_friends_by_id,
key=lambda (user_id, num_friends): num_friends,
reverse=True)
# code provided in book (above) does not work in Python3 due to invalid syntax
# this works
sorted(num_friends_by_id,
key=lambda num_friends: num_friends[1],
reverse=True)
|
Anastasia Gkelameri |
Jan 13, 2019 |
PDF |
Page 219
'backpropagate' function |
Two things. First the expression for output_deltas is wrong. There is actually no factor of output*(1 - output), this only comes in when considering hidden layers due to the chain rule. Although we do differentiate the sigmoid once, the result simplifies from the two terms in the definition of the logistic cost function.
Secondly, it is wrong to update the weights going into the final layer before the errors for preceding layers have been calculated. As the calculation for the errors depends on the weight, we end up with wrong values for the hidden errors, and hence do not update the weights going into the hidden layer correctly.
Correction of both of these yields significant improvement in performance.
|
Sam Vs |
Feb 10, 2018 |
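Whether the output * (1 - output) factor belongs in output_deltas depends on which loss is being minimized. The sketch below assumes a single sigmoid output and compares a squared-error loss with the logistic (cross-entropy) loss mentioned in the report, checking each analytic derivative against a finite difference. All function names are illustrative and are not the book's code.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def squared_error(z, target):
    # Assumed form: half the squared difference between output and target.
    return 0.5 * (sigmoid(z) - target) ** 2

def cross_entropy(z, target):
    out = sigmoid(z)
    return -(target * math.log(out) + (1 - target) * math.log(1 - out))

def numeric_derivative(loss, z, target, h=1e-6):
    """Central finite difference of the loss with respect to the input z."""
    return (loss(z + h, target) - loss(z - h, target)) / (2 * h)

z, target = 0.3, 1.0
out = sigmoid(z)

# Squared error keeps the sigmoid derivative out * (1 - out).
delta_squared = out * (1 - out) * (out - target)

# With cross-entropy the sigmoid derivative cancels.
delta_cross = out - target

print(delta_squared, numeric_derivative(squared_error, z, target))
print(delta_cross, numeric_derivative(cross_entropy, z, target))
```

If the network is trained on squared error, the factor is part of the derivative; if it is trained on cross-entropy, the simpler form applies.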
Printed |
Page 284
4th paragraph |
The SQL query has two errors:
1. user.id in SELECT should be users.user_id.
2. The following GROUP BY statement should be added at the end of the query:
GROUP BY users.user_id
The complete query should be as follows:
SELECT users.user_id, COUNT(user_interests.interest) AS num_interests
FROM users
LEFT JOIN user_interests
ON users.user_id = user_interests.user_id
GROUP BY users.user_id
|
Sergiy Kolesnikov |
Jan 10, 2018 |
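The corrected query in the page 284 report above can be tried against an in-memory SQLite database. This is a sketch only: the table contents and column types are invented for illustration, and SQLite is an assumption rather than the database the book uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INT, name TEXT);
    CREATE TABLE user_interests (user_id INT, interest TEXT);
    INSERT INTO users VALUES (0, 'Hero'), (1, 'Dunn'), (2, 'Sue');
    INSERT INTO user_interests VALUES (0, 'SQL'), (0, 'NoSQL'), (2, 'SQL');
""")

query = """
    SELECT users.user_id, COUNT(user_interests.interest) AS num_interests
    FROM users
    LEFT JOIN user_interests
    ON users.user_id = user_interests.user_id
    GROUP BY users.user_id
"""

# Map each user_id to its interest count; row order is not relied upon.
counts = dict(conn.execute(query).fetchall())
print(counts)
conn.close()
```

Because of the LEFT JOIN, a user with no rows in user_interests still appears, with a count of 0.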
Printed |
Page 34
last paragraph |
The link ipython.org/videos.html is no longer valid.
Perhaps ipython.org/presentation.html can be used as an alternative.
|
Adrian |
Nov 22, 2017 |
Printed |
Page 26
first code snippet |
The text claims that this bit of code:
s = some_function_that_returns_a_string()
if s:
    first_char = s[0]
else:
    first_char = ""
is equivalent, due to truthiness, to this:
first_char = s and s[0]
This is not accurate.
As an example, assume s = None.
The if statement above sets first_char to "".
The expression first_char = s and s[0] sets first_char to None.
|
Adrian |
Nov 17, 2017 |
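The difference described in the page 26 report above can be reproduced directly. The function names below are illustrative wrappers around the two snippets.

```python
def first_char_if_else(s):
    # The explicit if/else version.
    if s:
        return s[0]
    return ""

def first_char_and(s):
    # The short-circuit version: returns s itself when s is falsy.
    return s and s[0]

for s in ["abc", "", None]:
    print(repr(s), repr(first_char_if_else(s)), repr(first_char_and(s)))
```

The two agree for any string, including the empty string, and differ only when s is a falsy non-string such as None.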
Printed |
Page 51
6th block of example code |
def vector_mean(vectors):
    """compute the vector whose ith element is the mean of the
    ith elements of the input vectors"""
    n = len(vectors)
    return scalar_multiply(1/n, vector_sum(vectors))
When you run the vector_mean function the result is always a vector full of zeros unless the list of vectors passed into the function only contains one vector. The scalar_multiply function has 1/n passed into it, but this is rounded to 0 when dividing 1 by any integer greater than 1.
This is corrected by changing the 4th line of code to:
n = float(len(vectors))
|
Jeff Wallace |
Nov 15, 2017 |
Printed |
Page 4
Second code block populating users list with friendship data |
The current text is:
for i, j in friendships:
    # this works because users[i] is the user whose id is i
    users[i]["friends"].append(users[j]) #add i as a friend of j
    users[j]["friends"].append(users[i]) #add j as a friend of i
There are two issues:
(1) the comments are reversed between the two code lines (already reported and listed as confirmed error).
(2) the correct code for the two statements inside the for loop should actually be:
users[i]["friends"].append(j) #add j as a friend of i
users[j]["friends"].append(i) #add i as a friend of j
|
Anonymous |
Sep 25, 2017 |
Other Digital Version |
Page 147
2nd to last |
"model on page 142" --> the model is actually on page 143
|
Patrick |
Jul 26, 2017 |
Printed, ePub |
Page 17
Middle of the page |
for the line
import re as regex
Python reports that regex is an alternative regular expression module, to replace re. In other words, re is now out of date.
See here for more details:
https://bitbucket.org/mrabarnett/mrab-regex
|
Russ Conte |
Jul 13, 2017 |
Printed, ePub |
Page 10
Just below the middle of the page |
The line in question is:
for tenure_bucket, salaries in salary_by_tenure_bucket.iteritems()
That generates an error message (Python 3.6.0, PyCharm 2016.3.3):
AttributeError: 'collections.defaultdict' object has no attribute 'iteritems'
A line that runs is:
for tenure_bucket, salaries in salary_by_tenure_bucket.items()
|
Russ Conte |
Jul 12, 2017 |
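For the page 10 report above, a small Python 3 sketch of the same loop pattern using items(). The bucket names and salaries are invented for illustration.

```python
from collections import defaultdict

# Hypothetical (tenure_bucket, salary) pairs, for illustration only.
salary_by_tenure_bucket = defaultdict(list)
for bucket, salary in [("less than two", 48000),
                       ("more than five", 83000),
                       ("less than two", 50000)]:
    salary_by_tenure_bucket[bucket].append(salary)

# Python 3 dictionaries provide items(); iteritems() exists only in Python 2.
average_salary_by_bucket = {
    tenure_bucket: sum(salaries) / len(salaries)
    for tenure_bucket, salaries in salary_by_tenure_bucket.items()
}

print(average_salary_by_bucket)
```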
Printed, ePub |
Page 6
Middle of the page |
Both the printed and ePub version have three lines in the middle of the page that start:
print [friend["id"] for friend in users[0]["friends"]]
The other two print lines are analogous.
The print command is missing parentheses and does not run on my system (Python 3.6, up to date). Adding parentheses allows the lines to run correctly:
print([friend["id"] for friend in users[0]["friends"]])
|
Russ Conte |
Jul 12, 2017 |
Printed, ePub |
Page 5
3rd set of code text |
The text in the ebook and printed book says:
sorted(num_friends_by_id, key=lambda...
but this lambda relies on tuple parameter unpacking, which Python 3 does not support, so it does not work. The specific error message is:
"tuple unpacking is not supported in Python 3"
|
Russ Conte |
Jul 12, 2017 |
Printed, ePub |
Page 6
last line of code |
The line of code in the printed and PDF versions reads:
print friends_of_friend_ids(users[3])
In Python 3 the argument must be wrapped in parentheses, as follows:
print(friends_of_friend_ids(users[3]))
Note this is correct on the github page:
https://github.com/joelgrus/data-science-from-scratch/blob/master/code-python3/introduction.py
|
Russ Conte |
Jul 11, 2017 |
PDF |
Page 219
backpropagate function definition |
Dear Joel,
sorry to bother you but I have a question regarding the computation of the output_deltas.
It appears in the code that if the output = 1, the term
[ output * (1 - output) * (output - target) ] = 0 whatever is the target value.
So, I do not understand this part because the output could be 1 but not the correct value, which is expected to be equal to target value.
Is it something wrong in my brain or in the code ? :)
Thanks
Best regards
Jerome
|
Jerome_Massot |
Jul 05, 2017 |
Other Digital Version |
loc 2048
The Central Limit Theorem (chapter 6) |
In the Kindle book, bernoulli binomial is incorrectly defined as
def binomial(n,p):
    return sum(bernoulli_trial(p) for _ in range(n))
while in the code repo, it is correct:
def binomial(p, n):
    return sum(bernoulli_trial(p) for _ in range(n))
the erroneous transposition is confusing, as later an example is given: make_hist(0.75, 100, 10000) where make_hist(p,n,num_points)
|
Pablo Rodriguez Bertorello |
Jun 29, 2017 |
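One way to sidestep the argument-order confusion described above is to call binomial with keyword arguments. In this sketch the body of bernoulli_trial is an assumption (return 1 with probability p, otherwise 0); binomial follows the (p, n) order from the code repo as quoted in the report.

```python
import random

def bernoulli_trial(p):
    # Assumed definition: 1 with probability p, otherwise 0.
    return 1 if random.random() < p else 0

def binomial(p, n):
    """Sum of n independent bernoulli_trial(p) draws."""
    return sum(bernoulli_trial(p) for _ in range(n))

random.seed(0)
# Keyword arguments make the call independent of parameter order.
print(binomial(p=0.75, n=100))
```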
PDF |
Page 84
2nd paragraph (below first code block) |
Rejection range is incorrect.
"... rejects H0 when X is between 526 and 531 ..."
It should be
"... rejects H0 when X is larger than 526 ..."
|
Anonymous |
Mar 14, 2017 |
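The two cutoffs quoted in the page 84 report above can be reproduced with a normal approximation. The sketch assumes 1000 trials with success probability 0.5 under H0 and a 5% significance level; those parameters are assumptions made here for illustration and do not come from the report.

```python
import math
from statistics import NormalDist

# Assumed null hypothesis: n trials, each succeeding with probability p.
n, p = 1000, 0.5
mu_0 = n * p
sigma_0 = math.sqrt(n * p * (1 - p))
null_distribution = NormalDist(mu_0, sigma_0)

# One-sided 5% test: reject when X exceeds the 95th percentile.
one_sided_cutoff = null_distribution.inv_cdf(0.95)

# Upper bound of a two-sided 5% test: the 97.5th percentile.
two_sided_upper = null_distribution.inv_cdf(0.975)

print(round(one_sided_cutoff), round(two_sided_upper))
```

Under these assumptions 526 is the one-sided cutoff and 531 is the upper bound of the two-sided interval, which is consistent with a one-sided test rejecting for any X above 526 rather than only for X between the two values.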
Printed |
Page 100
last code block before the "return min_theta" statement |
In the lines of code:
# and take a gradient step for each of the data points
for x_i, y_i in in_random_order(data):
    gradient_i = gradient_fn(x_i, y_i, theta)
    theta = vector_subtract(theta, scalar_multiply(alpha, gradient_i))
I think you meant to take the gradient on only a subset of "data". Otherwise, by looping over the entire dataset you are taking a gradient step which includes all of the data.
|
Eder Izaguirre |
Mar 03, 2017 |
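For context on the page 100 report above: in stochastic gradient descent, each step uses the gradient of a single data point, and one pass over the shuffled data therefore takes one step per point instead of a single step on the full-data gradient. The sketch below is a minimal one-parameter illustration under that reading; the data, the loss, and the body of in_random_order are assumptions and are not the book's code.

```python
import random

def in_random_order(data):
    """Yield the elements of data in a shuffled order."""
    indexes = list(range(len(data)))
    random.shuffle(indexes)
    for i in indexes:
        yield data[i]

def gradient_i(x_i, y_i, theta):
    # Derivative of the single-point loss (theta * x_i - y_i) ** 2.
    return 2 * (theta * x_i - y_i) * x_i

random.seed(0)
data = [(x, 3 * x) for x in range(1, 6)]  # points on the line y = 3x
theta, alpha = 0.0, 0.01
steps = 0

for _ in range(100):                       # 100 passes over the data
    for x_i, y_i in in_random_order(data):
        # One update per data point, using only that point's gradient.
        theta -= alpha * gradient_i(x_i, y_i, theta)
        steps += 1

print(theta, steps)
```

Each update looks at one point only, so 100 passes over 5 points make 500 small steps, and theta settles at the slope of the line.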
PDF |
Page 39
Second last line of code on the page. |
The code says:
"# label x-axis with movie names at bar centers
plt.xticks( [ i + 0.5 for i, _ in enumerate(movies) ], movies)
plt.show()"
The 0.5 in plt.xticks should be replaced with the value 0.1 so that the movie names are at the bar centers. Thus the code should be:
plt.xticks( [ i + 0.1 for i, _ in enumerate(movies) ], movies)
|
Gavan Corke |
Feb 23, 2017 |