The errata list is a list of errors and their corrections that were found after the product was released.
The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.
Each entry lists the version, location, description, submitter, and submission date.
Version: Printed, PDF
Location: Page 45, 1st paragraph
When discussing R^2, the statement "a value of 0 corresponds to a constant model that just predicts the mean of the training set responses, y_train" is only true for reg.score(X_train, y_train). If you are calling reg.score(X_test, y_test), then y_train should be replaced by y_test. In general the statement should just read "a value of 0 corresponds to a constant model that just predicts the mean of the responses, y". Thanks.
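To illustrate the submitter's point, a minimal sketch with made-up numbers and a hand-rolled R^2 (not the book's code): a constant model that predicts the mean of the set being scored gets exactly 0 on that set, while a constant model predicting the training mean generally does not.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination, defined relative to the mean of y_true."""
    ss_res = np.sum((y_true - y_pred) ** 2)            # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

# Made-up responses, purely for illustration
y_train = np.array([1.0, 2.0, 3.0, 4.0])   # mean 2.5
y_test = np.array([2.0, 4.0, 6.0])          # mean 4.0

# Predicting the mean of y_test scores exactly 0 on the test set...
const_test = np.full_like(y_test, y_test.mean())
print(r2(y_test, const_test))    # 0.0

# ...but predicting the mean of y_train scores below 0 on the test set.
const_train = np.full_like(y_test, y_train.mean())
print(r2(y_test, const_train))   # negative
```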
Submitted by: RAMZI KUTTEH
Date submitted: Jan 05, 2024
Version: PDF
Location: Page 34, In[8] of the Jupyter notebook
The code calls for the Boston dataset, but it has been removed from sklearn.datasets, as the documentation page explains. So I believe there should be a warning, a different dataset, or instructions for how to obtain the Boston data (the documentation page covers this, but including it would keep the book self-contained).
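A possible stopgap, not the book's fix: load_boston was removed in scikit-learn 1.2, but other regression datasets still ship with the library and need no download, for example the diabetes dataset.

```python
# Sketch of a drop-in substitute for the removed Boston dataset:
# load_diabetes is bundled with scikit-learn, so no download is required.
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)
print("Data shape:", X.shape)   # (442, 10)
```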
Submitted by: Daniel Jimenez
Date submitted: Jun 08, 2023
Version: PDF
Location: Page 19, bottom of the page, entry 24 of the Jupyter notebook
In the code example, a line says:
grr = pd.scatter_matrix(iris_dataframe, c=y_train, figsize=(15, 15), marker='o',
                        hist_kwds={'bins': 20}, s=60, alpha=.8, cmap=mglearn.cm3)
but pd.scatter_matrix is deprecated, or at least I wasn't able to make it work as-is with Python 3 (maybe it works for Python 2). Instead I had to look it up and replace it with:
pd.plotting.scatter_matrix
and it ran well.
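A self-contained sketch of the corrected call, on synthetic data rather than the book's iris_dataframe (scatter_matrix now lives in pandas.plotting, not at the top level of pandas):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import numpy as np
import pandas as pd

# Synthetic 50x3 DataFrame, standing in for the book's iris_dataframe
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.randn(50, 3), columns=["a", "b", "c"])

# The moved function: pd.plotting.scatter_matrix instead of pd.scatter_matrix
axes = pd.plotting.scatter_matrix(df, figsize=(6, 6), marker="o",
                                  hist_kwds={"bins": 10}, alpha=0.8)
print(axes.shape)  # (3, 3): one panel per pair of columns
```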
Submitted by: Daniel Jimenez
Date submitted: Jun 08, 2023
Version: Printed
Location: Page 141, top
The notebook at github.com/amueller/introduction_to_ml_with_python/blob/master/03-unsupervised-learning.ipynb gets 0.97, and I got 0.97, but the book gives 0.63 (?)
Submitted by: Mike Sweeney
Date submitted: Sep 12, 2022
Version: Printed
Location: Page 41, bottom
"On the other hand, when considering 10 neighbors, the model is too simple and performance is even worse."
The graph seems to indicate that 10 neighbors gives a better result than 1 (though the optimum is about six, as stated).
Submitted by: Mike Sweeney
Date submitted: Sep 01, 2022
Version: PDF
Location: Page 152, In[24]
Stack Exchange suggests that updates to KNeighborsClassifier in sklearn are invalidating the code, and using older versions triggers other issues. Please revise!
In[24] code is:
from sklearn.neighbors import KNeighborsClassifier
# split the data in training and test set
X_train, X_test, y_train, y_test = train_test_split(
    X_people, y_people, stratify=y_people, random_state=0)
# build a KNeighborsClassifier with using one neighbor:
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
print("Test set score of 1-nn: {:.2f}".format(knn.score(X_test, y_test)))
Out[24] in the text is:
Test set score of 1-nn: 0.27
On GitHub the result is:
Test set score of 1-nn: 0.23
In my Jupyter notebook it's a disaster:
ValueError Traceback (most recent call last)
<ipython-input-64-87c847658059> in <module>
1 from sklearn.neighbors import KNeighborsClassifier
2 # split the data in training and test set
----> 3 X_train, X_test, y_train, y_test = train_test_split(
4 X_people, y_people, stratify=y_people, random_state=0)
5 # build a KNeighborsClassifier with using one neighbor:
~\anaconda3\lib\site-packages\sklearn\model_selection\_split.py in train_test_split(test_size, train_size, random_state, shuffle, stratify, *arrays)
2173
2174 n_samples = _num_samples(arrays[0])
-> 2175 n_train, n_test = _validate_shuffle_split(n_samples, test_size, train_size,
2176 default_test_size=0.25)
2177
~\anaconda3\lib\site-packages\sklearn\model_selection\_split.py in _validate_shuffle_split(n_samples, test_size, train_size, default_test_size)
1855
1856 if n_train == 0:
-> 1857 raise ValueError(
1858 'With n_samples={}, test_size={} and train_size={}, the '
1859 'resulting train set will be empty. Adjust any of the '
ValueError: With n_samples=0, test_size=0.25 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.
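The traceback reports n_samples=0, which suggests X_people was empty before the split (for example, if the LFW download failed or a filtering step removed every row), rather than a change in KNeighborsClassifier itself. A minimal sketch of both cases on synthetic arrays (the names here are my own, not the book's):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Non-empty data splits normally: 10 samples -> 7 train, 3 test by default
X_ok = np.arange(20).reshape(10, 2)
y_ok = np.array([0, 1] * 5)
X_train, X_test, y_train, y_test = train_test_split(
    X_ok, y_ok, stratify=y_ok, random_state=0)
print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)

# An empty array reproduces the exact ValueError from the erratum,
# so checking X_people.shape before splitting is the first diagnostic.
X_empty = np.empty((0, 2))
y_empty = np.empty((0,))
try:
    train_test_split(X_empty, y_empty, random_state=0)
    raised = False
except ValueError:
    raised = True
print("raised ValueError:", raised)
```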
Submitted by: Regis O'Connor
Date submitted: Nov 19, 2021
Version: PDF
Location: Page 105, Out[86] and last paragraph
The accuracy of the model in the text does not align with the GitHub results (or mine either), so the conclusions drawn in the text are in error.
Here is the text:
Accuracy on training set: 0.988
Accuracy on test set: 0.972
"Here, increasing C allows us to improve the model significantly, resulting in 97.2% accuracy."
Here are the results from GitHub:
Accuracy on training set: 1.000
Accuracy on test set: 0.958
This will be an awkward one to explain to my students!
Submitted by: Anonymous
Date submitted: Nov 08, 2021
Version: PDF
Location: Page 103, Out[81] and In[82]
Out[81] in the text does not match the results posted on GitHub.
Here is the text:
Out[81]:
Accuracy on training set: 1.00
Accuracy on test set: 0.63
"The model overfits quite substantially, with a perfect score on the training set and only 63% accuracy on the test set."
Here are the GitHub results:
Accuracy on training set: 0.90
Accuracy on test set: 0.94
In[82] also has a typo; the correction is noted on GitHub but not in the text.
Text:
plt.boxplot(X_train, manage_xticks=False)
Correct code:
plt.boxplot(X_train, manage_ticks=False)
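A minimal check of the corrected keyword on synthetic data (matplotlib deprecated manage_xticks in favor of manage_ticks around version 3.1; the data below is my own, not the book's X_train):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic 100x4 array standing in for X_train
rng = np.random.RandomState(0)
data = rng.randn(100, 4)

# The corrected keyword: manage_ticks, not manage_xticks
result = plt.boxplot(data, manage_ticks=False)
print(len(result["boxes"]))  # 4: one box per column
```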
Submitted by: Anonymous
Date submitted: Nov 08, 2021
Version: PDF
Location: Page 91, bottom
The results of the code do not match the text.
The code results are:
Accuracy of gbrt on training set 1.000
Accuracy of gbrt on test set 0.965
The results in the text are:
Accuracy on training set: 1.000
Accuracy on test set: 0.958
Submitted by: Anonymous
Date submitted: Nov 07, 2021
Location: Chapter 7, "Working with Text Data"; Figure 7-6, "Topic weights learned by LDA"
Shouldn't Figure 7-6 match the output (first 2 rows of each topic) given by Out[48]? When I run this in my Python environment, they do match.
Best,
André
Submitted by: Anonymous
Date submitted: Jun 25, 2021