Errata
The errata list records errors, and their corrections, that were found after the product was released. If an error was corrected in a later version or reprint, the date of the correction is shown in the "Date Corrected" column.
The following errata were submitted by our customers and approved as valid errors by the author or editor.
Color key: Serious technical mistake | Minor technical mistake | Language or formatting error | Typo | Question | Note | Update
Each entry lists: Version | Location | Description | Submitted By | Date Submitted | Date Corrected
safari app (no pages available) | Chapter 2, section on Lasso, just after In[39]
Decreasing alpha to 0.01, we obtain the solution shown as the green dots
Thierry Herrmann | Oct 08, 2016 | Jan 13, 2017
safari app (no pages available) | Chapter 2, section on Naive Bayes Classifiers, subsection "Strengths, weaknesses, and parameters", 2nd paragraph
... performs better than BinaryNB
Thierry Herrmann | Oct 08, 2016 | Jan 13, 2017
safari app (no pages available) | In all notebook cells from In[93] up to In[97], and in the text just above In[94]
for scikit-learn 0.18 as mentioned in chapter 1,
Thierry Herrmann | Oct 10, 2016 | Jan 13, 2017
safari app (no pages available) | Chapter 3, section "Applying PCA to the cancer dataset for visualization", just below the graph after In[17]
"We can also see that the malignant (red) points are more spread out than the benign (blue) points"
Thierry Herrmann | Oct 13, 2016 | Jan 13, 2017
safari app (no pages available) | Chapter 3, section "Eigenfaces for feature extraction", just below Out[28]
"The input space here is 50×37-pixel grayscale images, so directions within this space are also 50×37-pixel grayscale images"
Thierry Herrmann | Oct 13, 2016 | Jan 13, 2017
safari app (no pages available) | Chapter 4, section on One-Hot-Encoding, code in cell In[2]
data = pd.read_csv(
Thierry Herrmann | Oct 19, 2016 | Jan 13, 2017
safari app (no pages available) | Chapter 4, section on Univariate Nonlinear Transformations, text after Out[33]
"The value 2 seems to be the most common, with 62 appearances ..."
Thierry Herrmann | Oct 19, 2016 | Jan 13, 2017
safari app (no pages available) | Chapter 4, section "Utilizing Expert Knowledge", text below figure 4.16
"The reason for this is that we encoded day of week and time of day using integers, which are interpreted as categorical variables"
Thierry Herrmann | Oct 19, 2016 | Jan 13, 2017
safari app (no pages available) | Chapter 5, section "Using Pipelines in Grid Searches", "Illustrating Information Leakage"
Very minor typo: the text mentions:
Thierry Herrmann | Oct 23, 2016 | Jan 13, 2017
safari app (no pages available) | Chapter 7, section "Topic Modeling and Document Clustering", text just above In[41]
Text says: "We’ll remove words that appear in at least 20 percent of the documents, and we’ll limit the bag-of-words model to the 10,000 words that are most common after removing the top 20 percent"
Thierry Herrmann | Oct 25, 2016 | Jan 13, 2017
Printed | Last paragraph
(1st edition)
Haesun Park | Feb 25, 2017 | Jun 09, 2017
Chapter 2, "Predicting Probabilities"
"We’ve reproduced this in Figure 2-57, and we encourage youto go though the example there."
Mirwaisse DJANBAZ | Oct 22, 2017 | Oct 19, 2018
Chapter 3, "Applying PCA to the Cancer Dataset for Visualization"
"Each plot overlays two histograms, one for all of the points in the benign class (blue) and one for all the points in the malignant class (red)."
Mirwaisse DJANBAZ | Oct 23, 2017 | Oct 19, 2018
Mobi | Page vii, last paragraph
The link to "The Elements of Statistical Learning" under the text "the authors’ website" is incorrect. The correct link is https://web.stanford.edu/~hastie/pub.htm
Gabor Szabo | Nov 27, 2017 | Oct 19, 2018
ePub | Below figure 2-27
"Following the branches to the right, we see that worst radius <= 16.795 creates a node that contains only 8 benign but 134 malignant samples"
Mile Dragosavac | Dec 01, 2017 | Oct 19, 2018
ePub | Below figure 2-29
"meaning we cannot say “a high value of X[0] means class 0, and a low value means class 1” (or vice versa)."
Mile Dragosavac | Dec 01, 2017 | Oct 19, 2018
Printed, PDF, ePub, Mobi, Other Digital Version | Page 11, bottom of page
Earlier versions of the book were missing "from IPython import display" in the import statements in the note at the bottom of page 11 (top of page 12 in newer versions).
Andreas C Müller | Apr 25, 2017 | Jun 09, 2017
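For reference, a minimal sketch of the corrected import block; the surrounding imports (numpy, matplotlib, pandas, and the book's companion mglearn package) are assumed from the book's other examples, and only the last line is the one this erratum restores:

```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import mglearn  # companion package to the book
from IPython import display  # the import this erratum restores
```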
Page 12, 1st paragraph under the figure
The text says:
Joaquin Vanschoren | Feb 15, 2017 | Jun 09, 2017
Page 13, under "Knowing your data"
On page 13 under "Knowing your data", there are 4 questions that you propose to answer before modeling. I am able to follow all of the questions except one (given below).
Sreejith Nair | Jan 19, 2019
Page 14, fourth paragraph
"is the foundation upon which machine learning is BUILD" should be "... machine learning is BUILT"
A Aziz | Apr 27, 2017 | Jun 09, 2017
Page 16, "Jupyter Notebook"
In line 5 of the 1st paragraph under the topic "Jupyter Notebook":
Manpreet Singh | Sep 29, 2016 | Sep 22, 2016
Printed | Page 16, In[17] and Out[17]
"First five columns"
HIDEMOTO NAKADA | Feb 10, 2017 | Jun 09, 2017
Printed | Page 20, code block
If using a newer version of pandas (e.g., 0.24.1), the scatter_matrix function is actually inside the plotting module and needs to be called like this: pd.plotting.scatter_matrix(...)
Cristian Varela | Feb 09, 2019
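A minimal sketch of the relocated call; the iris setup is only illustrative, the point being that scatter_matrix moved into the pandas.plotting namespace (the old pandas.tools.plotting location was removed):

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)

# In newer pandas, scatter_matrix lives in the plotting module:
pd.plotting.scatter_matrix(iris_df, c=iris.target, figsize=(8, 8),
                           marker="o", alpha=0.8)
```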
Page 34, 4th paragraph
The book references "91 possible combinations of two features within those 13" and further clarifies in the footnote to use "13 choose 2". 13 choose 2 is 78; 14 choose 2 is 91.
Mike Hancock | Oct 18, 2016 | Jan 13, 2017
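For reference, the two binomial coefficients behind this report:

```latex
\binom{13}{2} = \frac{13 \cdot 12}{2} = 78, \qquad
\binom{14}{2} = \frac{14 \cdot 13}{2} = 91
```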
Page 40, paragraph below figure
In "In other words, using few neighbors corresponds to high model complexity..."
Andreas Mueller | Jan 18, 2017 | Jun 09, 2017
PDF | Page 45, 1st paragraph of section "Linear models for regression"
"For regression, the general prediction formula for a linear model looks as follows:"
Anonymous | Nov 11, 2016 | Jan 13, 2017
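For context, the general linear-regression prediction formula in the book's w/b notation (with input features x_0 through x_p) is:

```latex
\hat{y} = w_0 x_0 + w_1 x_1 + \cdots + w_p x_p + b
```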
Printed | Page 47, first paragraph of "Linear regression (aka ordinary least squares)"
"The mean squared error is the sum of the squared differences between the predictions and the true values."
HIDEMOTO NAKADA | Feb 11, 2017 | Jun 09, 2017
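For reference, the standard definition averages, rather than merely sums, the squared differences over the n samples, which appears to be the point of this report:

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - \hat{y}_i \bigr)^2
```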
Printed | Pages 48, 53, 54
1st edition, 1st release
Haesun Park | Jul 17, 2017 | Oct 19, 2018
Printed | Page 49, footnote
(1st edition)
Haesun Park | Apr 28, 2017 | Jun 09, 2017
Page 52, paragraph starting with "Here, alpha=0.1"
"Here, alpha=0.1 seems to be working well. We could try decreasing alpha even more to improve generalization."
Hidemoto Nakada | Jun 17, 2019
PDF | Page 55, 1st paragraph (beneath the plot)
"Using alpha=0.00001, we get a model that is quite unregularized,..."
Anonymous | Nov 17, 2016 | Jan 13, 2017
Printed | Page 58, 3rd paragraph
"Most of the points in class 0 are at the top, and most of the points in class 1 are ath the bottom"
Haesun Park | Dec 18, 2016 | Jan 13, 2017
Printed | Page 58, above Fig. 2-16
(1st edition)
Haesun Park | Apr 28, 2017 | Jun 09, 2017
Printed | Page 59, 2nd paragraph
"Let's analyze LinearLogistic in more detail .."
Haesun Park | Dec 18, 2016 | Jan 13, 2017
Printed, PDF | Page 60, code in listing In[45]
Value of C in label should be 0.01, i.e.:
Anonymous | Feb 05, 2019
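A minimal sketch of the kind of fix described, assuming the cell plots the coefficients of a LogisticRegression trained with C=0.01 under a legend entry mislabeled "C=0.001" (the variable name logreg001 and the plotting details are assumptions):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
logreg001 = LogisticRegression(C=0.01, max_iter=5000).fit(X, y)

# The legend label must match the C actually used (0.01, not 0.001):
plt.plot(logreg001.coef_.T, "v", label="C=0.01")
plt.legend()
plt.show()
```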
Printed | Page 63, Figure 2-17
The x-axis label should be "Feature" instead of "Coefficient index".
HIDEMOTO NAKADA | Mar 31, 2017 | Jun 09, 2017
Printed | Page 72, 1st paragraph
"Splitting the dataset vertically at x[1]=0.0596"
Haesun Park | Dec 18, 2016 | Jan 13, 2017
ePub | Page 72
"The top node, also called the root, represents the whole dataset, consisting of 75 points belonging to class 0 and 75 points belonging to class 1"
Mile Dragosavac | Dec 01, 2017 | Oct 19, 2018
Printed | Page 77, 1st and 2nd paragraphs
1st paragraph:
Haesun Park | Dec 18, 2016 | Jan 13, 2017
Printed | Page 77, above "Feature importance in trees" section
(1st edition)
Haesun Park | Apr 28, 2017 | Jun 09, 2017
Printed | Page 78, 2nd paragraph
"However, if a feature has a low feature_importance,.."
Haesun Park | Dec 18, 2016 | Jan 13, 2017
ePub | Page 78.9, near Figure 2-5
In explaining Figure 2-5, the text switches from describing the new data points as stars to calling them crosses, which is very confusing. The authors presumably meant stars: the text says so, but then goes on to mention crosses in the figure.
Anonymous | Dec 19, 2018
Printed | Page 80, line 1
"a high value of X[0] means class 0, and a low value means class 1"
Hidemoto Nakada | Feb 01, 2017 | Jun 09, 2017
Page 83, bottom
This line of code is throwing a warning:
Anonymous | Nov 06, 2021
Page 87, middle paragraph
"The trees that are built as part of the random forest are stored in the estimator_ attribute."
Anonymous | Jul 14, 2021
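For reference, scikit-learn stores the individual fitted trees of a random forest in the estimators_ attribute (note the plural):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier

X, y = make_moons(n_samples=100, noise=0.25, random_state=3)
forest = RandomForestClassifier(n_estimators=5, random_state=2).fit(X, y)

# The fitted trees live in `estimators_`, not `estimator_`:
print(len(forest.estimators_))  # -> 5 DecisionTreeClassifier objects
```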
Printed | Page 88, 5th paragraph
"max_features=sqrt(n_features) for classification and max_features=log2(n_features) for regression"
Haesun Park | Dec 18, 2016 | Jan 13, 2017
Printed | Page 92, 4th paragraph
"You can find the details in Chapter 1 of Hastie, Tibshirani, and Friedman's The Elements of Statistical Learning"
Haesun Park | Dec 18, 2016 | Jan 13, 2017
Printed | Page 94, In[78]
First comment:
Haesun Park | Dec 18, 2016 | Jan 13, 2017
Printed | Page 95, last sentence
ax.set_zlabel("feature0 ** 2")
Haesun Park | Dec 18, 2016 | Jan 13, 2017
Printed | Page 95, last line
The last line of In[79] should be:
Jess D | Dec 30, 2016 | Jan 13, 2017
Printed | Page 98, equation in the middle
k_rbf(x_1, x_2) = exp(\gamma ||x_1 - x_2||^2)
Haesun Park | Dec 18, 2016 | Jan 13, 2017
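For reference, the standard Gaussian (RBF) kernel carries a minus sign in the exponent, which appears to be what this report is about:

```latex
k_{\mathrm{rbf}}(x_1, x_2) = \exp\bigl( -\gamma \, \lVert x_1 - x_2 \rVert^2 \bigr)
```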
Printed | Page 100, 2nd paragraph
"Increasing `C`, as shown on the bottom right, allows these points to have a stronger influence on the model and makes the decision boundary bend to correctly classify them."
Haesun Park | Dec 18, 2016 | Jan 13, 2017
Printed | Page 102, in "Preprocessing data for SVMs"
The book disagrees with the scikit-learn website on how to scale for SVMs. It should be explained more clearly that the right choice of scaling depends on the data and the model, and that StandardScaler would also be a valid approach.
Andreas C Müller | Oct 30, 2017 | Oct 19, 2018
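A minimal sketch of the two scaling choices under discussion; both are standard scikit-learn preprocessors, and which works better depends on the data:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC

# Scaling to [0, 1] (the book's choice) vs. zero mean / unit variance:
pipe_minmax = make_pipeline(MinMaxScaler(), SVC())
pipe_standard = make_pipeline(StandardScaler(), SVC())
```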
Printed | Page 107, equation in the middle
h[0] = tanh(w[0,0]*x[0] + w[1,0]*x[1] + w[2,0]*x[2] + w[3,0]*x[3])
Haesun Park | Dec 18, 2016 | Jan 13, 2017
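For reference, the standard formulation of such a hidden unit also includes a bias term b_0 (whether that is the exact correction intended here is an assumption):

```latex
h_0 = \tanh\bigl( w_{0,0}\, x_0 + w_{1,0}\, x_1 + w_{2,0}\, x_2 + w_{3,0}\, x_3 + b_0 \bigr)
```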
Page 107, formulas in middle
I think this paragraph:
Abraham Louw | Apr 17, 2019
Printed | Page 110, 1st paragraph
"If we want a smoother decision boundary, we could add more hidden units (as in"
Haesun Park | Dec 18, 2016 | Jan 13, 2017
Printed | Page 118, 3rd paragraph
"you are learning 100 * 1,000 = 100,000 weights from the input to the hidden layer and 1,000 x 1 weights from the hidden layer to the output layer"
Haesun Park | Dec 18, 2016 | Jan 13, 2017
Printed | Page 118, 3rd paragraph
"introspect" is used instead of "inspect".
Gabriela Hempfling | May 06, 2018 | Oct 19, 2018
Printed | Page 119, In[105]
from sklearn.datasets import make_blobs, make_circles
Haesun Park | Dec 18, 2016 | Jan 13, 2017
Page 142, "Applying PCA to the cancer dataset for visualization", 1st paragraph
"This dataset has 30 features, which would result in 30 * 14 = 420 scatter plots!"
Anonymous | Aug 25, 2017 | Oct 19, 2018
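For reference, the number of distinct pairs among 30 features:

```latex
\binom{30}{2} = \frac{30 \cdot 29}{2} = 435
```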
Printed | Page 147, 1st paragraph
At the end of the first sentence: "(it's negative,"
Haesun Park | Jan 04, 2017 | Jan 13, 2017
Printed | Page 151, Out[27]
X_train_pca.shape: (1537, 100)
Haesun Park | Jan 04, 2017 | Jan 13, 2017
Printed | Page 154, last sentence of the 1st paragraph
The following sentence:
Haris Memic | Dec 10, 2016 | Jan 13, 2017
Printed | Page 163, line 1
"The figure includes 3 of the 100 measurements from X for reference."
HIDEMOTO NAKADA | Feb 01, 2017 | Jun 09, 2017
Printed | Page 164, last sentence
"handwritten digit between 0 and 1."
Ricky Park | Jan 05, 2017 | Jan 13, 2017
Printed | Page 165, first paragraph
and color each dot by its class
HIDEMOTO NAKADA | Feb 01, 2017 | Jun 09, 2017
Printed | Page 169, in the code / in the graph
plt.xlabel("t-SNE feature 0")
Anonymous | May 06, 2018 | Oct 19, 2018
Printed | Page 189, last paragraph
In the 2nd line,
Ricky Park | Jan 05, 2017 | Jan 13, 2017
Printed | Page 191, 2nd paragraph
Last sentence:
Ricky Park | Jan 05, 2017 | Jan 13, 2017
Printed | Page 202, last input box of page
The different one-hot encodings of categorical and integer features should be shown; the preface on the page implies a mapping of one category to one and only one integer equivalent.
Anonymous | Oct 02, 2018 | Oct 19, 2018
Printed | Page 204, 1st paragraph
In this paragraph the author is talking about agglomerative clustering, but in the last sentence the author writes "This is not surprising, given the results of DBSCAN, which tried to cluster all points together."
Jun-Lin Lin | Nov 07, 2017 | Oct 19, 2018
Printed | Page 209, Table 3-1
estimator.predict(X_text)
Ricky Park | Jan 05, 2017 | Jan 13, 2017
Page 211, final paragraph
"In Table 3-1, X_train and y_train refer to the training data and training labels, while X_test and y_test refer to the test data and test labels (if applicable)."
Anonymous | May 14, 2017 | Jun 09, 2017
Page 212, In[35]
Chinese version
Alice | Jul 28, 2021
Printed | Page 221, below Out[12]
"... with feature values -3 to -2.6, ... with feature values from -2.68 to -2.37, and so on."
Haesun Park | Jan 19, 2017 | Jun 09, 2017
Other Digital Version | Page 221, the beginning of Section 3.5.2
"The following three choices are implemented in scikit-learn"
Hanmin Qin | Dec 17, 2018
Printed | Page 228, below Out[24]
In the last sentence,
Haesun Park | Jan 19, 2017 | Jun 09, 2017
Page 234, input cell 37 and Figure 4-8
The log transformation is applied twice to the Poisson data (In[36] and In[37] on page 234 of the PDF version), resulting in the wrong histogram in Figure 4-8 (on page 235 of the PDF version).
Adel Rahmani | Nov 10, 2016 | Jan 13, 2017
Printed | Page 245, In[55] and below Figure 4-13
In the In[55] code, "plt.figure()" can be removed
Haesun Park | Jan 19, 2017 | Jun 09, 2017
Other Digital Version | Page 248, In[59]
>>> plt.plot(citibike, linewidth=1)
teamclouday | Jan 06, 2019
Printed | Page 249, In[64]
plt.xlabel("Feature magnitude")
Haesun Park | Jan 19, 2017 | Jun 09, 2017
Printed | Page 254, first paragraph, line 8
However, when using cross-validation, each example will be in the training set exactly once:
HIDEMOTO NAKADA | Feb 01, 2017 | Jun 09, 2017
Printed | Page 262, comment in the listing for In[20]
# evaluate the SVC on the test set
HIDEMOTO NAKADA | Feb 01, 2017 | Jun 09, 2017
Printed | Page 263, last paragraph
(1st edition)
Haesun Park | Feb 25, 2017 | Jun 09, 2017
Printed | Page 266, the last line
The parameters that were found are scored in the ..
HIDEMOTO NAKADA | Feb 01, 2017 | Jun 09, 2017
Printed | Page 273, last paragraph
(1st edition)
Haesun Park | Feb 25, 2017 | Jun 09, 2017
Page 273, last paragraph
"not that the entry" -> "note that the entry"
Anonymous | May 14, 2017 | Jun 09, 2017
Printed | Page 287, paragraph below Out[51]
For class 1, we get a fairly small recall, and precision is mixed.
HIDEMOTO NAKADA | Feb 01, 2017 | Jun 09, 2017
Page 288, first paragraph
0.13 (vs. 0.89 for the logistic regression) on the "nine" class, for the "not nine" class it is 0.90 vs. 0.99,
Anonymous | May 23, 2017 | Jun 09, 2017
Page 290, paragraph next to lobster
The paragraph next to the lobster warns us not to use the test set to set decision thresholds. Ironically, however, that is what the preceding few pages did. Perhaps a sentence should be added to say that this was simply for ease of demonstration, or the preceding pages could be reworked to use the training data instead.
Anonymous | May 23, 2017 | Jun 09, 2017
Printed | Page 292, first paragraph, line 7
Because we need to compute the ROC curve..
HIDEMOTO NAKADA | Feb 01, 2017 | Jun 09, 2017
Printed | Page 295, first paragraph
"Recall that because average precision is the area under a curve that goes from 0 to 1, average precision always returns a value between 0 (worst) and 1 (best)."
HIDEMOTO NAKADA | Feb 02, 2017 | Jun 09, 2017
Printed | Page 301, Python code and 1st paragraph
In the Python code in In[69], the author uses a GridSearchCV object with the parameter scoring='roc_auc'. Consequently, at the last line of the code, the value returned by the "score" method of this GridSearchCV object should be "roc_auc", not "accuracy".
Jun-Lin Lin | Nov 07, 2017 | Oct 19, 2018
Printed | Page 306, In[3]
(1st edition)
Haesun Park | Feb 25, 2017 | Jun 09, 2017
Printed | Page 312, line -2 (in In[16])
# fit the last step
HIDEMOTO NAKADA | Feb 02, 2017 | Jun 09, 2017
Page 312, last paragraph
The text says "select the most informative of the 10 features"
Joaquin Vanschoren | Mar 05, 2017 | Jun 09, 2017
Printed | Page 313, Figure 6-3
In Figure 6-3, last step, below pipe.predict(X'):
Aryo Zare | Aug 18, 2019
Page 319, Figure 6-3
In Figure 6-3, there seems to be an inconsistency between the first step in the diagram and the call to pipe.predict().
joseph guirguis | Sep 16, 2021
Printed | Page 325, 3rd paragraph
(1st edition)
Haesun Park | Mar 12, 2017 | Jun 09, 2017
Page 329, In[4]
The book suggests replacing HTML line breaks with spaces, but the data (including in the book) doesn't seem to actually contain these.
Anonymous | May 21, 2017 | Jun 09, 2017
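For context, a sketch of the cleanup step in question (the exact contents of In[4] are assumed; reviews in the IMDb dataset are bytestrings):

```python
# Strip HTML line breaks from raw reviews by replacing them with spaces:
text_train = [b"A good movie.<br /><br />Really good.", b"Plain text."]
text_train = [doc.replace(b"<br />", b" ") for doc in text_train]
print(text_train[0])  # b'A good movie.  Really good.'
```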
Page 332, 2nd paragraph
"LogisticRegresssion" -> "LogisticRegression" (two s's instead of three)
Anonymous | Oct 06, 2016 | Jan 13, 2017
Printed | Page 336, in the tfidf equation
(1st edition)
Haesun Park | Apr 28, 2017 | Jun 09, 2017
Page 336, note at bottom of page
The last sentence in your note on page 336 is unclear:
Stephen Dewey | May 17, 2017 | Jun 09, 2017
Printed | Page 336, 2nd paragraph
tfidf(w, d) = tf log((N + 1) / (Nw + 1)) + 1  (as printed in the book)
Chandra Shekhar Singh | Nov 28, 2018
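For reference, scikit-learn's smoothed tf-idf (what TfidfTransformer computes with smooth_idf=True) groups the trailing +1 with the logarithm rather than adding it to the product, with N the number of documents and N_w the number of documents containing word w:

```latex
\operatorname{tfidf}(w, d) = \operatorname{tf}(w, d) \cdot
\left( \log\!\left( \frac{N + 1}{N_w + 1} \right) + 1 \right)
```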
Printed | Page 337, In[23]
(1st edition)
Haesun Park | Mar 01, 2017 | Jun 09, 2017
Page 338, final paragraph
"Both classes also apply L2 normalization after computing the tf–idf representation; in other words, they rescale the representation of each document to have Euclidean norm 1."
Stephen | May 17, 2017 | Jun 09, 2017
Page 339, middle of page
The page reads, "As you can see, there is some improvement when using tf–idf instead of just word counts." However, 0.89 is the same as the score we were getting on pages 334-336.
Stephen Dewey | May 17, 2017 | Jun 09, 2017
Printed | Page 345, first paragraph
The paragraph describing the topics does not correspond to the topics in the figure. Clearly topic 70 is the most important in the figure.
Andreas C Müller | Jan 12, 2017 | Jun 09, 2017
Printed | Page 352, in In[49]
# pshow first two sentences
HIDEMOTO NAKADA | Feb 02, 2017 | Jun 09, 2017
Printed | Page 355, last sentence
(1st edition)
Haesun Park | Mar 01, 2017 | Jun 09, 2017
Printed | Page 358, line -4
that might already increase response time or reduce cost.
HIDEMOTO NAKADA | Feb 02, 2017 | Jun 09, 2017
Printed | Page 362, 2nd paragraph
(1st edition)
Haesun Park | Mar 12, 2017 | Jun 09, 2017