Errata for Machine Learning with Python Cookbook

The errata list is a list of errors and their corrections that were found after the product was released. If an error was corrected in a later version or reprint, the date of the correction is displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.

Version Location Description Submitted By Date submitted Date corrected
Printed
Page xi
2nd paragraph

It says "you can copy and paste the code and it'll run", but no location is given for the source code.

Note from the Author or Editor:
Most of the code is available on ChrisAlbon.com, and readers have also made repos such as this one:
https://github.com/DustinAlandzes/machine-learning-with-python-cookbook-notes

Stephen Austin  Apr 21, 2018  Jul 02, 2020
Printed
Page xii
4th bullet

(1st Release)

'14.7 Selecting Random Features in Random Forests' should be '14.7 Selecting Important Features in Random Forests'.

Haesun Park  Jul 29, 2019  Jul 02, 2020
Printed, PDF, ePub, Mobi, Other Digital Version
Page 1
Preface

The preface uses a gendered pronoun, "he". Change to "they"

Chris Albon  Apr 02, 2018  Jul 02, 2020
Printed
Page 10
2nd paragraph, 1st sentence

reshape(-1, 1) should be reshape(1, -1). In the example, it is correctly used.

John Lee  Jun 29, 2019  Jul 02, 2020
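
To illustrate the page 10 correction above, here is a minimal sketch of how the two reshape calls differ. The array is illustrative and not taken from the book.

```python
import numpy as np

# Illustrative 1-D array (not the book's data).
vector = np.array([1, 2, 3, 4, 5, 6])

# reshape(1, -1): one row, and as many columns as needed.
row = vector.reshape(1, -1)

# reshape(-1, 1): as many rows as needed, and one column.
column = vector.reshape(-1, 1)

print(row.shape)     # (1, 6)
print(column.shape)  # (6, 1)
```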
Printed
Page 16
Last paragraph

(First Release)
"We can use Numpy's dot class to ..." should be "We can use Numpy's dot function to ...".

Haesun Park  Jul 13, 2019  Jul 02, 2020
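
As a small illustration of the page 16 correction above, np.dot is called as a function with the two arrays as arguments. The vectors here are illustrative values, not necessarily the book's.

```python
import numpy as np

# Illustrative vectors.
vector_a = np.array([1, 2, 3])
vector_b = np.array([4, 5, 6])

# np.dot is a function: pass both arrays as arguments.
dot_product = np.dot(vector_a, vector_b)

# For 1-D arrays the @ operator computes the same value.
same_result = vector_a @ vector_b

print(dot_product)  # 32
```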
Printed
Page 20
Code below Discussion

(First Release)

"# Generate three random integer between 1 and 10" should be "# Generate three random integer between 0 and 10"

Haesun Park  Jul 13, 2019  Jul 02, 2020
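
The page 20 correction above follows from how the bounds of np.random.randint work. The sketch below assumes the recipe's call is along the lines of np.random.randint(0, 11, 3); the seed and the large second sample are additions for illustration only.

```python
import numpy as np

np.random.seed(0)  # seed chosen only to make the sketch repeatable

# The lower bound (0) is inclusive and the upper bound (11) is exclusive,
# so the possible values are the integers 0 through 10.
samples = np.random.randint(0, 11, 3)

# Drawing many values shows that 0 really is a possible outcome.
many = np.random.randint(0, 11, 10_000)
print(many.min(), many.max())
```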
Printed
Page 28
First solution on the page, line with "url = ..."

The URL 'https://tinyurl.com/simulated_data' must be 'https://tinyurl.com/simulated-data'
(The underscore has to be replaced by a hyphen.)

Frank Langenau  Sep 12, 2018  Jul 02, 2020
Printed
Page 29
4th line

The URL 'https://tinyurl.com/simulated_excel' must be 'https://tinyurl.com/simulated-excel'
(The underscore has to be replaced by a hyphen.)

Frank Langenau  Sep 12, 2018  Jul 02, 2020
Printed
Page 29
line -5

The URL 'https://tinyurl.com/simulated_json' must be 'https://tinyurl.com/simulated-json'
(The underscore has to be replaced by a hyphen.)

Frank Langenau  Sep 12, 2018  Jul 02, 2020
Printed
Page 30
Above Discussion

(First Release)

The outputs should be changed to a table-like format, as in the other recipes in this chapter.

Haesun Park  Jul 13, 2019  Jul 02, 2020
Printed
Page 37
line with "url" in Solution

The URL 'https://tinyurl.com//titanic-csv' must be
'https://tinyurl.com/titanic-csv'. (Only 1 slash before "titanic-csv")

Frank Langenau  Sep 13, 2018  Jul 02, 2020
Printed
Page 37
Last code block

(First Release)

"# Select three rows" should be "# Select four rows"

Haesun Park  Jul 13, 2019  Jul 02, 2020
PDF
Page 44
3rd codeblock

In section "2.4 Loading an Excel File", the 3rd code block under "Solution" reads as follows:

# Load data
dataframe = pd.read_excel(url, sheetname=0, header=1)

whereas it should be:

# Load data
dataframe = pd.read_excel(url, sheet_name=0, header=1)

The underscore(_) in the argument "sheet_name" is missing.

Note from the Author or Editor:
"pd.read_excel(url, sheetname=0, header=1)" should be "pd.read_excel(url, sheet_name=0, header=1)".

Shawn Barar  Mar 06, 2020  Jul 02, 2020
Printed
Page 59
paragraph for Outer

"return all rows in both employee_id and dataframe_sales" should be "return all rows in both dataframe_employee and dataframe_sales"

Haesun Park  Jul 13, 2019  Jul 02, 2020
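
To make the page 59 correction above concrete, the sketch below contrasts an outer merge with an inner merge. The DataFrame names follow the erratum text; the rows are illustrative and are not claimed to match the book's data.

```python
import pandas as pd

# Illustrative data (not claimed to match the book's).
dataframe_employee = pd.DataFrame(
    {"employee_id": [1, 2, 3], "name": ["Amy", "Allen", "Alice"]}
)
dataframe_sales = pd.DataFrame(
    {"employee_id": [3, 4], "total_sales": [23456, 2512]}
)

# Outer: keep all rows from both DataFrames, filling gaps with NaN.
outer = pd.merge(dataframe_employee, dataframe_sales,
                 on="employee_id", how="outer")

# Inner: keep only rows whose employee_id appears in both DataFrames.
inner = pd.merge(dataframe_employee, dataframe_sales,
                 on="employee_id", how="inner")

print(len(outer), len(inner))  # 4 1
```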
Printed
Page 62
1st line after the formula

In the 1st line after the formula, "where x is the feature vector, x'i is an individual element of feature x, and x'i is the rescaled element", the first x'i must be just xi (the i in every x'i and xi is formatted as a subscript).

Frank Langenau  Sep 17, 2018  Jul 02, 2020
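
The distinction between xi and x'i in the page 62 entry above can be seen in a short sketch. It assumes the formula in question is the usual min-max rescaling, x'i = (xi - min(x)) / (max(x) - min(x)); the feature values are illustrative.

```python
import numpy as np

# Illustrative feature vector x; each entry is an element xi.
feature = np.array([-500.5, -100.1, 0.0, 100.1, 900.9])

# Assumed min-max formula: each xi becomes the rescaled element x'i.
rescaled = (feature - feature.min()) / (feature.max() - feature.min())

print(rescaled.min(), rescaled.max())  # 0.0 1.0
```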
Printed
Page 67
code block before "Discussion"

The code

interaction = PolynomialFeatures(degree=2,
interaction_only=True, include_bias=False)
interaction.fit_transform(features)

yields an "unexpected indent" error because the last line must not be indented.

The code must be as follows:

interaction = PolynomialFeatures(degree=2,
interaction_only=True, include_bias=False)

interaction.fit_transform(features)

Frank Langenau  Sep 18, 2018  Jul 02, 2020
Printed
Page 78
Code below Solution

"KNN(k=5, verbose=0).complete(standardized_features)" should be "KNN(k=5, verbose=0).fit_transform(standardized_features)"

Haesun Park  Jul 13, 2019  Jul 02, 2020
Printed
Page 82
Last sentence

(First Release)

"classes_ method" should be "classes_ attribute"

Haesun Park  Jul 13, 2019  Jul 02, 2020
Printed
Page 90
Paragraph below Discussion

(First Release)

"the median class of the k nearest observations" should be "the most frequent class of the k nearest observations"

Haesun Park  Jul 13, 2019  Jul 02, 2020
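
The page 90 correction above amounts to a majority vote among the neighbors. A minimal sketch with the standard library, using hypothetical neighbor labels:

```python
from collections import Counter

# Hypothetical classes of the k = 5 nearest observations.
neighbor_classes = ["setosa", "versicolor", "setosa", "virginica", "setosa"]

# The prediction is the most frequent class among the neighbors.
predicted_class, votes = Counter(neighbor_classes).most_common(1)[0]

print(predicted_class, votes)  # setosa 3
```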
Printed
Page 104
Last paragraph

(First Release)

"We can use the vocabulary_ method" should be "We can use the get_feature_names method"

Haesun Park  Jul 13, 2019  Jul 02, 2020
Printed
Page 142
2nd paragraph

(First Release)

You say "check out the external resources at the end of this solution", but there are no external resources in this recipe.

I suggest one, bit.ly/2wgbPIS

Haesun Park  Jul 13, 2019  Jul 02, 2020
Printed
Page 162
1st paragraph

(First Release)

"define the number of parameters" should be "define the number of components".

Haesun Park  Jul 13, 2019  Jul 02, 2020
Printed
Page 166
2nd paragraph

(First Release)

"V is our d x n feature matrix(...), W is a d x r, and H is an r x n matrix" should be "V is our n x d feature matrix(...), W is an n x r, and H is an r x d matrix".

Haesun Park  Jul 13, 2019  Jul 02, 2020
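
The corrected dimensions in the page 166 entry above can be checked with plain matrix multiplication. The sizes and random matrices below are placeholders, not an actual NMF fit.

```python
import numpy as np

# Placeholder sizes: n observations, d features, r components.
n, d, r = 6, 4, 2

rng = np.random.default_rng(0)
W = rng.random((n, r))  # n x r
H = rng.random((r, d))  # r x d

# The product W H has the same shape as the n x d feature matrix V.
V_approx = W @ H
print(V_approx.shape)  # (6, 4)
```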
Printed
Page 170
Last equation

(First Release)

"operatorname" before Var(x) should be removed.

Haesun Park  Jul 13, 2019  Jul 02, 2020
Printed
Page 180
Code above Discussion

(First Release)

"# Cross-validation technique" should be "# Performance metric".
"# Use all CPU scores" should be "# Use all CPU cores".

Haesun Park  Jul 13, 2019  Jul 02, 2020
Printed
Page 182
Code in middle of page

(First Release)

"# Cross-validation technique" should be "# Performance metric".
"# Use all CPU scores" should be "# Use all CPU cores".

Haesun Park  Jul 13, 2019  Jul 02, 2020
Printed
Page 192
3rd paragraph

(First Release)

"the overall equality of a model" should be "the overall quality of a model".

Haesun Park  Jul 13, 2019  Jul 02, 2020
Printed
Page 217
Code in middle of page

(First Release)

"# View best model" should be "# View best n_components".

Haesun Park  Jul 13, 2019  Jul 02, 2020
Printed
Page 218
Last paragraph

(First Release)

"we run the same GridSearch" should be "we run the same GridSearchCV".

Haesun Park  Jul 13, 2019  Jul 02, 2020
Printed
Page 226
1st paragraph

(First Release)

"The effects of sugar and stir" should be "The effects of sugar and stirred".

Haesun Park  Jul 13, 2019  Jul 02, 2020
Printed
Page 228
2nd equation and last paragraph

(First Release)

"\hat{\beta_d}x_i^d" should be "\hat{\beta_d}x_1^d" in the 2nd eq.

"x_0" should be "x[0]" in the last paragraph, because subscripts are used for features in the recipe.

Haesun Park  Jul 13, 2019  Jul 02, 2020
Printed
Page 234
2nd paragraph

(First Release)

"create splits to increase impurity" should be "create splits to decrease impurity".

Haesun Park  Jul 13, 2019  Jul 02, 2020
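
To show why the page 234 wording should be "decrease", the sketch below computes Gini impurity before and after a split. The gini helper and the labels are hypothetical and written for this illustration; they are not the book's code.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

# Hypothetical node with two evenly mixed classes.
parent = ["a", "a", "a", "b", "b", "b"]

# A perfect split sends each class to its own child node.
left, right = ["a", "a", "a"], ["b", "b", "b"]

# Impurity after the split, weighted by child node size.
weighted_children = (
    len(left) * gini(left) + len(right) * gini(right)
) / len(parent)

print(gini(parent), weighted_children)  # 0.5 0.0
```

A good split is one where the weighted impurity of the children is lower than the impurity of the parent.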
Printed
Page 235
Code in Solution and equations in Discussion

(First Release)

"# Create decision tree classifier object" should be "# Create decision tree regressor object" in the code of Solution.

"\hat{y}_i" should be "\bar{y}" in MSE eq.

"\hat{y}_i" should be "\bar{y}" and "predicted value" should be "mean value" in the last paragraph.

Haesun Park  Jul 13, 2019  Jul 02, 2020
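
As a rough illustration of the page 235 equation correction above, the MSE at a regression tree node measures spread around the node's mean target value. The targets are hypothetical.

```python
# Hypothetical target values falling into one node.
targets = [3.0, 5.0, 7.0]

# The reference point is the node mean (y-bar), not a per-observation prediction.
mean_value = sum(targets) / len(targets)

# MSE = (1/n) * sum of (y_i - y_bar)^2
mse = sum((y - mean_value) ** 2 for y in targets) / len(targets)

print(mean_value)  # 5.0
```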
Printed
Page 236
Code above See Also

(First Release)

"# Create dicision tree classifier object using entropy" should be "# Create decision tree regressor object using mae".

Haesun Park  Jul 13, 2019  Jul 02, 2020
Printed
Page 240
Code in Solution

(First Release)

"# Create random forest classifier object" should be "# Create random forest regressor object".

Haesun Park  Jul 13, 2019  Jul 02, 2020
Printed
Page 241
1st bullet

(First Release)

"Defaults to \sqrt{p} features" should be "Defaults to p features". \sqrt{p} is the default of RandomForestClassifier.

Haesun Park  Jul 13, 2019  Jul 02, 2020
Printed
Page 248
No 3

(First Release)

"x_i correctly, w_i is increased" should be "x_i correctly, w_i is decreased".

"x_i incorrectly, w_i is decreased" should be "x_i incorrectly, w_i is increased".

Haesun Park  Jul 13, 2019  Jul 02, 2020
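
The direction of the weight changes in the page 248 entry above can be sketched with one common formulation of the AdaBoost update (discrete AdaBoost). This is an assumption for illustration, not necessarily the exact update the book describes, and the predictions are hypothetical.

```python
import numpy as np

# Four observations with equal starting weights.
weights = np.full(4, 0.25)

# Hypothetical outcome of a weak learner: the last observation is misclassified.
correct = np.array([True, True, True, False])

# Weighted error rate and the learner's vote strength (alpha).
error = weights[~correct].sum()
alpha = 0.5 * np.log((1 - error) / error)

# Correct predictions shrink the weight, incorrect ones grow it.
new_weights = weights * np.exp(np.where(correct, -alpha, alpha))
new_weights /= new_weights.sum()  # renormalize so the weights sum to 1

print(new_weights)
```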
Printed
Page 262
2nd paragraph and code in Solution

(First Release)

"MNL" should be "MLR" in 2nd paragraph.

"# Create decision tree classifier object" should be "# Create logistic regression object".

Haesun Park  Jul 13, 2019  Jul 02, 2020
Printed
Page 265
Above Discussion

(First Release)

"# Create decision tree classifier object" should be "# Create logistic regression object".

Haesun Park  Jul 13, 2019  Jul 02, 2020
Printed
Page 271
Polynomial kernel equation

This is a resubmission of an erratum that I previously submitted incorrectly.

(First Release)

K(x_i, x_{i'}) = (1 + \sum_{j=1}^p x_{ij} x_{i' j} )^2

should be

K(x_i, x_{i'}) = (r + \gamma \sum_{j=1}^p x_{ij} x_{i' j} )^d

Haesun Park  Aug 15, 2019  Jul 02, 2020
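
The corrected polynomial kernel in the page 271 entry above can be written as a short function. The function name, default values, and vectors are illustrative choices, not the book's code.

```python
import numpy as np

def polynomial_kernel(x_i, x_j, degree=2, gamma=1.0, r=1.0):
    """K(x_i, x_j) = (r + gamma * sum_j x_ij * x_i'j) ** degree."""
    return (r + gamma * np.dot(x_i, x_j)) ** degree

# Illustrative observations.
x_i = np.array([1.0, 2.0])
x_j = np.array([3.0, 4.0])

# With r = 1, gamma = 1, degree = 2 this matches the originally printed form.
print(polynomial_kernel(x_i, x_j))  # 144.0

# The general form lets r, gamma, and degree vary.
print(polynomial_kernel(x_i, x_j, degree=3, gamma=0.5, r=0.0))  # 166.375
```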
Printed
Page 289
3rd paragraph

(First Release)

"i.e. 1, 2, and 3" should be "i.e. 0, 1, and 2".

Haesun Park  Jul 13, 2019  Jul 02, 2020
Printed
Page 293
1st sentence

(First Release)

"# Create meanshift object" should be "# Create DBSCAN object".

Haesun Park  Jul 13, 2019  Jul 02, 2020
Printed
Page 294
Code in Solution

(First Release)

"# Create meanshift object" should be "# Create agglomerative clustering object".

Haesun Park  Jul 13, 2019  Jul 02, 2020
Printed
Page 297
3rd line

(First Release)

(3rd line in p297 and last line in p300)

"typically 1" should be "typically 0".

Keras initializes biases to 0 by default.

Haesun Park  Jul 18, 2019  Jul 02, 2020
Printed, PDF
Page 299
Line -3

In the code, the 3rd line

print('"Standard deviation:", ...

must be

print("Standard deviation:", ...

(The single quote before "Standard ... is superfluous.)

Frank Langenau  Dec 02, 2018  Jul 02, 2020
Printed
Page 304
Under Discussion

(First Release)

"5,000 binary features" should be "1,000 binary features".

Haesun Park  Jul 13, 2019  Jul 02, 2020
Printed
Page 313
1st paragraph

(First Release)

"error on both the training set and test set will tend to increase." should be "error on both the training set and test set will tend to decrease."

"the training loss continues to increase" should be "the training loss continues to decrease".

Haesun Park  Jul 13, 2019  Jul 02, 2020
Printed
Page 338
Last code block

(First Release)

"joblib.__version__" should be "sklearn.__version__".

Haesun Park  Jul 13, 2019  Jul 02, 2020
ePub
Page 597
Recipe 15.3 Identifying the Best Neighborhood Size

There is no need to standardize the features explicitly (
features_standardized = standardizer.fit_transform(features)
) when standardization is already performed inside the Pipeline (
pipe = Pipeline([("standardizer", standardizer), ("knn", knn)])
).
Note: The result is the same whether the features are standardized once or twice, because standardizing already-standardized data leaves it unchanged.

Note from the Author or Editor:
This is true. "features_standardized = standardizer.fit_transform(features)" can be removed.

Anonymous  Dec 28, 2018  Jul 02, 2020
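
The point made in the Recipe 15.3 entry above can be checked numerically. The standardize helper below is a hypothetical stand-in that mimics what StandardScaler computes (subtract the column mean, divide by the population standard deviation); the feature matrix is illustrative.

```python
import numpy as np

def standardize(x):
    """Column-wise standardization: zero mean, unit (population) std."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

# Illustrative feature matrix.
features = np.array([[1.0, 10.0],
                     [2.0, 20.0],
                     [4.0, 80.0]])

once = standardize(features)
twice = standardize(once)

# Standardized data already has mean 0 and std 1, so a second pass changes nothing.
print(np.allclose(once, twice))  # True
```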