The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction is displayed in the column titled "Date Corrected".
The following errata were submitted by our customers and approved as valid errors by the author or editor.
| Version | Location | Description | Submitted By | Date submitted | Date corrected |
| Printed, PDF | Chapter 6, block of code In [6] |
In Chapter 6 (early online version of the book, so no page number), I think the regular expression in this block of code is not correct:
# Transform features from string to numeric
for i in ["term","int_rate","emp_length","revol_util"]:
    data.loc[:,i] = \
        data.loc[:,i].apply(lambda x: re.sub("[^0-9]", "", str(x)))
    data.loc[:,i] = pd.to_numeric(data.loc[:,i])
If I'm not wrong (I checked this with R, not Python), this approach removes the decimal separators (points).
The consequence is minor for the variable "int_rate" because there are always two digits after the point (the percentages are just multiplied by 100).
But for "revol_util", for example, 19% will become 19 while 2.10% will become 210.
The point should be included in the brackets in the regex. In R syntax this would be (sorry, I'm not certain of the Python syntax):
txt2num <- c("term","int_rate","emp_length","revol_util")
for (i in txt2num) {
  d[,i] <- as.numeric(gsub("[^0-9\\.]", "", d[,i]))
}
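In Python, the same fix can be sketched like this (a minimal sketch using a tiny stand-in DataFrame, since the book's loan data isn't loaded here; column name taken from the erratum):

```python
import re
import pandas as pd

# Stand-in for the chapter's data frame (illustration only)
data = pd.DataFrame({"revol_util": ["19%", "2.10%"]})

# Including the point in the character class preserves decimal separators
for i in ["revol_util"]:
    data.loc[:, i] = data.loc[:, i].apply(lambda x: re.sub(r"[^0-9.]", "", str(x)))
    data.loc[:, i] = pd.to_numeric(data.loc[:, i])

print(data["revol_util"].tolist())  # [19.0, 2.1]
```

With the original pattern `[^0-9]`, the same input would give `[19.0, 210.0]`, which is the inconsistency the erratum describes.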
Note also that with this approach, in the variable "emp_length" the categories "< 1 year" and "1 year" will both be transformed into the numeric value "1". Maybe it would be more appropriate to transform "< 1 year" into "0.5" to keep these values separated.
The consequence of this is probably minor for the demonstration of the technique in this chapter.
Thanks for the interesting book by the way...
Note from the Author or Editor: You are right. The percentages are treated inconsistently for "int_rate" and "revol_util". But, all the features are transformed into numeric and then scaled, so the features (after performing scaling) are the exact same. We will keep the code as is for now, but we will note this for any substantial future revisions we make to this notebook. Thanks!
| Gilles San Martin | Feb 22, 2019 | Mar 06, 2020 |
| Printed | Page 28, 2nd line of code |
$ git lfs pull
instruction does not exist:
(base) ~ > git lfs pull
git: 'lfs' is not a git command. See 'git --help'.
The most similar command is
log
Note from the Author or Editor: Please add a "$pip install git-lfs" command before "$git lfs install".
The new code on page 28 should read:
$ git clone https://github.com/aapatel09/handson-unsupervised-learning.git
$ pip install git-lfs
$ git lfs install
$ git lfs pull
| Ernesto Belmont | Mar 05, 2021 | May 21, 2021 |
| Printed | Page 28, 5th line of code |
says
$ activate unsupervisedLearning
should say
$ conda activate unsupervisedLearning
Note from the Author or Editor: current:
$ activate unsupervisedLearning
should be:
$ conda activate unsupervisedLearning
[on page 28.]
| Ernesto Belmont | Mar 05, 2021 | May 21, 2021 |
| Printed | Page 30, before Overview of data |
On my Mac with macOS 11.2.2, libomp is missing, so it needs to be installed:
> brew install libomp
Note from the Author or Editor: In the "Interactive Computing Environment: Jupyter Notebook" section on page 30, please add a Note box that says the following:
"On Mac, you may need to install libomp before running $jupyter notebook. Use the following command to install libomp: $brew install libomp"
| Ernesto Belmont | Mar 05, 2021 | May 21, 2021 |
| Printed | Page 37, 8th code line |
Using Python 8.5 on a Mac with macOS 11.2.2, I can't generate the data plot on page 38.
I got this error:
Traceback (most recent call last):
File "proyecto0.py", line 88, in <module>
ax = sns.barplot(x="count_classes.index", y="tuple(count_classes/len(data))")
--------------------------------
raise ValueError(err)
ValueError: Could not interpret input 'count_classes.index'
Note from the Author or Editor: Please replace the code block on page 37 with the following (note the change in the second line of code):
count_classes = pd.value_counts(data['Class'],sort=True).sort_index()
ax = sns.barplot(x=count_classes.index, y=[tuple(count_classes/len(data))[0],tuple(count_classes/len(data))[1]])
ax.set_title('Frequency Percentage by Class')
ax.set_xlabel('Class')
ax.set_ylabel('Frequency Percentage')
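The frequencies fed to the bar plot can also be checked numerically; a small sketch with a made-up, imbalanced `Class` column (not the book's credit card dataset):

```python
import pandas as pd

# Made-up, imbalanced stand-in for the book's 'Class' column
data = pd.DataFrame({"Class": [0] * 95 + [1] * 5})

count_classes = data["Class"].value_counts(sort=True).sort_index()
freq = count_classes / len(data)  # frequency percentage per class

print(freq.tolist())  # [0.95, 0.05]
```

Passing `count_classes.index` and these computed frequencies as values (rather than the strings `"count_classes.index"` and `"tuple(...)"`) is what the correction above fixes.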
| Ernesto Belmont | Mar 05, 2021 | May 21, 2021 |
| Printed | Page 39, 5th paragraph |
On the last line, "pij" should be a normal "p" with subscript "i,j".
Note from the Author or Editor: Please change subscript "pi, j" in the last line of the 5th paragraph on page 39 to normal "P" subscript "i,j".
| Ernesto Belmont | Mar 05, 2021 | May 21, 2021 |
| Printed | Page 44, 2nd paragraph |
"using confusion matrix would be useful" probably want to mean
"using confusion matrix would not be useful"
Note from the Author or Editor: Given that our credit card transactions dataset is highly imbalanced, using the confusion matrix would be meaningful.
NEEDS TO BE THE FOLLOWING:
Given that our credit card transactions dataset is highly imbalanced, using the confusion matrix would not be meaningful.
| Scott Schmidt | Apr 14, 2019 | May 03, 2019 |
| PDF | Page 44, paragraph "Precision-Recall Curve" |
The PDF says: Precision = TP / (TP + FN)
But this is wrong.
The correct formula is:
Precision = TP / (TP + FP)
Note from the Author or Editor: Yes, confirmed. I just fixed the language in the book.
| Philip May | Aug 16, 2019 | Mar 06, 2020 |
| Printed | Page 45, recall equation |
In the denominator, (true positives + false positives)
should be
(true positives + false negatives)
Note from the Author or Editor: Recall = True Positives / (True Positives + False Positives)
SHOULD BE:
Recall = True Positives / (True Positives + False Negatives)
| Scott Schmidt | Apr 14, 2019 | May 03, 2019 |
| PDF | Page 45, paragraph "Precision-Recall Curve" |
The PDF says:
Recall = TP / (TP + FP)
That is wrong.
The correct formula is: Recall = TP / (TP + FN)
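The corrected formula can be sanity-checked with made-up counts (precision shown alongside for contrast):

```python
# Hypothetical confusion-matrix counts, for illustration only
tp, fp, fn = 80, 20, 40

precision = tp / (tp + fp)  # false positives in the denominator
recall = tp / (tp + fn)     # false negatives in the denominator

print(precision, recall)  # precision is 0.8; recall is 80/120
```

Swapping FP and FN in either denominator silently changes the metric, which is why the two formulas are easy to confuse in print.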
Note from the Author or Editor: I corrected this in the book. Thanks.
| Philip May | Aug 16, 2019 | Mar 06, 2020 |
| Printed | Page 51, Figure 2-6 description text |
Figure 2-6 description at top of page has an unnecessary quotation mark:
‘Figure 2-6. Precision-recall curve of random fores”ts’
Note from the Author or Editor: Yes, please correct.
| Tim Hutchinson | May 30, 2020 | May 21, 2021 |
| Printed, PDF | Page 59, caption for Figure 2-15 |
The caption for figure 2-15 is: "Test set auROC curve of logistic regression"
Should be replaced by: "Test set auROC curve of random forests"
Note from the Author or Editor: I made the correction.
| Frank Langenau | Dec 29, 2019 | Mar 06, 2020 |
| Printed, PDF | Page 109, last line |
representaion => representation
(missing "t")
Note from the Author or Editor: Made the correction.
| Frank Langenau | Dec 29, 2019 | Mar 06, 2020 |
| Printed, PDF, ePub, Mobi, Other Digital Version | Page 143, last paragraph |
There is a typo.
At the beginning of the explanation of DBSCAN, we have "within a certian distance". This should be "within a certain distance".
| Ankur A. Patel | Jul 08, 2021 | Dec 10, 2021 |
| Printed, PDF | Page 207, 2nd line of code |
Coefifcient => Coefficient
Note from the Author or Editor: I made the correction.
| Frank Langenau | Dec 29, 2019 | Mar 06, 2020 |
| PDF | Page 240, 3rd paragraph in "Matrix Factorization" |
The underscores in the expression R = H__W should be replaced by a dot in the middle of the line because it obviously should denote the matrix multiplication of the two matrices.
Note from the Author or Editor: Correct, I made the correction.
| Frank Langenau | Dec 28, 2019 | Mar 06, 2020 |
| Printed, PDF | Page 244, 6th paragraph from top |
"This W_h0+vb... " should be "This W*h0+vb..."
Note from the Author or Editor: Correct. I made the correction.
| Frank Langenau | Dec 29, 2019 | Mar 06, 2020 |
| Printed, PDF | Page 244, end of 7th paragraph from top |
The last part of the last sentence in the 7th para is "RBMs are minimizing the probability distribution of the original input form the probability distribution of the reconstructed data."
I think there is something missing after "minimizing". It should be: "...RBMs are minimizing the divergence between the probability ..." or similar.
Note from the Author or Editor: I made the correction.
| Frank Langenau | Dec 29, 2019 | Mar 06, 2020 |
| Printed, PDF | Page 295, last paragraph of the page |
"The initial loss of the discriminator fluctuates ..." should be "The initial accuracy of the discriminator fluctuates ...", because the sentence ends with "... but remains considerably above 0.50." (The loss drops below 0.5.)
Note from the Author or Editor: I made the correction.
| Frank Langenau | Jan 03, 2020 | Mar 06, 2020 |
| Printed | Page 310, 2nd paragraph |
"The distribution is shown in Figure 13-5." - except Figure 13-5 is something else. So the distribution of classes is not shown (the numbers are not printed either)
Note from the Author or Editor: Thanks. I removed the block of code from the book. The counts are shown in Figure 13-5, and I clarified the language to make this clear.
| Joseph Schwarzbach | Jun 19, 2019 | Mar 06, 2020 |
| Printed, PDF | Page 316, 1st line |
The first line of this page is the same as the second to last line on page 315 and should be deleted.
Note from the Author or Editor: I made the correction.
| Frank Langenau | Jan 04, 2020 | Mar 06, 2020 |
| Printed, PDF | Page 316, first line after the first code block |
The line
"The adjusted Rand index on the training set ..." must be
"The adjusted Rand index on the test set..."
Note from the Author or Editor: I made the correction.
| Frank Langenau | Jan 04, 2020 | Mar 06, 2020 |