The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction is displayed in the column titled "Date Corrected".
The following errata were submitted by our customers and approved as valid errors by the author or editor.
| Version | Location | Description | Submitted By | Date submitted | Date corrected |
| Printed, PDF | Chapter 6, block of code In [6] |
In Chapter 6 (early online version of the book, so no page number), I think the regular expression in this block of code is not correct:
# Transform features from string to numeric
for i in ["term","int_rate","emp_length","revol_util"]:
    data.loc[:,i] = \
        data.loc[:,i].apply(lambda x: re.sub("[^0-9]", "", str(x)))
    data.loc[:,i] = pd.to_numeric(data.loc[:,i])
If I'm not wrong (I checked this with R, not Python), this approach removes the decimal separators (points).
The consequence is minor for the variable "int_rate" because there are always two digits after the point (the percentages are just multiplied by 100).
But for "revol_util", for example, 19% will become 19 while 2.10% will become 210.
The point should be included in the brackets in the regex. In R syntax this would be (sorry, I'm not certain of the Python syntax):
txt2num <- c("term","int_rate","emp_length","revol_util")
for (i in txt2num) {
  d[,i] <- as.numeric(gsub("[^0-9\\.]", "", d[,i]))
}
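In Python, the same fix can be sketched like this (a minimal sketch using a tiny stand-in DataFrame, since the book's loan data isn't loaded here; column name taken from the erratum):

```python
import re
import pandas as pd

# Stand-in for the chapter's data frame (illustration only)
data = pd.DataFrame({"revol_util": ["19%", "2.10%"]})

# Including the point in the character class preserves decimal separators
for i in ["revol_util"]:
    data.loc[:, i] = data.loc[:, i].apply(lambda x: re.sub(r"[^0-9.]", "", str(x)))
    data.loc[:, i] = pd.to_numeric(data.loc[:, i])

print(data["revol_util"].tolist())  # [19.0, 2.1]
```

With the original pattern `[^0-9]`, the same input would give `[19.0, 210.0]`, which is the inconsistency the erratum describes.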
Note also that with this approach, in the variable "emp_length" the categories "< 1 year" and "1 year" will both be transformed into the numeric value "1". Maybe it would be more appropriate to transform "< 1 year" into "0.5" to keep these values separated.
The consequence of this is probably minor for the demonstration of the technique in this chapter.
Thanks for the interesting book by the way...
Note from the Author or Editor: You are right. The percentages are treated inconsistently for "int_rate" and "revol_util". But, all the features are transformed into numeric and then scaled, so the features (after performing scaling) are the exact same. We will keep the code as is for now, but we will note this for any substantial future revisions we make to this notebook. Thanks!
| Gilles San Martin | Feb 22, 2019 | Mar 06, 2020 |
| Printed | Page 28, 2nd line of code |
$ git lfs pull
instruction does not exist:
(base) ~ > git lfs pull
git: 'lfs' is not a git command. See 'git --help'.
The most similar command is
log
Note from the Author or Editor: Please add a "$pip install git-lfs" command before "$git lfs install".
The new code on page 28 should read:
$ git clone https://github.com/aapatel09/handson-unsupervised-learning.git
$ pip install git-lfs
$ git lfs install
$ git lfs pull
| Ernesto Belmont | Mar 05, 2021 | May 21, 2021 |
| Printed | Page 28, 5th line of code |
says
$ activate unsupervisedLearning
should say
$ conda activate unsupervisedLearning
Note from the Author or Editor: current:
$ activate unsupervisedLearning
should be:
$ conda activate unsupervisedLearning
[on page 28.]
| Ernesto Belmont | Mar 05, 2021 | May 21, 2021 |
| Printed | Page 30, before Overview of data |
On my Mac with macOS 11.2.2, libomp is missing, so it needs to be installed:
> brew install libomp
Note from the Author or Editor: In the "Interactive Computing Environment: Jupyter Notebook" section on page 30, please add a Note box that says the following:
"On Mac, you may need to install libomp before running $jupyter notebook. Use the following command to install libomp: $brew install libomp"
| Ernesto Belmont | Mar 05, 2021 | May 21, 2021 |
| Printed | Page 37, 8th code line |
Using Python 8.5 on a Mac with macOS 11.2.2, I can't generate the data plot on page 38.
I got this error:
Traceback (most recent call last):
File "proyecto0.py", line 88, in <module>
ax = sns.barplot(x="count_classes.index", y="tuple(count_classes/len(data))")
--------------------------------
raise ValueError(err)
ValueError: Could not interpret input 'count_classes.index'
Note from the Author or Editor: Please replace the code block on page 37 with the following (note the change in the second line of code):
count_classes = pd.value_counts(data['Class'],sort=True).sort_index()
ax = sns.barplot(x=count_classes.index, y=[tuple(count_classes/len(data))[0],tuple(count_classes/len(data))[1]])
ax.set_title('Frequency Percentage by Class')
ax.set_xlabel('Class')
ax.set_ylabel('Frequency Percentage')
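The frequencies fed to the bar plot can also be checked numerically; a small sketch with a made-up, imbalanced `Class` column (not the book's credit card dataset):

```python
import pandas as pd

# Made-up, imbalanced stand-in for the book's 'Class' column
data = pd.DataFrame({"Class": [0] * 95 + [1] * 5})

count_classes = data["Class"].value_counts(sort=True).sort_index()
freq = count_classes / len(data)  # frequency percentage per class

print(freq.tolist())  # [0.95, 0.05]
```

Passing `count_classes.index` and these computed frequencies as values (rather than the strings `"count_classes.index"` and `"tuple(...)"`) is what the correction above fixes.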
| Ernesto Belmont | Mar 05, 2021 | May 21, 2021 |
| Printed | Page 39, 5th paragraph |
On the last line, "pij" should be a normal "p" with subscript "i,j".
Note from the Author or Editor: Please change subscript "pi, j" in the last line of the 5th paragraph on page 39 to normal "P" subscript "i,j".
| Ernesto Belmont | Mar 05, 2021 | May 21, 2021 |
| Printed | Page 44, 2nd paragraph |
"using confusion matrix would be useful" probably want to mean
"using confusion matrix would not be useful"
Note from the Author or Editor: Given that our credit card transactions dataset is highly imbalanced, using the confusion matrix would be meaningful.
NEEDS TO BE THE FOLLOWING:
Given that our credit card transactions dataset is highly imbalanced, using the confusion matrix would not be meaningful.
| Scott Schmidt | Apr 14, 2019 | May 03, 2019 |
| PDF | Page 44, paragraph "Precision-Recall Curve" |
The PDF says: Precision = TP / (TP + FN)
But this is wrong.
The correct formula is:
Precision = TP / (TP + FP)
Note from the Author or Editor: Yes, confirmed. I just fixed the language in the book.
| Philip May | Aug 16, 2019 | Mar 06, 2020 |
| Printed | Page 45, recall equation |
In the denominator, (true positives + false positives)
should be
(true positives + false negatives)
Note from the Author or Editor: Recall = True Positives / (True Positives + False Positives)
SHOULD BE:
Recall = True Positives / (True Positives + False Negatives)
| Scott Schmidt | Apr 14, 2019 | May 03, 2019 |
| PDF | Page 45, paragraph "Precision-Recall Curve" |
The PDF says:
Recall = TP / (TP + FP)
That is wrong.
The correct formula is: Recall = TP / (TP + FN)
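The corrected formula can be sanity-checked with made-up counts (precision shown alongside for contrast):

```python
# Hypothetical confusion-matrix counts, for illustration only
tp, fp, fn = 80, 20, 40

precision = tp / (tp + fp)  # false positives in the denominator
recall = tp / (tp + fn)     # false negatives in the denominator

print(precision, recall)  # precision is 0.8; recall is 80/120
```

Swapping FP and FN in either denominator silently changes the metric, which is why the two formulas are easy to confuse in print.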
Note from the Author or Editor: I corrected this in the book. Thanks.
| Philip May | Aug 16, 2019 | Mar 06, 2020 |
| Printed | Page 51, Figure 2-6 description text |
Figure 2-6 description at top of page has an unnecessary quotation mark:
‘Figure 2-6. Precision-recall curve of random fores”ts’
Note from the Author or Editor: Yes, please correct.
| Tim Hutchinson | May 30, 2020 | May 21, 2021 |
| Printed, PDF | Page 59, caption for Figure 2-15 |
The caption for figure 2-15 is: "Test set auROC curve of logistic regression"
Should be replaced by: "Test set auROC curve of random forests"
Note from the Author or Editor: I made the correction.
| Frank Langenau | Dec 29, 2019 | Mar 06, 2020 |
| Printed, PDF | Page 109, last line |
representaion => representation
(missing "t")
Note from the Author or Editor: Made the correction.
| Frank Langenau | Dec 29, 2019 | Mar 06, 2020 |
| Printed, PDF, ePub, Mobi, Other Digital Version | Page 143, last paragraph |
There is a typo.
At the beginning of the explanation of DBSCAN, we have "within a certian distance". This should be "within a certain distance".
| Ankur A. Patel | Jul 08, 2021 | Dec 10, 2021 |
| Printed, PDF | Page 207, 2nd line of code |
Coefifcient => Coefficient
Note from the Author or Editor: I made the correction.
| Frank Langenau | Dec 29, 2019 | Mar 06, 2020 |
| PDF | Page 240, 3rd paragraph in "Matrix Factorization" |
The underscores in the expression R = H__W should be replaced by a dot in the middle of the line because it obviously should denote the matrix multiplication of the two matrices.
Note from the Author or Editor: Correct, I made the correction.
| Frank Langenau | Dec 28, 2019 | Mar 06, 2020 |
| Printed, PDF | Page 244, 6th paragraph from top |
"This W_h0+vb... " should be "This W*h0+vb..."
Note from the Author or Editor: Correct. I made the correction.
| Frank Langenau | Dec 29, 2019 | Mar 06, 2020 |
| Printed, PDF | Page 244, end of 7th paragraph from top |
The last part of the last sentence in the 7th para is "RBMs are minimizing the probability distribution of the original input form the probability distribution of the reconstructed data."
I think there is something missing after "minimizing". It should be: "...RBMs are minimizing the divergence between the probability ..." or similar.
Note from the Author or Editor: I made the correction.
| Frank Langenau | Dec 29, 2019 | Mar 06, 2020 |
| Printed, PDF | Page 295, last paragraph of the page |
"The initial loss of the discriminator fluctuates ..." should be "The initial accuracy of the discriminator fluctuates ...", because the sentence ends with "... but remains considerably above 0.50." (The loss drops below 0.5.)
Note from the Author or Editor: I made the correction.
| Frank Langenau | Jan 03, 2020 | Mar 06, 2020 |
| Printed | Page 310, 2nd paragraph |
"The distribution is shown in Figure 13-5." - except Figure 13-5 is something else. So the distribution of classes is not shown (the numbers are not printed either)
Note from the Author or Editor: Thanks. I removed the block of code from the book. The counts are shown in Figure 13-5, and I clarified the language to make this clear.
| Joseph Schwarzbach | Jun 19, 2019 | Mar 06, 2020 |
| Printed, PDF | Page 316, 1st line |
The first line of this page is the same as the second to last line on page 315 and should be deleted.
Note from the Author or Editor: I made the correction.
| Frank Langenau | Jan 04, 2020 | Mar 06, 2020 |
| Printed, PDF | Page 316, first line after the first code block |
The line
"The adjusted Rand index on the training set ..." must be
"The adjusted Rand index on the test set..."
Note from the Author or Editor: I made the correction.
| Frank Langenau | Jan 04, 2020 | Mar 06, 2020 |