Errata for Machine Learning and Security

The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.

Version Location Description Submitted by Date Submitted
Other Digital Version 2309
Location 2309 of the Kindle version, in the ARIMA section below Figure 3-2

In the Kindle version, there is a link to PyFlux. This link no longer seems to be valid and goes to pyflux.com. On my first attempt, it landed me on an “update your flash” page that was likely malware, though subsequent attempts went to a boilerplate landing page.

Daniel  Jun 11, 2020 
Chapter 1
Where the text defines a true positive

In the Safari Books Online version, the text states that a true positive for spam prediction is the following:

True positive: predicted spam + actual ham

The text should be:

True positive: predicted spam + actual spam
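
To make the four outcomes concrete, here is a minimal sketch that counts them for a spam classifier, with spam as the positive class. The `confusion_counts` helper and the sample labels are illustrative assumptions, not code from the book.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive="spam"):
    """Count TP, FP, FN, TN, treating `positive` as the positive class."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            # Predicted spam: correct only if the message really is spam.
            counts["TP" if actual == positive else "FP"] += 1
        else:
            # Predicted ham: a miss if the message was actually spam.
            counts["FN" if actual == positive else "TN"] += 1
    return counts

# Hypothetical labels, made up for this illustration.
y_true = ["spam", "ham", "spam", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "ham", "ham"]
counts = confusion_counts(y_true, y_pred)
```

Only the first pair (predicted spam, actual spam) counts as a true positive; predicted spam with actual ham is a false positive.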

Asa Freedman  Jan 28, 2019 
1
Labeling spam or ham code at `import os`

The example in the Safari Books Online edition leaves out the section of code where spam_words and ham_words are compared against the X_test set. The next paragraph then discusses a confusion matrix built from this nonexistent data. The whole code is included in the GitHub repository, though, which may or may not be useful to someone typing the code in from the book.

Asa Freedman  Jan 28, 2019 
Printed Page 160
1st non-code paragraph, first sentence

I apologize up front for my pedantry on this, but the book is about computer security, and computer security people often care about pedantry.

The first sentence on page 160 says:

"We indeed find some references to the Unix su (super user) privilege escalation command..."

Yet the *su* command does not stand for "super user"; it stands for either "switch user" or "substitute user", depending on the flavor of UNIX one is on.
The sentence would be better as:

"We indeed find some references to the Unix su (substitute user) privilege escalation command..."

Michal Grochmal  Sep 10, 2018 
Printed Page 31
Code examples

The book uses the following metrics functions:

sklearn.metrics:
- accuracy_score
- confusion_matrix

After the 3rd paragraph, it says that the accuracy score of the model we created using LogisticRegression is measured with:

accuracy_score(y_pred, y_test)

but according to sklearn docs:

sklearn.metrics.accuracy_score(y_true, y_pred, normalize=True, sample_weight=None)

where y_true is the correct labels (in this case, y_test) and y_pred is the labels predicted by the model (in this case, y_pred).

So the correct usage is:

accuracy_score(y_test, y_pred)

Even though it does not affect the score (0.99992273816), it is important not to confuse readers and to encourage proper usage of the API.
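
As a rough illustration of why the argument order still matters, the sketch below uses pure-Python stand-ins (the `accuracy` and `precision` helpers and the sample labels are assumptions for this example, not scikit-learn's implementation). Accuracy is symmetric in its two arguments, but a metric such as precision is not, so the same swapped-argument habit would give a wrong result elsewhere.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the two label lists agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true, y_pred, positive=1):
    """Of the items predicted positive, the fraction that are truly positive."""
    truths_of_predicted_pos = [t for t, p in zip(y_true, y_pred) if p == positive]
    return sum(t == positive for t in truths_of_predicted_pos) / len(truths_of_predicted_pos)

# Hypothetical labels, made up for this illustration.
y_test = [1, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0]

same_accuracy = accuracy(y_test, y_pred) == accuracy(y_pred, y_test)  # True
correct_precision = precision(y_test, y_pred)  # 1 of 1 predicted positives is right
swapped_precision = precision(y_pred, y_test)  # arguments swapped: 1 of 3
```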

Regards,
- Miguel

Miguel Diaz  Apr 04, 2018 
Printed Page 26
3rd and 4th paragraph

In the 3rd paragraph the author talks about classifying "malicious" and "legitimate" traffic using a threshold of 6000 requests over a period of 5 minutes (over 6000 is malicious and lower than this is legitimate).

But the 4th paragraph gives the number "20" as that threshold, when it should be 6000! I think it's important to correct this for the sake of new readers.
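
For reference, the rule described in the 3rd paragraph can be sketched as a one-line threshold check. The function name is hypothetical, and the handling of a count exactly equal to 6000 is an assumption, since the description only covers counts above and below the threshold.

```python
THRESHOLD = 6000  # requests per 5-minute window, per the 3rd paragraph

def classify_traffic(request_count, threshold=THRESHOLD):
    """Label a 5-minute window by its request count."""
    # Assumption: a count exactly equal to the threshold is treated as legitimate.
    return "malicious" if request_count > threshold else "legitimate"

labels = [classify_traffic(n) for n in (20, 5999, 6001)]
```

With a threshold of 6000, a window of 20 requests is clearly legitimate, which is why "20" cannot be the threshold the 4th paragraph means.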

Regards,
- Miguel

Miguel Díaz  Apr 04, 2018 
Printed Page 20
First line (Code example)

The first line of page 20 contains a line of Python code that is incorrect.

Original:
if len(stems) &lt; 2: continue

Fix:
if len(stems) < 2: continue

I assume this is due to how the book's printing software handles code, since < is encoded as &lt; in HTML.
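
The escaping described here can be reproduced with Python's standard `html` module. This is only an illustration of the entity encoding; the `printed` string is an assumption about how the line appeared in print.

```python
import html

# The line as it appears when the HTML entity is left unconverted.
printed = "if len(stems) &lt; 2: continue"

# html.unescape converts entities such as &lt; back to their characters.
source = html.unescape(printed)

# html.escape goes the other way, which is how the entity arises.
round_trip = html.escape(source)
```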

I think it's important for the sake of people who are trying the code directly from the book instead of downloading it from the original source [1].

[1] https://github.com/oreilly-mlsec/book-resources/blob/master/chapter1/spam-fighting-lsh.ipynb

Regards,
- Miguel

Miguel Diaz  Apr 02, 2018
Chapter 3, Decision Forests 1st paragraph

The text states:

The two most common types of forests used in practice are *decision* forests and gradient-boosted decision trees

It should read:

The two most common types of forests used in practice are *random* forests and gradient-boosted decision trees

Henry  Feb 25, 2018 
Printed Page 21
4th paragraph

The text states: "(the argument random_state=123 is passed in for the sake of result reproducibility)"

but the actual code uses "random_state=2".

Mike Eriksson  Feb 23, 2018