Errata for Practical Statistics for Data Scientists

The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction will be displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.

Version Location Description Submitted By Date submitted Date corrected
PDF
Page 2
1st paragraph 1st line

typo: availablility --> availability

Note from the Author or Editor:
erratum is correct

p. 2 availablility --> availability

Joon-Yong Lee  Jul 29, 2017  May 11, 2018
PDF
Page 6-8
examples, code etc.

Less of an erratum; more of a suggestion.

I had some issues with your script for downloading from googledrive. It was easy enough to repair, nevertheless I'd suggest that you look into the R googledrive package, or at least consider writing something a little less brittle.

At the moment I'm working my way through the 'sample' copy. I haven't yet decided to purchase the book.

Regardless,
Thank you,

Kevin Casey

ps. Also, the typesetting needs some repair too. Some of your equations are improperly formatted.

Note from the Author or Editor:
I updated the download_data.r script to use the googledrive package. Since this requires installing or updating quite a few packages, I also left my previous version in the script, wrapped in an if(FALSE){} clause.

Kevin Casey  Mar 04, 2018  May 11, 2018
Printed
Page 15, 18
15 formula, 18 near bottom of page

On each of these two pages, MAD has been written as Mean Absolution Deviation. In other places, the A is referenced as 'absolute'. Having checked its definition, I do not think 'absolution' is really a possibility here.

Note from the Author or Editor:
erratum is correct

page 15 formula, and again page 18 near bottom of page

"absolution" should be replaced with "absolute"

Tom Robey  Jul 24, 2017  May 11, 2018
Printed, PDF, ePub
Page 16
2nd paragraph

The last sentence of the second paragraph reads "However, if you divide by n - 1 instead of n, the standard deviation becomes an unbiased estimate." Dividing by n - 1 instead of n produces an unbiased estimate of the variance, but the estimate of the standard deviation is still biased. See https://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation.

Note from the Author or Editor:
in box on p. 16, end of second para:

EXISTING: "... the standard deviation becomes an unbiased estimate."
CHANGE TO: "... the variance becomes an unbiased estimate."

David W. Body  Mar 09, 2018  May 11, 2018
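
The distinction in this erratum can be checked on a toy case. The sketch below is in Python rather than the book's R, and the two-value population and sample size are illustrative assumptions, not taken from the book. It enumerates every equally likely sample and averages the n - 1 estimators over all of them.

```python
from itertools import product
from statistics import mean, pstdev, pvariance, stdev, variance

# Hypothetical toy population (an assumption for illustration only).
population = [0, 1]
n = 2  # sample size

# Every equally likely sample of size n drawn with replacement.
samples = list(product(population, repeat=n))

# Averaging an estimator over all samples gives its expected value.
mean_var_n_minus_1 = mean([variance(s) for s in samples])
mean_sd_n_minus_1 = mean([stdev(s) for s in samples])

# The n - 1 variance matches the population variance on average ...
print(mean_var_n_minus_1, pvariance(population))
# ... but its square root still underestimates the population standard deviation.
print(mean_sd_n_minus_1, pstdev(population))
```

In this toy case the averaged variance estimate equals the population variance, while the averaged standard deviation estimate falls below the population standard deviation, which is the point of the correction.
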
PDF
Page 27
Top of page

Table 1-3 is a repetition of Table 1-2

Note from the Author or Editor:
Correct, this is for reader convenience since the earlier presentation is 6 pages away. Can reword the introductory sentence above it as follows:
"Table 1-3 (repeated from Table 1-2, earlier, for convenience) ..."

Anonymous  Apr 21, 2017  Jun 23, 2017
Printed
Page 40
last paragraph

The sentence "Now the picture is much clearer: tax-assessed value is much higher in some zip codes (98112, 98105) than in others (98108, 98057)." References two zip codes that aren't in Figure 1-12 (98112 and 98057.) Either it should read "Now the picture is much clearer: tax-assessed value is much higher in some zip codes (98105, 98126) than in others (98108, 98188)." or the plot titles in the figure are incorrect.

Note from the Author or Editor:
The correction is right - the sentence at the bottom of p. 40 should read "Now the picture is much clearer: tax-assessed value is much higher in some zip codes (98105, 98126) than in others (98108, 98188)."

Anonymous  Apr 02, 2018  May 11, 2018
PDF
Page 41
last paragraph

This idea has propogated to various modern graphics systems --> This idea has propagated to various modern graphics systems

Note from the Author or Editor:
p. 41, para at bottom:

1. 3rd line in para, change propogated to propagated

ALSO 2. The items in brackets and parentheses are meant to be index citations; they should be indexed and should not appear in the text. These items are
[Trellis-Graphics]
([lattice])
([seaborne])
([bokeh])

JOON-YONG LEE  Jan 02, 2018  May 11, 2018
PDF
Page 45
3rd paragraph

The name was wrong.
Al Landon --> Alf Landon

Note from the Author or Editor:
erratum is correct. p. 45, 3rd para change "Al Landon" to "Alf Landon"

JOON-YONG LEE  Jan 12, 2018  May 11, 2018
PDF
Page 62
1st para

A mistake in this equation: [(1 – [x/100]) / 2]% --> [(100 – x) / 2]%

Note from the Author or Editor:
erratum is correct. P. 62 top line:

EXISTING
... trim [(1-[x/100])/2]% ...

CHANGE TO
... trim [(100-x)/2]%...

JOON-YONG LEE  Jan 19, 2018  May 11, 2018
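
A quick check of the corrected expression, written in Python rather than the book's R. The erratum gives only the formula, so the reading used here is an assumption: x is a confidence level in percent, and the result is the percentage trimmed from each end of a sorted set of resampled statistics. The function name is hypothetical.

```python
def tail_trim_percent(x):
    """Percent to trim from each end for an x% interval: (100 - x) / 2.

    Assumes x is a confidence level expressed in percent.
    """
    if not 0 < x < 100:
        raise ValueError("x must be strictly between 0 and 100")
    return (100 - x) / 2


print(tail_trim_percent(90))  # 5.0
print(tail_trim_percent(95))  # 2.5
```

The uncorrected expression, (1 - x/100) / 2, gives 0.05 for x = 90. That is a proportion rather than a percentage, so following it with a percent sign was the error.
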
PDF
Page 65
1st para

typo: prodigous --> prodigious

Note from the Author or Editor:
make change prodigous --> prodigious

JOON-YONG LEE  Jan 17, 2018  May 11, 2018
PDF
Page 67
last paragraph

typo: anamolous --> anomalous

Note from the Author or Editor:
erratum is correct - p. 67 last para -

anamolous --> anomalous

JOON-YONG LEE  Jan 17, 2018  May 11, 2018
PDF
Page 71
first equation

the last term in the formula should be s/sqrt(n) and not s/n

Note from the Author or Editor:
erratum is correct

p. 71, formula at the top

the last term in the formula should be s/sqrt(n) and not s/n

Anonymous  Jun 29, 2017  May 11, 2018
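
To show why the denominator matters, here is a minimal Python sketch (not the book's R code) that computes s/sqrt(n) next to the uncorrected s/n. The sample values and the function name are illustrative assumptions.

```python
from math import sqrt
from statistics import stdev


def standard_error(values):
    """Sample standard deviation divided by the square root of n."""
    n = len(values)
    return stdev(values) / sqrt(n)


# Hypothetical sample (illustrative values only).
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

se_correct = standard_error(sample)
se_wrong = stdev(sample) / len(sample)  # the uncorrected s/n form

print(se_correct, se_wrong)
```

For any sample with n > 1 and nonzero spread, s/n is smaller than s/sqrt(n), so the uncorrected term would understate the uncertainty.
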
PDF
Page 76
2nd para

where the mean number of events per time period is 2 --> where the mean number of events per time period is 0.2.

Note from the Author or Editor:
p. 76, first para: erratum is correct

EXISTING
where the mean number of events per time period is 2

CHANGE TO
where the mean number of events per time period is 0.2.

JOON-YONG LEE  Jan 21, 2018  May 11, 2018
Printed
Page 87
Bottom of first major paragraph

Text says, "This means that extreme chance results in only one direction direction count toward the p-value."

'Direction' is written twice in a row.

Note from the Author or Editor:
erratum is correct

p. 87, first para after the header, next to last and last line - eliminate one of the "direction"

Tom Robey  Jul 24, 2017  May 11, 2018
Printed
Page 93
For Further Reading section

Bruce's Introductory Statistics and Analytics book is listed with a 2015 date. On pages 88 and 101, the book is listed as 2014.

Note from the Author or Editor:
erratum is correct

p. 93 towards bottom, second item in Further Reading

date on Bruce book should be 2014

Tom Robey  Jul 24, 2017  May 11, 2018
Printed
Page 98
Bottom of Data Science and P-Values paragraph

Sentence reads, " - a feature night be included or ... ". I am thinking the word should have been *might*.

Note from the Author or Editor:
erratum is correct

p. 98, next to last line above box, change "night" to "might"

Tom Robey  Jul 26, 2017  May 11, 2018
PDF
Page 104
Bottom line

The alternative hypothesis uses B > A instead of B < A (or the null hypothesis needs to be changed).

Note from the Author or Editor:
erratum is correct

p. 104 printed edition: last line, should be "A > B" instead of "B > A"

Anonymous  Jul 25, 2017  May 11, 2018
Printed
Page 111
Last sentence.

First release (2017-05-09) of first print edition (May 2017) has Greek letter xi (Unicode 03BE) where Greek letter chi (Unicode 03C7) is meant. Same goes for second formula on page 113.

Note from the Author or Editor:
erratum is correct

p. 111, last line and p. 113, formula in center of page

replace Greek letter xi (Unicode 03BE) with Greek letter chi (Unicode 03C7)

Stephen Frost  Jul 11, 2017  May 11, 2018
Printed
Page 111-114
Throughout

The text is inconsistent in its use of "chi-square" vs. "chi-squared". The main section is titled "Chi-Square Test", however page 113 references "the chi-squared statistic" twice, page 114 contains a section titled "Chi-Squared Test: Statistical Theory" (but mentions "chi-square distribution"), and the output given by R states "Pearson's Chi-squared test".

Note from the Author or Editor:
erratum is correct.
p. 113 sentence in the middle of the page and again in the line beginning "where r and c..." -- change chi-squared to chi-square
p. 114 in the header, and in the line following the header
change "...distribution of the chi-squared statistic..." to "...distribution of the chi-square statistic..."

DO NOT change anything in the R output

Matt Galisa  Aug 14, 2017  May 11, 2018
PDF
Page 112
2nd paragraph

Instead of "same result by random chance" - shouldn't it say same result or more extreme - or something like that?

Note from the Author or Editor:
Confirmed, but on page 96, not 112, 2nd para in the "P-Value" subsection of chapter 3, please replace "...achieve the same result by random chance..." with "... achieve a result as extreme as this, or more extreme, by random chance..."

Anonymous  May 02, 2017  Jun 23, 2017
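
The corrected wording ("as extreme as this, or more extreme") corresponds to a greater-than-or-equal comparison when a p-value is counted from resampled results. The following Python sketch (not from the book) uses an exhaustive permutation test on two hypothetical groups. The data values and variable names are assumptions for illustration.

```python
from itertools import combinations
from statistics import mean

# Hypothetical measurements for two small groups (illustrative only).
group_a = [12, 15, 14]
group_b = [10, 9, 11]

observed = mean(group_a) - mean(group_b)
pooled = group_a + group_b
n_a = len(group_a)

# Difference in means for every way of relabeling n_a values as group A.
diffs = []
for idx in combinations(range(len(pooled)), n_a):
    a = [pooled[i] for i in idx]
    b = [pooled[i] for i in range(len(pooled)) if i not in idx]
    diffs.append(mean(a) - mean(b))

# One-sided p-value: share of relabelings at least as extreme as observed.
p_value = sum(d >= observed for d in diffs) / len(diffs)
print(p_value)
```

Counting only exact ties with the observed value (== instead of >=) would ignore more extreme outcomes, which is the gap the reworded sentence closes.
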
PDF
Page 124
4th and 5th paragraph

(30% instead of 10%) --> (50% instead of 10%): because 50% is used in the following paragraph as an example.

(say 165 ones and 9,868 zeros) --> (say 165 ones and 9,835 zeros): its sum should be equal to 10,000.

Note from the Author or Editor:
p. 124 - erratum is correct:

at the end of the para starting "So we can try..."

EXISTING
(30% instead of 10%)

CHANGE TO
(50% instead of 10%)

JOON YONG LEE  Feb 07, 2018  May 11, 2018
PDF
Page 129
1st paragraph

interchangable --> interchangeable

Note from the Author or Editor:
p 129, 1st para: erratum is correct
interchangable --> interchangeable

JOON YONG LEE  Feb 09, 2018  May 11, 2018
PDF
Page 136
First equation

Equation for RMSE shows estimate of y_i on LHS

Note from the Author or Editor:
The left-hand side of the equation should read "RMSE" not yi-hat; fixed in Atlas source

Anonymous  May 29, 2017  Jun 23, 2017
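
For reference, a minimal Python sketch of RMSE as a single summary number computed from all the fitted values, which is why "RMSE" rather than yi-hat belongs on the left-hand side. It assumes the usual definition that divides the sum of squared errors by n. The function name and inputs are hypothetical.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error: sqrt of the mean of (y_i - y_hat_i)^2."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(actual))


# Hypothetical observed and fitted values (illustrative only).
print(rmse([0, 0], [3, 4]))
```
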
PDF
Page 139
last paragraph

"where p is the number of..." Here, p should be a capital P to keep consistency with P in the above AIC equation.

Note from the Author or Editor:
erratum is correct:
p. 139, 3rd line from bottom:

EXISTING: "where p is the number..."
CHANGE TO: "where P is the number..."

P retains its italics

JOON YONG LEE  Feb 11, 2018  May 11, 2018
Printed
Page 153
1st paragraph

The paragraph notes “adding a bathroom increases the sale price by $7,500” however in the previous code output, Bathrooms is shown as 5.537e+03 or about $5,500.

Note from the Author or Editor:
p. 153, end of first text para: erratum is correct

EXISTING:
... increases the sale price by $7500

CHANGE TO
... increases the sale price by $5,537

Peter Edstrom  Feb 04, 2018  May 11, 2018
Printed
Page 154
Last paragraph

The slope of the main effect SqFtTotLiving shows as 1.176e+02 ($117) in the R output but the paragraph says $177. Thus for a home in the highest ZipGroup the slope is the sum of the main effect plus the interaction SqFtTotLiving:ZipGroup5 ($117 + $230 = $347) - the text shows 177 + 230 = 447 which not only does not match the R output but is also arithmetically incorrect (177 + 230 actually equals 407).

Note from the Author or Editor:
erratum is correct: p. 154 last para:

line 3: ...$177 per square foot... should be ...$118 per square foot...
lines 5 and 6: ...or $177 + 230 = $447... should be ...$118 + $230 = $348 per square foot...
line 7: ...by a factor of almost 2.7... should be ...by a factor of more than 2.9...

Matt Galisa  Aug 14, 2017  May 11, 2018
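
The corrected figures can be verified with a few lines of arithmetic. This Python sketch uses the rounded dollar values from the author's note, and the variable names are hypothetical.

```python
# Rounded values from the author's note, in dollars per square foot.
main_effect = 118   # SqFtTotLiving (1.176e+02 in the R output)
interaction = 230   # SqFtTotLiving:ZipGroup5

# Slope for a home in the highest ZipGroup: main effect plus interaction.
slope_top_group = main_effect + interaction
ratio = slope_top_group / main_effect

print(slope_top_group)   # 348
print(round(ratio, 2))   # 2.95
```
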
PDF
Page 157
last paragraph

statuatory deed --> statutory deed

Note from the Author or Editor:
erratum is correct:

p. 157, fifth line from bottom, statuatory deed --> statutory deed

JOON-YONG LEE  Feb 15, 2018  May 11, 2018
Printed
Page 170
Middle of main paragraph, under Generalized Additive Models

"Polynomial terms may not flexible enough ... " looks like the word 'be' is missing.

Note from the Author or Editor:
erratum is correct p. 170
EXISTING
"Polynomial terms may not flexible enough ... "

CHANGE TO
"Polynomial terms may not be flexible enough ... "

Tom Robey  Aug 09, 2017  May 11, 2018
Printed
Page 170
Figure 4-12

Figure 4-12, described as representing spline regression, appears identical to Figure 4-10, representing polynomial regression on page 168

Note from the Author or Editor:
Figure 4-12 is wrong: it is currently a repeat of Figure 4-10. I will update it with the correct figure.

Marshall Ehlinger  Feb 10, 2018  May 11, 2018
Printed
Page 196
Figure 5-5

Both rows of the figure are labeled y = 1; the lower row should be labeled y = 0.

Note from the Author or Editor:
erratum is correct: lower left cell of Figure 5-5 should read y=0

Matt Galisa  Aug 14, 2017  May 11, 2018
PDF
Page 196
Figure 5-5

The shorthand for Specificity is labeled FP/(y=0). It should be TN/(y=0).

Note from the Author or Editor:
erratum is correct:

p. 196, chart, far right, Specificity should be TN/(y=0)

John Masiello  Sep 14, 2017  May 11, 2018
Printed
Page 197
Bottom of page

The denominator in the equation for specificity is incorrect. ∑FalseNegative should be replaced with ∑FalsePositive.

Note from the Author or Editor:
erratum is correct

p. 197 last formula, second element in denominator should be ∑FalsePositive

Phil Terwilliger  Jan 11, 2018  May 11, 2018
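
Both specificity errata (Figure 5-5 and the formula on p. 197) come down to the same definition: true negatives divided by all actual negatives, that is, TN / (TN + FP). A minimal Python sketch follows. The function name and the counts are illustrative assumptions.

```python
def specificity(true_negatives, false_positives):
    """TN / (TN + FP): the share of actual negatives (y = 0) predicted as 0."""
    actual_negatives = true_negatives + false_positives
    if actual_negatives == 0:
        raise ValueError("no actual negatives in the data")
    return true_negatives / actual_negatives


# Hypothetical confusion-matrix counts (illustrative only).
print(specificity(true_negatives=90, false_positives=10))  # 0.9
```

False negatives do not appear in the expression because they are records whose true value is y = 1.
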
PDF
Page 201
last paragraph

indiscriminantly --> indiscriminately

Note from the Author or Editor:
p. 201, second to last line, erratum is correct. Fix misspelling:

indiscriminantly --> indiscriminately

JOON-YONG LEE  Feb 27, 2018  May 11, 2018
PDF
Page 205
1st paragraph in Data Generation

(see “Undersampling” on page 204) --> (see “Oversampling and Up/Down Weighting” on page 204)

Note from the Author or Editor:
p. 205, first line under heading "Data Generation,"
erratum is correct:

EXISTING
(see “Undersampling” on page 204)
CHANGE TO
(see “Oversampling and Up/Down Weighting” on page 204)

JOON-YONG LEE  Feb 28, 2018  May 11, 2018
PDF
Page 208
in further reading

Analytics Vidya --> Analytics Vidhya

Note from the Author or Editor:
erratum is correct - 3rd item under Further Reading should read
...Analytics Vidhya...

JOON-YONG LEE  Feb 28, 2018  May 11, 2018
Printed
Page 212
3rd paragraph

The text describes the paid-off symbol as a triangle, but it is actually a cross. It states the number of defaults (circles) as 14 and paid-off loans (crosses) as 6, but in Figure 6-2 there are 9 defaults and 11 paid-off loans.

In paragraph 1, the step of making a prediction with dti=22.5 and payment_inc_ratio=9 is not listed, only the outcome.

Note from the Author or Editor:
p. 212, para in the middle:
EXISTING: The circles (default) and triangles (paid off)
CHANGE TO: The circles (default) and crosses (paid off)

ALSO
EXISTING: "... 14 defaulted loans lie within the circle as compared with only 6 paid-off loans. Hence the predicted outcome of the loan is default"
CHANGE TO: "... 9 defaulted loans lie within the circle as compared with 11 paid-off loans. Hence the predicted outcome of the loan is paid-off"

David Pugh  Mar 01, 2018  May 11, 2018
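
With the corrected counts, the predicted outcome follows from a simple majority vote among the neighbors. The Python sketch below (not the book's R code) uses the 9 and 11 from the author's note. The label strings and variable names are assumptions.

```python
from collections import Counter

# Neighbor outcomes inside the circle, using the corrected counts.
neighbor_outcomes = ["default"] * 9 + ["paid off"] * 11

# The most common outcome among the neighbors is the prediction.
votes = Counter(neighbor_outcomes)
prediction, count = votes.most_common(1)[0]

print(prediction, count / len(neighbor_outcomes))  # paid off 0.55
```
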
Printed
Page 221
diagram

The decision tree diagram could do with an explanation of which branch to follow if the node question is true or false. It's not immediately obvious that you go to the left if true and to the right if false.

Note from the Author or Editor:
In middle of p. 221, last line before figure:
EXISTING: " ... traversing through a hierarchical tree, starting at the root until a leaf is reached"
CHANGE TO: "...traversing through a hierarchical tree, starting at the root and moving left if the node is true and right if not, until a leaf is reached."

David Pugh  Mar 01, 2018  May 11, 2018
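
The traversal rule in the reworded sentence can be sketched as follows. This is a hypothetical Python illustration, not the book's code: the dictionary layout, the feature name `x`, the threshold, and the labels are all assumptions.

```python
def predict(node, record):
    """Walk from the root: go left when the node's test is true, right if not."""
    while "label" not in node:  # internal nodes carry a test, leaves a label
        test_is_true = record[node["feature"]] < node["threshold"]
        node = node["left"] if test_is_true else node["right"]
    return node["label"]


# Hypothetical two-level tree (feature, threshold, and labels are made up).
tree = {
    "feature": "x",
    "threshold": 0.5,
    "left": {"label": "default"},
    "right": {"label": "paid off"},
}

print(predict(tree, {"x": 0.4}))  # default
print(predict(tree, {"x": 0.9}))  # paid off
```
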
PDF
Page 222
last paragraph

righthand region --> lefthand region

Note from the Author or Editor:
last line on page 222: Change righthand region to lefthand region

JOON-YONG LEE  Mar 06, 2018  May 11, 2018
PDF
Page 223
Figure 6-4

The caption for Figure 6-4 is the same as the caption for Figure 6-3. It must be fixed.

Note from the Author or Editor:
p. 223 caption for Figure 6-4 should read "The first three rules for a simple tree model fit to the loan data."

JOON-YONG LEE  Mar 06, 2018  May 11, 2018
PDF
Page 230
top of page

Says "refered to as random forest" instead of "referred"

Note from the Author or Editor:
erratum is correct

p. 230 very first word on page should be "referred" instead of "refered"

Anonymous  Jul 18, 2017  May 11, 2018
PDF
Page 230
1st paragraph in Bagging

n records. --> N records: in Step 1 of the bagging algorithm, n means the size of bootstrap resample.

Note from the Author or Editor:
Confirmed - In first para under "Bagging" header, end of sentence should read "with N records." instead of "with n records"

JOON-YONG LEE  Apr 02, 2018  May 11, 2018
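
The n versus N distinction can be made concrete with a small Python sketch (not the book's R code). Here N is the number of records in the training data and n is the size of each bootstrap resample, following the erratum. The function name, the seed, and the record values are hypothetical.

```python
import random


def bootstrap_resample(records, n, rng):
    """Draw n records with replacement from the N training records."""
    return rng.choices(records, k=n)


# Hypothetical training data with N = 10 records.
records = list(range(10))
N = len(records)
n = 6  # size of each bootstrap resample, with n <= N

rng = random.Random(0)  # seeded only so that reruns are repeatable
resample = bootstrap_resample(records, n, rng)
print(N, len(resample))
```

Because sampling is with replacement, a resample may contain the same record more than once.
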
Printed, PDF
Page 244
last paragraph

acting in a similar mannger --> manner

Note from the Author or Editor:
change: acting in a similar mannger --> manner

JOON-YONG LEE  Apr 09, 2018  May 11, 2018
PDF
Page 267
last paragraph

The oil stocks (XOM, CVS, SLB, COP) --> The oil stocks (XOM, CVX, SLB, COP)

Note from the Author or Editor:
erratum is correct - p. 267, next to last line, change CVS to CVX

JOON-YONG LEE  Apr 14, 2018  May 11, 2018
PDF
Page 268
main steps of the agglomerative algorithm

In "D(Ck,Cl))" in steps 2 and 3, the right parenthesis is duplicated.

Note from the Author or Editor:
Erratum is correct - p. 268, drop the extra right parenthesis in step 2 and step 3

JOON-YONG LEE  Apr 14, 2018  May 11, 2018
PDF
Page 272
1st paragraph in Mixtures of Normals

N1(μ1),Σ1), N1(μ2),Σ2), ..., N1(μK),ΣK) has the wrong dimension and misplaced parentheses.
--> It should read N2(μ1,Σ1), N2(μ2,Σ2), ..., N2(μK,ΣK)

Note from the Author or Editor:
p. 272 - correction should be made as stated - last line of initial para should read N2(μ1,Σ1), N2(μ2,Σ2), ..., N2(μK,ΣK)

JOON-YONG LEE  Apr 14, 2018  May 11, 2018
PDF
Page 281
last code block

I cannot find the definition of "dnd_cut".
We need this:
dnd_cut <- cut(dnd, h=0.5)

Note from the Author or Editor:
erratum is correct. Bottom of p. 281, Insert an additional line of code ABOVE the two lines already there, so it reads

> dnd_cut <- cut(dnd, h=0.5)
> df[labels... etc.

JOON-YONG LEE  Apr 18, 2018  May 11, 2018
Other Digital Version
567
Table 1-5

I am using the Kindle edition; this is LOCATION 567, not a page number.

Table 1-5 shows a column States in which all States are correctly allocated to their respective bin or break.

The R code does not create this column ... I don't know, yet, how to correct this problem so your advice is awaited!

Note from the Author or Editor:
There was a bug in the R script that created the state abbreviations. I have uploaded the code with the following fix:

state_abb <- state %>%
  arrange(Population) %>%
  group_by(PopFreq) %>%
  summarize(state = paste(Abbreviation, collapse=","), .drop=FALSE) %>%
  complete(PopFreq, fill=list(state='')) %>%
  select(state)

state_abb <- unlist(state_abb)

Duncan Williamson  Sep 15, 2017  May 11, 2018