The errata list is a list of errors and their corrections that were found after the product was released. If an error was corrected in a later version or reprint, the date of the correction will be displayed in the column titled "Date Corrected".
The following errata were submitted by our customers and approved as valid errors by the author or editor.
Version | Location | Description | Submitted By | Date submitted | Date corrected

Section "Confidence intervals", a couple of pages in
On the Kindle app, there is an error in the formula.
Mistake: the factor...to get a 1-alpha confidence interval is given by abs(ppf((1-alpha)/2))
Correction: the factor...to get a 1-alpha confidence interval is given by abs(ppf(alpha/2))
Note from the Author or Editor: Right after “confidence interval is given by”, inside the formula, it should be ppf(alpha) instead of ppf(1-alpha). The following code is correct and needs no modification.
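The book's own code is already correct per the note above; the snippet below is only an independent check of the corrected factor, assuming the standard normal ppf from scipy.stats and an illustrative alpha of 0.05:

from scipy.stats import norm

alpha = 0.05
abs(norm.ppf(alpha / 2))        # ~1.96, the usual factor for a 95% confidence interval
abs(norm.ppf((1 - alpha) / 2))  # ~0.06, which is why the printed version cannot be right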
Submitted by Francis Doornaert, Aug 17, 2023

Printed | Page 22, Section "A visual guide to Bias", in the image right after "The reason for this is bias, which is depicted in the right plot:"
In the image, the equation next to the leftmost curly brace should be
E[Y|T=1] - E[Y|T=0], not E[Y|T=1] = E[Y|T=0].
Submitted by Matthew Facure, Sep 01, 2023

Chapter 3, Section "Conditioning on a Collider", location 3042 in the Kindle version, first formula
The formula states:
E[Y|T=1,R=1] - E[Y|T=1,R=1] = E[Y_1 - Y_0|R=1] + E[Y_0|T=0,R=1] - E[Y_0|T=1,R=1]
I think the second term on the left side should be E[Y|T=0,R=1], since the left side is the average difference between treated and untreated units who responded to the survey.
Note from the Author or Editor: The formula should be
E[Y|T=1,R=1] - E[Y|T=0,R=1] = E[Y_1 - Y_0|R=1] + E[Y_0|T=1,R=1] - E[Y_0|T=0,R=1]
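Rendered in LaTeX for clarity (the SelectionBias grouping follows the page 151 erratum further down this list):

E[Y \mid T=1, R=1] - E[Y \mid T=0, R=1] = E[Y_1 - Y_0 \mid R=1] + \underbrace{E[Y_0 \mid T=1, R=1] - E[Y_0 \mid T=0, R=1]}_{\text{SelectionBias}}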
Submitted by Felipe Frigeri, Sep 07, 2023

Chapter 11, RDD, Section "The IV Estimate", the code blocks
Firstly, it appears that the cutoff value used in the code is 10k, while it should actually be 5k. This has a downstream effect on the regression models and, consequently, the calculated LATE.
Secondly, the code implies that the ITTE can be directly derived from the conditional coefficient for the intercept in the linear regression. This would be a valid approach if the cutoff were at 0, but it's actually at 5k. This simplification seems to contradict the locality assumption of RDD, which states that the estimator is valid only near the threshold `R=c`.
Note from the Author or Editor: In the section Intention to Treat Effect (pg 356 of the printed book), the paragraph right after the table should be updated to:
"Then, let's center the running variable, balance, to shift the threshold to zero. In this case, since the discontinuity is at 5000, you can do this by subtracting 5000 from the balance variable. (This is just a trick to make interpreting the regression parameters easier). Next, you need to regress the outcome variable on the centered running variable R interacted with a dummy for being above the threshold (R > 0):
y_i = \beta_0 + \beta_1 r_i + \beta_2 \mathbb{1}\{r_i>0\} + \beta_3 \mathbb{1}\{r_i>0\} r_i
The parameter estimate associated with crossing the threshold…"
Also, code block 20 should be:
m = smf.ols(f"pv~balance*I(balance>0)",
df_dd.assign(balance=lambda d: d["balance"]-5000)).fit()
m.summary().tables[1]
And the table resulting from this code should be as in the updated code, cell 25:
https://github.com/matheusfacure/causal-inference-in-python-code/blob/main/causal-inference-in-python/11-Non-Compliance-and-Instruments.ipynb
In the section The IV Estimate, code block 21 should be updated to
def rdd_iv(data, y, t, r, cutoff):
    centered_df = data.assign(**{r: data[r]-cutoff})
    compliance = smf.ols(f"{t}~{r}*I({r}>0)", centered_df).fit()
    itte = smf.ols(f"{y}~{r}*I({r}>0)", centered_df).fit()
    param = f"I({r} > 0)[T.True]"
    return itte.params[param]/compliance.params[param]

rdd_iv(df_dd, y="pv", t="prime_card", r="balance", cutoff=5000)
The result from this code block should also be updated to 732.8534752298891. See code block 27 in the GitHub link above.
Finally, the array just before the Bunching section should be updated to array([655.08214249, 807.83207567]). See code block 30 in the GitHub link above.
Submitted by Alex Roy, Oct 30, 2023

Page 36, 2nd paragraph
The woman and man values should be switched to make sense with the rest of the paragraph.
"When you look at age, treatment groups seem very much alike, but there seems to be a difference in gender (woman = 1, man = 0)."
Note from the Author or Editor: It should be "(woman = 0, man = 1)".
Submitted by Clayton Schoeny, Jul 24, 2023

Page 42, 1st equation
In the equation for the estimate of the standard deviation, the summation should start at i=1, not i=0.
Note from the Author or Editor: In the equation, it should be i=1, not i=0.
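For reference, assuming the equation in question is the usual sample standard deviation estimator, the corrected form is:

\hat{\sigma} = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2}

with the summation index starting at i = 1.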
Submitted by Clayton Schoeny, Jul 24, 2023

Page 48, Practical Example
The equation following "They report the efficacy of the vaccine," is not correct. It's printed as E[Y|T = 0] / E[Y|T = 1], but this would give us a value of 56.5/3.3 = 17.12.
Rather, one way to correctly write the equation is 1 - (E[Y|T = 1] / E[Y|T = 0]).
Note from the Author or Editor: The equation after "They report the efficacy of the vaccine" should be 1 - (E[Y|T = 1] / E[Y|T = 0]).
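Plugging in the figures quoted above, the corrected expression gives an efficacy of roughly 94%:

1 - \frac{E[Y|T=1]}{E[Y|T=0]} = 1 - \frac{3.3}{56.5} \approx 0.94

rather than the 17.12 produced by the printed ratio.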
Submitted by Clayton Schoeny, Jul 31, 2023

Printed | Page 58, Code cell 19
There is a **2 missing in the code "np.ceil(16 * no_email.std()**2/0.01)". It should be
"np.ceil(16 * no_email.std()**2/0.01**2)"; however, this gives a number that is too large to go well with what is written around this code. A better solution is to change the detectable difference from 1% to 8%:
"So, if you want to craft a cross-sell email experiment where you want to detect an 8% difference, like the one you saw in this conversion email example, you must have a sample size that gives you at least 8% = 2.8 SE.
[...]
In [19]: np.ceil(16 * (no_email.std()/0.08)**2)
Out[19]: 103.0
"
Submitted by Matthew Facure, Sep 01, 2023

Printed | Page 60, last equation in the chapter
In the equation right after “you could simplify the sample size formula to:”, there is a ^2 missing. It is
N = 16 * σ^2/δ
but it should be
N = 16 * σ^2/δ^2.
The correct equation can be found on page 58.
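As a quick sketch of the corrected rule of thumb (the sigma and delta values below are illustrative placeholders, not figures from the book):

import numpy as np

def sample_size(sigma, delta):
    # rule of thumb: N = 16 * sigma^2 / delta^2
    return np.ceil(16 * sigma**2 / delta**2)

sample_size(sigma=0.2, delta=0.08)  # 100.0 with these illustrative inputs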
Submitted by Matthew Facure, Sep 01, 2023

Printed | Page 97, "It projects all the X variables into the outcome dimension and makes the comparison between treatment and control on that projection."
It should be “It projects the outcome variable into the X variables and makes the comparison between treatment and control on that projection.”
Submitted by Matthew Facure, Sep 01, 2023

Page 151 (Conditioning on a collider), after the first paragraph
The left-hand side of the formula contains an error that has already been submitted as an erratum by another reader (Felipe Frigeri).
But there is also an error on the right-hand side, in the SelectionBias collection of terms:
E[Y_0|T=0, R=1] - E[Y_0|T=1, R=1]
should be corrected to
E[Y_0|T=1, R=1] - E[Y_0|T=0, R=1]
Note from the Author or Editor: The rightmost term, above SelectionBias, should be E[Y_0|T=1, R=1] - E[Y_0|T=0, R=1]
Submitted by Francis Doornaert, Sep 14, 2023

Page 433, Multiple Cohorts charts or code block
The example description and code snippet say the data is subset to the West region, but the example charts are labeled "Multiple Cohorts - North Region".
Note from the Author or Editor: The 1st Plot in the Staggered Adoption section should read West instead of North. This was already fixed in the book's code, cell 42.
https://github.com/matheusfacure/causal-inference-in-python-code/blob/main/causal-inference-in-python/08-Difference-in-Differences.ipynb
Submitted by Kara Downey, Sep 14, 2023