Errata

Errata for Causal Inference in Python


The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction will be displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.


Version	Location	Description	Submitted By	Date submitted
	Confidence intervals section, a couple of pages in	On the Kindle app, there is an error in the formula. Mistake: the factor...to get a 1-alpha confidence interval is given by abs(ppf((1-alpha)/2)). Correction: the factor...to get a 1-alpha confidence interval is given by abs(ppf(alpha/2)). Note from the Author or Editor: Right after “confidence interval is given by”, inside the formula, it should be ppf(alpha) instead of ppf(1-alpha). The following code is correct and needs no modification.	Francis Doornaert	Aug 17, 2023
Printed	Page 22, Section "A visual guide to Bias", in the image right after "The reason for this is bias, which is depicted in the right plot:"	In the image, on the leftmost curly brace, the equation should be E[Y|T=1] - E[Y|T=0], not E[Y|T=1] = E[Y|T=0].	Matthew Facure	Sep 01, 2023
	Chapter 3, Section "Conditioning on a Collider", Location 3042 of the Kindle version, first formula	The formula states: E[Y|T=1,R=1] - E[Y|T=1,R=1] = E[Y_1 - Y_0|R=1] + E[Y_0|T=0,R=1] - E[Y_0|T=1,R=1]. I think the second term on the left side should be E[Y|T=0,R=1], since the left side is the average difference between treated and untreated units who responded to the survey. Note from the Author or Editor: The formula should be E[Y|T=1,R=1] - E[Y|T=0,R=1] = E[Y_1 - Y_0|R=1] + E[Y_0|T=1,R=1] - E[Y_0|T=0,R=1].	Felipe Frigeri	Sep 07, 2023
	Chapter 11, RDD, “The IV Estimate”, the code blocks	First, the cutoff value used in the code appears to be 10k, while it should be 5k. This affects the downstream regression models and, consequently, the calculated LATE. Second, the code implies that the ITTE can be derived directly from the conditional coefficient for the intercept in the linear regression. That would be valid if the cutoff were at 0, but it is actually at 5k. This simplification seems to contradict the locality assumption of RDD, which states that the estimator is valid only near the threshold `R=c`. Note from the Author or Editor: In the section "Intention to Treat Effect" (page 356 of the printed book), the paragraph right after the table should be updated to: "Then, let's center the running variable, balance, to shift the threshold to zero. In this case, since the discontinuity is at 5000, you can do this by subtracting 5000 from the balance variable. (This is just a trick to make interpreting the regression parameters easier.) Next, you need to regress the outcome variable on the centered running variable R interacted with a dummy for being above the threshold (R > 0): y_i = \beta_0 + \beta_1 r_i + \beta_2 \mathbb{1}\{r_i>0\} + \beta_3 \mathbb{1}\{r_i>0\} r_i The parameter estimate associated with crossing the threshold…" Also, code block 20 should be `m = smf.ols(f"pv~balance*I(balance>0)", df_dd.assign(balance=lambda d: d["balance"]-5000)).fit()` followed by `m.summary().tables[1]`, and the table resulting from this code should match cell 25 of the updated notebook: https://github.com/matheusfacure/causal-inference-in-python-code/blob/main/causal-inference-in-python/11-Non-Compliance-and-Instruments.ipynb In the section "The IV Estimate", code block 21 should be updated to define `rdd_iv(data, y, t, r, cutoff)` with the body `centered_df = data.assign(**{r: data[r]-cutoff})`; `compliance = smf.ols(f"{t}~{r}*I({r}>0)", centered_df).fit()`; `itte = smf.ols(f"{y}~{r}*I({r}>0)", centered_df).fit()`; `param = f"I({r} > 0)[T.True]"`; `return itte.params[param]/compliance.params[param]`, and then be called as `rdd_iv(df_dd, y="pv", t="prime_card", r="balance", cutoff=5000)`. The result from this code block should also be updated to 732.8534752298891 (see code block 27 in the GitHub link above). Finally, the array just before the Bunching section should be updated to array([655.08214249, 807.83207567]) (see code block 30 in the GitHub link above).	Alex Roy	Oct 30, 2023
	Page 36 2nd Paragraph	The woman and man values should be switched to be consistent with the rest of the paragraph: "When you look at age, treatment groups seem very much alike, but there seems to be a difference in gender (woman = 1, man = 0)." Note from the Author or Editor: It should be "(woman = 0, man = 1)".	Clayton Schoeny	Jul 24, 2023
	Page 42 1st Equation	In the equation for the estimate of the standard deviation, the summation should start at i=1, not i=0. Note from the Author or Editor: In the equation, it should be i=1, not i=0.	Clayton Schoeny	Jul 24, 2023
	Page 48 Practical Example	The equation following "They report the efficacy of the vaccine," is not correct. It's printed as E[Y|T = 0] / E[Y|T = 1], but this would give a value of 56.5/3.3 = 17.12. One correct way to write the equation is 1 - (E[Y|T = 1] / E[Y|T = 0]). Note from the Author or Editor: The equation after "They report the efficacy of the vaccine" should be 1 - (E[Y|T = 1] / E[Y|T = 0]).	Clayton Schoeny	Jul 31, 2023
Printed	Page 58 Code cell 19	The code “np.ceil(16 * no_email.std()**2/0.01)” is missing a **2 on the detectable difference. It should be “np.ceil(16 * no_email.std()**2/0.01**2)”; however, this gives a number that is too large to fit with the text around this code. A better solution is to change the detectable difference from 1% to 8%: “So, if you want to craft a cross-sell email experiment where you want to detect an 8% difference, like the one you saw in this conversion email example, you must have a sample size that gives you at least 8% = 2.8SE. [...] In [19]: np.ceil(16 * (no_email.std()/0.08)**2) Out[19]: 103.0”	Matthew Facure	Sep 01, 2023
Printed	Page 60 Last equation in the chapter.	In the equation right after “you could simplify the sample size formula to:”, there is a ^2 missing. It is N = 16 * σ^2/δ, but it should be N = 16 * σ^2/δ^2. The correct equation can be found on page 58.	Matthew Facure	Sep 01, 2023
Printed	Page 97 “It projects all the X variables into the outcome dimension and makes the comparison between treatment and control on that projection.”	It should be “It projects the outcome variable into the X variables and makes the comparison between treatment and control on that projection.”	Matthew Facure	Sep 01, 2023
	Page 151 (Conditioning on a collider), after first paragraph	The left-hand side of the formula contains an error that has already been submitted as an erratum by another reader (Felipe Frigeri). But there is also an error in the right-hand side, in the SelectionBias collection of terms: E[Y_0|T=0, R=1] - E[Y_0|T=1, R=1] should be corrected to E[Y_0|T=1, R=1] - E[Y_0|T=0, R=1]. Note from the Author or Editor: The rightmost term, above SelectionBias, should be E[Y_0|T=1, R=1] - E[Y_0|T=0, R=1].	Francis Doornaert	Sep 14, 2023
	Page 433 Multiple Cohorts charts or code block	The example description and code snippet say the data is subset to the West region, but the example charts are labeled "Multiple Cohorts - North Region". Note from the Author or Editor: The first plot in the Staggered Adoption section should read West instead of North. This was already fixed in the book's code, cell 42: https://github.com/matheusfacure/causal-inference-in-python-code/blob/main/causal-inference-in-python/08-Difference-in-Differences.ipynb	Kara Downey	Sep 14, 2023
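For the Kindle confidence-interval erratum, a minimal sketch of why abs(ppf(alpha/2)) is the right factor. It uses the standard library's statistics.NormalDist().inv_cdf as the standard normal ppf (the book's code is not reproduced here; scipy.stats.norm.ppf computes the same quantile). Note that writing the quantile in terms of a confidence level c = 1 - alpha, as ppf((1 - c)/2), is the same as ppf(alpha/2), whereas the misprint ppf((1 - alpha)/2) lands near the median.

```python
from statistics import NormalDist

def ci_factor(alpha):
    """Multiplier z such that estimate ± z * SE is a (1 - alpha) confidence interval."""
    # Corrected form: abs(ppf(alpha / 2)), the lower alpha/2 quantile of N(0, 1)
    return abs(NormalDist().inv_cdf(alpha / 2))

def ci_factor_misprint(alpha):
    # Misprinted form: abs(ppf((1 - alpha) / 2)), a quantile close to the median
    return abs(NormalDist().inv_cdf((1 - alpha) / 2))

print(ci_factor(0.05))           # ≈ 1.96
print(ci_factor_misprint(0.05))  # ≈ 0.063, far too narrow an interval
```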
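For the page 22 figure erratum, the reason the leftmost brace needs a minus sign rather than an equals sign: it labels the difference in means, which splits into the effect on the treated plus a bias term. The identity below is the standard decomposition (add and subtract E[Y_0|T=1]); the exact labels in the printed figure may differ.

```latex
\begin{aligned}
E[Y|T=1] - E[Y|T=0]
  &= E[Y_1|T=1] - E[Y_0|T=0] \\
  &= \underbrace{E[Y_1 - Y_0|T=1]}_{\text{ATT}}
   + \underbrace{E[Y_0|T=1] - E[Y_0|T=0]}_{\text{Bias}}
\end{aligned}
```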
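For the two collider errata (Chapter 3 / page 151), a small simulation that checks the sign of the corrected SelectionBias term. It is a sketch under loudly stated assumptions: a hypothetical randomized treatment, a hypothetical constant effect tau (so E[Y_1 - Y_0|R=1] equals tau), and a made-up response rule in which R depends on both T and Y. None of these numbers come from the book.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
tau = 1.0  # hypothetical constant effect, so E[Y_1 - Y_0 | R=1] = tau

t = rng.binomial(1, 0.5, n)       # randomized treatment (assumption)
y0 = rng.normal(0.0, 1.0, n)      # potential outcome without treatment
y1 = y0 + tau                     # potential outcome with treatment
y = np.where(t == 1, y1, y0)      # observed outcome

# R (survey response) is a collider: it depends on both T and Y (made-up rule)
r = (y + t + rng.normal(0.0, 1.0, n)) > 1.0

lhs = y[r & (t == 1)].mean() - y[r & (t == 0)].mean()        # E[Y|T=1,R=1] - E[Y|T=0,R=1]
effect = (y1 - y0)[r].mean()                                 # E[Y_1 - Y_0 | R=1]
selection_bias = y0[r & (t == 1)].mean() - y0[r & (t == 0)].mean()

print(np.isclose(lhs, effect + selection_bias))  # True: corrected sign
print(np.isclose(lhs, effect - selection_bias))  # False: misprinted sign
```

Here the treated need a lower Y_0 to respond, so the selection bias comes out negative and the naive comparison among respondents understates tau.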
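For the Chapter 11 RDD erratum, a numpy-only sketch of what the corrected rdd_iv computes: center the running variable at the cutoff, fit outcome ~ r * 1{r>0} for both the outcome and the treatment, and divide the two jumps at the threshold. It swaps the book's statsmodels formulas for np.linalg.lstsq on the same design columns, and it runs on synthetic data with made-up take-up rates and a made-up effect of 1,000; it will not reproduce the book's 732.85.

```python
import numpy as np

def jump_at_cutoff(outcome, running, cutoff):
    """Coefficient on 1{r>0} in outcome ~ r + 1{r>0} + 1{r>0}*r, with r = running - cutoff."""
    r = running - cutoff
    above = (r > 0).astype(float)
    # Same terms as the formula "{y}~{r}*I({r}>0)": intercept, r, dummy, interaction
    X = np.column_stack([np.ones_like(r), r, above, above * r])
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return beta[2]  # discontinuity at r = 0

def rdd_iv(data, y, t, r, cutoff):
    """Fuzzy-RDD IV estimate: intention-to-treat jump / compliance jump."""
    itte = jump_at_cutoff(data[y], data[r], cutoff)
    compliance = jump_at_cutoff(data[t], data[r], cutoff)
    return itte / compliance

rng = np.random.default_rng(0)
n = 50_000
balance = rng.uniform(0, 10_000, n)
# Hypothetical take-up: 20% below the 5,000 cutoff, 70% above (fuzzy design)
prime_card = rng.binomial(1, np.where(balance > 5_000, 0.7, 0.2)).astype(float)
# Hypothetical outcome with a true effect of 1,000 for card holders
pv = 0.1 * balance + 1_000 * prime_card + rng.normal(0, 100, n)

data = {"pv": pv, "prime_card": prime_card, "balance": balance}
late = rdd_iv(data, y="pv", t="prime_card", r="balance", cutoff=5_000)
print(late)  # should land near the simulated effect of 1000
```

Centering is what makes the dummy's coefficient the jump at the threshold; with an uncentered running variable, that coefficient would describe the gap between the two fitted lines at balance = 0 instead.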
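For the page 42 erratum, a minimal sketch of the standard-deviation estimate with the sum running over i = 1..N. It assumes the usual N-1 (Bessel-corrected) denominator, which the erratum itself does not state, and checks the result against the standard library's statistics.stdev. The data values are made up.

```python
import math
import statistics

def sample_std(xs):
    """sqrt( 1/(N-1) * sum_{i=1}^{N} (x_i - x_bar)^2 ), assuming the N-1 denominator."""
    n = len(xs)
    x_bar = sum(xs) / n
    # Math indices i = 1..N map to Python indices 0..N-1: N terms, each observation once
    squared_devs = sum((xs[i] - x_bar) ** 2 for i in range(n))
    return math.sqrt(squared_devs / (n - 1))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up sample
print(sample_std(data))        # sqrt(32 / 7)
print(statistics.stdev(data))  # same value, up to floating-point rounding
```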
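For the page 48 vaccine erratum, the arithmetic behind the correction, using the values implied by the erratum (E[Y|T=0] = 56.5 and E[Y|T=1] = 3.3). Efficacy here is 1 minus the ratio of outcome rates, which is a proportion between 0 and 1, while the misprinted ratio is not.

```python
def efficacy(e_y_treated, e_y_control):
    """Vaccine efficacy as 1 - E[Y|T=1] / E[Y|T=0]."""
    return 1 - e_y_treated / e_y_control

e_y_t1, e_y_t0 = 3.3, 56.5  # values quoted in the erratum

print(round(e_y_t0 / e_y_t1, 2))           # misprinted ratio: 17.12
print(round(efficacy(e_y_t1, e_y_t0), 3))  # corrected efficacy: 0.942
```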
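For the page 58 and page 60 sample-size errata, a sketch of the corrected rule of thumb N = 16 * σ^2/δ^2 next to the misprinted N = 16 * σ^2/δ. The book's corrected cell np.ceil(16 * (no_email.std()/0.08)**2) is the same formula with σ estimated from data; the σ below is hypothetical, and the δ values are chosen as exact binary fractions so the outputs are exact.

```python
import math

def sample_size(sigma, delta):
    """Corrected rule of thumb: N = 16 * sigma^2 / delta^2."""
    return math.ceil(16 * sigma**2 / delta**2)

def sample_size_misprint(sigma, delta):
    # Misprinted version: N = 16 * sigma^2 / delta (missing the square on delta)
    return math.ceil(16 * sigma**2 / delta)

sigma = 0.5  # hypothetical outcome standard deviation
for delta in (0.25, 0.125):
    print(delta, sample_size(sigma, delta), sample_size_misprint(sigma, delta))
# 0.25 64 16
# 0.125 256 32
```

Halving the detectable difference quadruples N under the corrected formula but only doubles it under the misprint, which is why the missing square matters.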