The errata list records errors found after the product was released, along with their corrections. If an error was corrected in a later version or reprint, the date of the correction is displayed in the column titled "Date corrected".
The following errata were submitted by our customers and approved as valid errors by the author or editor.
Version | Location | Description | Submitted By | Date submitted | Date corrected |
|
Page Preface - Acknowledgments
Acknowledgments for Third Edition (2022) |
Minor typo: "Programmer" is misspelled.
It has more than a decade since I started writing the first edition of this book and more than 15 years since I originally started my journey as a Python prorammer.
Note from the Author or Editor: I am fixing the typo
|
Andy Jessen |
Sep 16, 2022 |
|
|
Page Preface
first paragraph |
In the first sentence of the preface here:
wesmckinney.com/book/preface.html
it says:
"This Open Access web version of Python for Data Analysis 3rd Edition is now available as a companion to the print and digital editions. If you encounter any errata, please report them here."
The URL given for error reporting is: www.oreilly.com/catalog/errata.csp?isbn=0636920023784
That is the wrong URL. The correct URL is: oreilly.com/catalog/0636920519829/errata
Note from the Author or Editor: will fix
|
Anonymous |
Sep 25, 2022 |
|
|
Page Ch 10, Data Aggregation and Group Operations
10.3 Quantile and Bucket Analysis |
Error:
As you may recall from Ch 8: Data Wrangling: Join, Combine, and Reshape, pandas has some tools, in particular pandas.cut and pandas.qcut, for slicing data up into buckets with bins of your choosing, or by sample quantiles.
Correct:
As you may recall from Ch 7: Data Cleaning and Preparation, pandas has some tools, in particular pandas.cut and pandas.qcut, for slicing data up into buckets with bins of your choosing, or by sample quantiles.
Reason:
pandas.cut and pandas.qcut are discussed in Ch 7 Section 2, Discretization and Binning.
Note from the Author or Editor: will fix
|
Young Tan |
Sep 26, 2022 |
|
|
Page "Reading Text Files in Pieces" in 6.1
4th paragraph |
"The elipsis marks" should be "The ellipsis marks".
Note from the Author or Editor: will fix
|
Noritada Kobayashi |
Oct 10, 2022 |
|
|
Page "A.6 More About Sorting" in Appendix A
2nd code block (program list) |
The randomly generated array (as below) is inappropriate as an example, as the first column is in ascending order from the beginning. Therefore, although we want only the first column to be sorted, there is no change in the array before and after sorting, which makes it difficult to convey the intent.
It would be an appropriate example if it were generated with other parameters.
In [166]: arr = rng.standard_normal((3, 5))
In [167]: arr
Out[167]:
array([[-1.1956, 0.4691, -0.3598, 1.0359, 0.2267],
[-0.7448, -0.5931, -1.055 , -0.0683, 0.458 ],
[-0.07 , 0.1462, -0.9944, 1.1436, 0.5026]])
In [168]: arr[:, 0].sort() # Sort first column values in place
In [169]: arr
Out[169]:
array([[-1.1956, 0.4691, -0.3598, 1.0359, 0.2267],
[-0.7448, -0.5931, -1.055 , -0.0683, 0.458 ],
[-0.07 , 0.1462, -0.9944, 1.1436, 0.5026]])
Note from the Author or Editor: I will improve the example to be more robust to random number generation
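One way to make the effect visible regardless of random number generation is to use fixed values whose first column starts out unsorted. The following is only a minimal sketch with hypothetical data, not the revised example from the book:

```python
import numpy as np

# Hypothetical fixed values (not the book's random data), chosen so that
# the first column is NOT already in ascending order.
arr = np.array([[ 0.5,  1.0, -0.3],
                [-1.2,  0.4,  0.9],
                [ 0.1, -0.7,  0.2]])
before = arr.copy()

# arr[:, 0] is a view, so sorting it in place changes arr itself.
arr[:, 0].sort()

print(before[:, 0])  # first column before sorting
print(arr[:, 0])     # first column after sorting
```

Only the first column is reordered; the other columns keep their original positions.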
|
Noritada Kobayashi |
Oct 30, 2022 |
|
|
Page "Regular Expressions" in 7.4
1st paragraph in p.231 |
Original:
the match object can only tell us the start and end position of the pattern in the string:
Suggestion for improvement:
the match object can tell us the start and end position of the pattern in the string:
Reason:
As the code block that follows indicates, the string representation of the match object includes information about the matched substring in addition to the start and end positions:
Out[174]: <re.Match object; span=(5, 20), match='dave@google.com'>
Note from the Author or Editor: will fix
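As an illustration of the point above, a match object exposes the matched text as well as its position. This sketch assumes a shortened sample string and an email pattern written for this example, so neither is necessarily identical to the book's:

```python
import re

# Assumed sample text and pattern, for illustration only.
text = "Dave dave@google.com"
pattern = r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}"
regex = re.compile(pattern, flags=re.IGNORECASE)

m = regex.search(text)
print(m)          # repr shows both the span and the matched substring
print(m.span())   # start and end position of the match
print(m.group())  # the matched substring itself
```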
|
Noritada Kobayashi |
Nov 06, 2022 |
|
|
Page "String Functions in pandas" in 7.4
Table 7-6 |
Error:
Equivalent to built-in str.alnum
Correct:
Equivalent to built-in str.isalnum
Reason:
See the online documentation of Python.
Note from the Author or Editor: will fix
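To show the correspondence, the vectorized `Series.str.isalnum` can be compared with the built-in `str.isalnum` element by element. The sample strings here are made up for the illustration:

```python
import pandas as pd

# Made-up sample strings for illustration.
values = ["abc123", "abc 123", "2023"]
s = pd.Series(values)

vectorized = s.str.isalnum().tolist()     # pandas string method
builtin = [v.isalnum() for v in values]   # Python built-in, one value at a time

print(vectorized)
print(builtin)
```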
|
Noritada Kobayashi |
Nov 12, 2022 |
|
|
Page Table 6-2, page 257
Argument: skip_footer |
In pandas.read_csv(), the argument "skip_footer" has been deprecated.
It's now "skipfooter".
Note from the Author or Editor: will fix
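A minimal sketch of the current spelling, using a small made-up CSV string. The `engine="python"` argument is passed here on the assumption that the C parser does not support `skipfooter`:

```python
import io

import pandas as pd

# Made-up CSV text whose last line is a footer that should be ignored.
csv_text = "a,b\n1,2\n3,4\ntotal,6"

# skipfooter=1 drops the final line before the data is parsed.
df = pd.read_csv(io.StringIO(csv_text), skipfooter=1, engine="python")
print(df)
```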
|
Anonymous |
Nov 26, 2022 |
|
|
Page "Adding legends" in "Ticks, Labels, and Legends" in 9.1
Blocks above and below Figure 9-10 |
Error:
In [50]: ax.legend()
The `legend` method has several other choices for the location `loc` argument. See the docstring (with `ax.legend?`) for more information.
The `loc` legend option tells matplotlib where to place the plot. The default is `"best"`, which tries to choose a location that is most out of the way. To exclude one or more elements from the legend, pass no label or `label="_nolegend_"`.
Correct:
In [50]: ax.legend()
The `legend` method can take the `loc` option, which instructs matplotlib where to place the legend in the plot. The `loc` option defaults to `"best"`, which tries to choose a location that is most out of the way. To exclude one or more elements from the legend, pass no label or `label="_nolegend_"`. The `legend` method has several other choices for the `loc` argument. See the docstring (with `ax.legend?`) for more information.
Reason:
In the 2nd ed., the author passed `loc="best"` as the argument to `legend` in the code block 50, so readers could read the subsequent sentences under the assumption that the `loc` option could be passed. In the 3rd ed., the `loc` option is not passed to `legend` in the code block 50, so the explanation of the `loc` option seems abrupt.
Note from the Author or Editor: I revised the text in this section
|
Noritada Kobayashi |
Jan 07, 2023 |
|
|
Page "Saving Plots to File" in 9.1
1st paragraph |
Error:
You can save the active figure to file using the figure object’s savefig instance method.
Correct:
You can save the figure to file using the figure object’s savefig instance method.
Reason:
In the 2nd ed., the target of the operation was an active figure since the section described `plt.savefig`, but in the 3rd ed., since the `savefig` instance method of a figure object is described, I think the target of the operation does not need to be active.
Note from the Author or Editor: I am removing "active" from the text
|
Noritada Kobayashi |
Jan 07, 2023 |
|
|
Page "Saving Plots to File" in 9.1
Table 9-2 |
Error:
`facecolor, edgecolor`
The color of the figure background outside of the subplots; `"w"` (white), by default.
Correct:
The color of the figure background outside of the subplots; default to `rcParams["savefig.facecolor"]` and `rcParams["savefig.edgecolor"]`, both of which default to `"auto"` (facecolor and edgecolor of the current figure).
Reason:
The default changed from matplotlib 3.3.
Note from the Author or Editor: I'm removing the part about the default altogether since it's pretty in the weeds
|
Noritada Kobayashi |
Jan 07, 2023 |
|
|
Page "Quantile and Bucket Analysis" in 10.3
paragraph spanning p. 339 and p. 340 |
Error:
We can pass `4` as the number of bucket compute sample quartiles, and pass `labels=False` to obtain just the quartile indices instead of intervals:
Suggestion for improvements:
We can pass `4` as the number of bucket to compute sample quartiles, and pass `labels=False` to obtain just the quartile indices instead of intervals:
Reason:
"to" may be missing.
Note from the Author or Editor: I am adding the missing "to"
|
Noritada Kobayashi |
Jan 14, 2023 |
|
|
Page "Exponentially Weighted Functions" in 11.7
3rd paragraph in p.400 |
Error:
with an exponentially weighted (EW) moving average with `span=60`
Correct:
with an exponentially weighted (EW) moving average with `span=30`
Reason:
The code states `span=30` and also the 1st paragraph describes that specifying with `span` makes the result comparable to a simple rolling with the same width.
Note from the Author or Editor: I'm fixing this in the text
|
Noritada Kobayashi |
Mar 05, 2023 |
|
|
Page Chapter 2, Variables and argument passing section
3rd paragraph under the section |
"In some languages, the assignment if b will cause the data [1, 2, 3] to be copied."
if -> of
Note from the Author or Editor: confirmed
|
Jeremy Hageman |
Aug 23, 2023 |
|
|
Page Appendices- Advanced Numpy, A3 Broadcasting
P 667, 'demean_axis' function code |
The last line of the 'demean_axis' function definition should be changed from 'return arr - means[indexer]' to 'return arr - means[tuple(indexer)]'.
Note from the Author or Editor: will fix
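With the suggested change applied, the function could look roughly like the sketch below. Everything other than the final line is reconstructed here as an assumption and may differ from the book's listing:

```python
import numpy as np

def demean_axis(arr, axis=0):
    """Subtract the mean along `axis`, broadcasting it back over `arr`."""
    means = arr.mean(axis)
    # Build an index like [:, np.newaxis, :] that re-inserts the reduced axis.
    indexer = [slice(None)] * arr.ndim
    indexer[axis] = np.newaxis
    # Multidimensional indexing needs a tuple, not a list.
    return arr - means[tuple(indexer)]

rng = np.random.default_rng(0)
sample = rng.standard_normal((3, 4, 5))
demeaned = demean_axis(sample, axis=1)
print(demeaned.mean(axis=1).round(10))  # all approximately zero
```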
|
Lance Lee |
Sep 04, 2023 |
|
|
Page Chapter 5, Indexing Selection and filtering, Selecting on dataframe with loc and iloc
2nd paragraph |
The result of selecting a single row is a Series with an index that contains the DataFrame's column labels. To select multiple roles, creating a new DataFrame, pass a sequence of labels:
To select multiple rows
instead of
To select multiple roles
Note from the Author or Editor: confirmed
|
Elombat Loic |
Sep 05, 2023 |
|
|
Page 112
2nd Paragraph |
Wes, I hope you're doing well bro. Enjoying the paperback of edition 3!
This is a minor (possibly negligible) language clarification.
paragraph 2 reads:
[Here, arr.mean(axis=1) means "compute mean across the columns," where arr.sum(axis=0) means "compute sum down the rows"]. The choice of wording here is a bit confusing and could potentially be interpreted to mean the opposite of what it is saying.
May I suggest, [Here, arr.mean(axis=1) means "compute mean through the rows," where arr.sum(axis=0) means "compute sum through the columns"].
Note from the Author or Editor: I will revise the language to use "over"
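Whatever wording is chosen, a small array makes the behavior concrete. This sketch uses a made-up 2 x 3 array: `axis=1` yields one value per row, and `axis=0` yields one value per column.

```python
import numpy as np

# Made-up 2 x 3 array: [[0, 1, 2], [3, 4, 5]]
arr = np.arange(6).reshape(2, 3)

row_means = arr.mean(axis=1)  # one mean per row (reduces over the column axis)
col_sums = arr.sum(axis=0)    # one sum per column (reduces over the row axis)

print(row_means)
print(col_sums)
```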
|
Daniel Gala |
Nov 04, 2023 |
|
|
Page Creating ndarrays
n/a |
In the two examples for data type for the array that NumPy creates:
In [27]: arr1.dtype
Out[27]: dtype('float64')
In [28]: arr2.dtype
Out[28]: dtype('int64')
The output of the dtype for arr2 is not int64 but int32.
Note from the Author or Editor: I will add a note that the output might be int32 on some platforms
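Since the default integer dtype can depend on the platform and NumPy version, code that needs a specific width can request it explicitly. A minimal sketch, with array values chosen only for illustration:

```python
import numpy as np

data = [[1, 2, 3, 4], [5, 6, 7, 8]]  # illustrative values

arr_default = np.array(data)                 # integer dtype chosen by NumPy
arr_int64 = np.array(data, dtype=np.int64)   # width requested explicitly

print(arr_default.dtype)  # may be int64 or int32, depending on the platform
print(arr_int64.dtype)
```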
|
Jaeeun Choi |
Nov 11, 2023 |
|
|
Page Chapter 10: Data Aggregation and Group Operations
Quantile and Bucket Analysis Section |
In line "pandas has some tools, in particular pandas.cut and pandas.qcut", the referred section is incorrect.
Incorrect referred section: "Ch 8: Data Wrangling: Join, Combine, and Reshape, "
Correct referred section: "Ch 7: Data Cleaning and Preparation"
Note from the Author or Editor: I will fix the reference
|
Thinh Pham |
Nov 12, 2023 |
|
|
Page https://wesmckinney.com/book/python-builtin#control_exceptions
at the first mention of the "finally:" block |
The write_to_file() function is not defined.
Note from the Author or Editor: write_to_file is a fake function for illustration's sake, but I'll clarify anyway
|
Sandor Budai |
Nov 14, 2023 |
|
|
Page Section 4.1 - data types for ndarrays
second note |
In the note it says "A signed integer can represent both positive and negative integers, while an unsigned integer can only represent nonzero integers. For example, int8 (signed 8-bit integer) can represent integers from -128 to 127 (inclusive), while uint8 (unsigned 8-bit integer) can represent 0 through 255."
Second part of the first sentence seems incorrect (nonzero integers)
It should most likely read ", while an unsigned integer can only represent non-negative integers."
The example makes that clear also.
Note from the Author or Editor: confirmed, should be "non-negative"
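The ranges quoted in the note can be checked with `np.iinfo`, which reports the limits of an integer dtype:

```python
import numpy as np

signed = np.iinfo(np.int8)     # signed 8-bit integer
unsigned = np.iinfo(np.uint8)  # unsigned 8-bit integer

print(signed.min, signed.max)      # includes negative values
print(unsigned.min, unsigned.max)  # starts at zero, so zero is representable
```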
|
Niclas Ericsson |
Nov 28, 2023 |
|
|
chapter 2
Chapter 2 |
can vs cann
Python Language Basics, IPython, and Jupyter Notebooks
Built-in Data Structures, Functions, and Files
"To check if two variables refer to the same object, use the is keyword. is not cann analogously be used to check that two objects are not the same:"
Note from the Author or Editor: Corrected before publication. Thank you!
|
Anonymous |
Dec 13, 2021 |
Aug 12, 2022 |
Other Digital Version |
§2.3
Language Semantics\Binary operators and comparisons |
"Python Language Basics, IPython, and Jupyter Notebooks
...
Language Semantics
...
Binary operators and comparisons
Most of the binary math operations and comparisons use familiar mathematical syntax used in other programming langauges:"
"languages" instead of "langauges"
Note from the Author or Editor: Corrected before publication. Thank you!
|
Oussama Kiassi |
Jan 12, 2022 |
Aug 12, 2022 |
Other Digital Version |
1.2 Why Python for Data Analysis?, Solving the “Two-Language” Problem
Second paragraph |
The first sentence of the paragraph lacks a verb:
"Over the last decade some new approaches to solving the "two-language" problem, such as the Julia programming language."
Note from the Author or Editor: Corrected before publication. Thank you!
|
Ali Rahmjoo |
Feb 15, 2022 |
Aug 12, 2022 |
|
Page 4.4 Array-Oriented Programming with Arrays
1st code block |
In [169]: points = np.arange(-5, 5, 0.01) # 100 equally spaced points
-> this will return 1000 equally spaced points, not 100
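The count can be confirmed by inspecting the size of the resulting array:

```python
import numpy as np

# Step of 0.01 over the half-open interval [-5, 5)
points = np.arange(-5, 5, 0.01)
print(points.size)
```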
|
Anonymous |
Jan 14, 2024 |
|
|
Page 7.5 Categorical Data
page 391 |
The input in [248] gives an error.
Here is the correct input:
%time
labels.astype('category')
Note from the Author or Editor: fixing the code example
|
Marjorie Curry |
Oct 30, 2022 |
|
|
Page 11.6 Resampling and Frequency Conversion
Table 11-5 |
Expression:
Axis to resample on; default `axis=0`
Suggestion for improvements:
Axis to resample on; default `axis="index"`
Reason:
This is not a mistake, but since the 3rd edition seems to unify the specification of axis in pandas with `"index"` and `"columns"` instead of numbers, the specification with numbers may surprise the reader a little.
Note from the Author or Editor: I am fixing in text
|
Noritada Kobayashi |
Mar 03, 2023 |
|
|
Page 11.6 Resampling and Frequency Conversion
Table 11-5 |
Error:
`fill_method` How to interpolate when upsampling, as in `"ffill"` or `"bfill"`; by default does no interpolation
Correct:
(deletion of description)
Reason:
This option has been removed from API in pandas v0.18.0. See doc/source/whatsnew/v0.18.0.rst in the pandas repository.
Note from the Author or Editor: Removing from text
|
Noritada Kobayashi |
Mar 04, 2023 |
|
|
Page 11.6 Resampling and Frequency Conversion
Table 11-5 |
Error:
`limit` When forward or backward filling, the maximum number of periods to fill
Correct:
(deletion of description)
Reason:
This option has been removed from API in pandas v0.18.0. See doc/source/whatsnew/v0.18.0.rst in the pandas repository.
Note from the Author or Editor: Removing in text
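For reference, the filling behavior that these removed arguments used to describe can be expressed as a method call on the resampler. This is a sketch with a made-up two-point series, not an excerpt from the book:

```python
import pandas as pd

# Made-up series with two observations one week apart.
idx = pd.date_range("2000-01-05", periods=2, freq="7D")
ts = pd.Series([1.0, 2.0], index=idx)

# Upsample to daily frequency, forward filling at most 2 periods.
filled = ts.resample("D").ffill(limit=2)
print(filled)
```

The first observation is carried forward for two days, and the remaining gap stays missing until the next observation.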
|
Noritada Kobayashi |
Mar 04, 2023 |
|
|
Page 11.7 Moving Window Functions
1st paragraph in p.399 |
Expression:
The `rolling` function also accepts a string indicating a fixed-size time offset rolling() in moving window functions rather than a set number of periods.
Reason:
The meaning of "rolling() in moving window functions", which was inserted in the 3rd edition, seemed to me to be difficult to understand. In the 2nd edition, the sentence corresponding to this sentence was as follows:
The `rolling` function also accepts a string indicating a fixed-size time offset rather than a set number of periods.
Note from the Author or Editor: This "rolling() in moving window functions" piece was inserted in the text by the indexer in error. It can either be removed or converted into its proper indexterm form
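To illustrate what the intended sentence describes, `rolling` can take an offset string in place of a number of periods when the index is datetime-like. The data below are made up for this sketch:

```python
import pandas as pd

# Made-up daily series.
idx = pd.date_range("2020-01-01", periods=5, freq="D")
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=idx)

# "2D" is a fixed-size time offset: each window covers the two days
# ending at (and including) the current timestamp.
two_day_mean = s.rolling("2D").mean()
print(two_day_mean)
```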
|
Noritada Kobayashi |
Mar 05, 2023 |
|
|
Page 13.1 Bitly Data from 1.USA.gov
use the json module and its loads function invoked on each line in the sample file we downloaded |
"import json
with open(path) as f:
    records = [json.loads(line) for line in f]"
However, the loads function cannot be invoked on each line in the sample file; IPython/Jupyter raises an error: "UnicodeDecodeError: 'gbk' codec can't decode byte 0xac in position 6991: illegal multibyte sequence"
Note from the Author or Editor: We need to add encoding="utf-8" when opening the file because this fails in China
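A sketch of the fix, using a temporary file with made-up records in place of the book's sample file. Passing `encoding="utf-8"` makes the read independent of the system's default encoding:

```python
import json
import os
import tempfile

# Made-up records containing a non-ASCII character (not the Bitly data).
sample = [{"cy": "Zürich", "c": "CH"}, {"cy": "Danvers", "c": "US"}]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.txt")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        for rec in sample:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")

    # Explicit encoding, so the platform default (for example GBK) is not used.
    with open(path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f]

print(records)
```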
|
Sam Z.H. |
Oct 30, 2023 |
|
|
Page 29
1st paragraph |
'If you bind a new object to a variable inside a function, that will not overwrite a variable of the same name in the "scope" outside the function (the "parent scope").'
I believe the correct wording is "... that will overwrite a variable ...", as demonstrated in the example given below the paragraph.
Note from the Author or Editor: the language is unclear, I will revise
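The distinction at issue is between rebinding a name and mutating an object. The sketch below uses hypothetical function names to show both cases: rebinding inside a function leaves the outer variable alone, while mutating a passed-in object is visible outside.

```python
def rebind(values):
    # Binds the local name to a NEW list; the caller's list is untouched.
    values = ["replaced"]
    return values

def mutate(values):
    # Alters the SAME list object that the caller passed in.
    values.append(4)

data = [1, 2, 3]

rebind(data)
after_rebind = list(data)  # still [1, 2, 3]

mutate(data)
after_mutate = list(data)  # now [1, 2, 3, 4]

print(after_rebind, after_mutate)
```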
|
John Maciel |
Oct 03, 2023 |
|
|
Page 88
Table 4-1, third entry, 'arange' |
the Python built-in range() function does not return a list but a generator
Note from the Author or Editor: fixing
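A quick check of what `range` returns in Python 3. Strictly speaking the result is a `range` object, a lazy sequence that supports `len` and indexing, rather than a list or a generator:

```python
import numpy as np

r = range(5)

print(type(r))             # the range type, not list
print(len(r), r[2])        # supports len() and indexing without materializing
print(list(r))             # convert explicitly when a list is needed
print(type(np.arange(5)))  # NumPy's arange returns an ndarray instead
```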
|
Claas Rostock |
Dec 26, 2022 |
|
|
Page 104
Table 4-3 |
uniform appears two times in the table
Note from the Author or Editor: fixing
|
Claas Rostock |
Dec 26, 2022 |
|
|
Page 133
first paragraph and code block |
"If you assign a Series, its labels will be realigned exactly to the DataFrame's index ..."
In[65]: val = pd.Series([-1.2, -1.5, -1.7], index=["two", "four", "five"])
This does not demonstrate any matching of frame2's index to the Series index.
It would be more informative as something like '... index=["two", 4, "five"]'
Note from the Author or Editor: I am fixing the code example
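For the alignment to be visible, the Series labels need to overlap the DataFrame's index. The following sketch uses a made-up frame with string row labels, which is an assumption and not the book's `frame2`:

```python
import pandas as pd

# Made-up frame with string row labels.
frame = pd.DataFrame(
    {"year": [2000, 2001, 2002, 2001, 2002]},
    index=["one", "two", "three", "four", "five"],
)

val = pd.Series([-1.2, -1.5, -1.7], index=["two", "four", "five"])

# Values are matched by label; rows with no matching label get NaN.
frame["debt"] = val
print(frame)
```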
|
Gregory Sherman |
Feb 21, 2023 |
|
|
Page 166 (3rd edition)
middle of page |
It is *not* true that "if any value is not NA, then the result is NA." Apparently the default is to skip (exclude) NA values.
Note from the Author or Editor: Yes, the language needs to be fixed to indicate that the result will be the sum of the non-NA values
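The default behavior can be seen with a short made-up series: NA values are skipped unless `skipna=False` is passed.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 2.0])  # made-up values

total_default = s.sum()             # NA skipped: sum of the non-NA values
total_strict = s.sum(skipna=False)  # any NA makes the result NA

print(total_default, total_strict)
```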
|
Michael VanValkenburgh |
Nov 15, 2022 |
|
|
Page 176
mid |
You write:
"
Indexing
Can treat one or more columns as the returned DataFrame..
"
Is this correct, or did you mean "treat .. as index of the returned DataFrame"?
Note from the Author or Editor: fixing
|
Claas Rostock |
Dec 28, 2022 |
|
|
Page 207
paragraph at top & [38] |
"Suppose you want to keep only rows containing at most a certain number of missing observations. You can indicate this with the thresh argument."
In fact, as command[38] shows, with thresh=2, only rows with <2 missing values were kept.
In the first sentence of the page, "at most" can be replaced with "less than".
Note from the Author or Editor: I am correcting to "less than" in the text
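The `thresh` argument counts non-NA values rather than missing ones, which this sketch shows on a made-up three-column frame whose rows have zero, one, and two missing values:

```python
import numpy as np
import pandas as pd

# Made-up frame: rows 0, 1, 2 have 0, 1, and 2 missing values respectively.
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [2.0, 5.0, np.nan],
    "c": [3.0, 6.0, 9.0],
})

# Keep only rows with at least 2 non-NA values.
kept = df.dropna(thresh=2)
print(kept)
```

With three columns, requiring at least two non-NA values is the same as allowing fewer than two missing values per row.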
|
Gregory Sherman |
Mar 03, 2023 |
|
|
Page 274 (third edition)
In [151]: |
The line "In [151]: ..." appears to be superfluous---a holdover from the second edition.
Note from the Author or Editor: will fix
|
Michael VanValkenburgh |
Nov 29, 2022 |
|
|
Page 282 (third edition)
second sentence of 9.1 |
Will you please clarify the difference between
%matplotlib inline
and
%matplotlib notebook
?
For example, Figure 9-15 on page 302 works with notebook but is blank with inline,
and Figure 9-19 on page 307 works with inline but partially overwrites Figure 9-18 with notebook.
Note from the Author or Editor: I will clarify
|
Michael VanValkenburgh |
Nov 30, 2022 |
|
|
Page 301 (third edition)
Table 9-4 |
In Table 9-4, I believe the argument is "layout" (singular).
Note from the Author or Editor: will fix
|
Michael VanValkenburgh |
Nov 29, 2022 |
|
|
Page 310
last paragraph |
In the text, you say histplot can plot both histogram and density plot simultaneously, but then (in Figure 9-23) you only plot the histogram. I wonder if you intended to use kde=True so that both are plotted.
Note from the Author or Editor: You're right, I will fix
|
Alex Dow |
Aug 24, 2023 |
|
|
Page 317
first sentence in 9.3 |
"there [are] many options..." (insert "are")
Note from the Author or Editor: will fix
|
Michael VanValkenburgh |
Nov 30, 2022 |
|