Errata

Errata for Python for Data Analysis, Third Edition


The errata list is a list of errors and their corrections that were found after the product was released.

The following errata were submitted by our customers and have not yet been approved or disproved by the author or editor. They solely represent the opinion of the customer.


Version	Location	Description	Submitted by	Date submitted
Other Digital Version	Preface Using Code Examples	Words are in the wrong order on wesmckinney.com: "You can data find files" should be "You can find data files".	Steven Mooney	Feb 16, 2024
Printed, ePub	Page Section 3.1, page 59 1st paragraph	The example: In [118]: hash("string") Out[118]: 3634226001988967898. However, when I ran it I got inconsistent results from the hash function. Below are the results of running the function 4 consecutive times: -783493489962912440, -2593540438211823544, 5958934601557521611, 1519405966352344185. Thus this function could not be used to verify that the object "string" could be used as a dictionary key. I am using a 2021 iMac with an Apple M1 chip, 16 GB memory, and macOS Sonoma 14.2.1. I am using PyCharm 2023.3.3 (Community Edition) Build #PC-233.13763.11, built on January 25, 2024; Runtime version: 17.0.9+7-b1087.11 aarch64; VM: OpenJDK 64-Bit Server VM by JetBrains s.r.o.; macOS 14.2.1; GC: G1 Young Generation, G1 Old Generation; Memory: 2048M; Cores: 8; Metal Rendering is ON; Registry: ide.experimental.ui=true; Non-Bundled Plugins: com.jetbrains.edu (2024.1-2023.3-882)	Patrick Salkeld	Feb 16, 2024
Other Digital Version	Section 5.2; Indexing, Selection, and Filtering Selection on DataFrame with loc and iloc	The word "rows" is misspelled as "roles": "The result of selecting a single row is a Series with an index that contains the DataFrame's column labels. To select multiple roles, creating a new DataFrame, pass a sequence of labels:"	Andrei	Feb 17, 2024
Other Digital Version	Generator expressions 3rd code listing	Syntax typo: the statement `dict((i, i **2) for i inrange(5))` should have a space between the keyword `in` and `range`.	Ben To	Feb 19, 2024
Other Digital Version	Set hashable set elements part	A space is missing before the opening parenthesis in the sentence "set elements generally must be immutable, and they must be hashable(which means that calling hash on a value does not raise an exception)."	Ben To	Feb 19, 2024
Printed, ePub	Page 98, Section 4.1 First example, first 3 paragraphs	When I tried to duplicate this example: names = np.array(["Bob", "Joe", "Will", "Bob", "Will", "Joe", "Joe"]) data = ([[4, 7], [0,2], [-5, 6], [0, 0],[1, 2], [-12, -4], [3, 4]]) names == "Bob" data[names == "Bob"] I got this error: Traceback (most recent call last): File "/Volumes/Extreme SSD/Python Data Analysis/Python3_for_Data_Analysis/main.py", line 550, in <module> data[names == "Bob"] ~~~~^^^^^^^^^^^^^^^^ TypeError: only integer scalar arrays can be converted to a scalar index This contradicts the subsequent text, which states: "...You can even mix and match Boolean arrays with slices or integers (or sequences of integers; more on this later)."	Patrick Salkeld	Feb 19, 2024
Other Digital Version	Chapter 4 - Data Types for ndarrays Second note box	Where the online text says "A signed integer can represent both positive and negative integers, while an unsigned integer can only represent nonzero integers", the phrase "nonzero integers" should be "non-negative integers".	Ben To	Mar 04, 2024
O'Reilly learning platform	Page Chapter 10.x Throughout the chapter	Chapter 10 uses DataFrame.groupby(..., axis="columns") on several occasions; the axis argument of groupby is deprecated.	Jochen Schüttler	Apr 09, 2024
Other Digital Version	Chapter 4, Section "Data Types for ndarrays" The second Note (after Table 4.2)	Text: "A signed integer can represent both positive and negative integers, while an unsigned integer can only represent nonzero integers." Suggestion: "A signed integer can represent both positive and negative integers, while an unsigned integer can only represent non-negative integers, including zero."	Alessandro Botelho Bovo	Jun 06, 2024
Other Digital Version	Chapter 2, section "Numeric types" 3rd paragraph	It says: "Integer division not resulting in a whole number will always yield a floating-point number" Suggestion: "Integer division will always yield a floating-point number"	Alessandro Botelho Bovo	Jun 06, 2024
Other Digital Version	Chapter 4, Section "Unique and Other Set Logic" 1st paragraph	It says: "NumPy has some basic set operations for one-dimensional ndarrays. A commonly used one is numpy.unique, which returns the sorted unique values in an array:" The sentence might imply that `numpy.unique` only works for one-dimensional arrays, which is not true. The `numpy.unique` function also works for n-dimensional arrays, although by default it flattens the array to one dimension before finding the unique values.	Alessandro Botelho Bovo	Jun 11, 2024
ePub	Page Chapter 3, List Discussion regarding "Extend"	Document at learning.oreilly.com. In the discussion of "Extend", the text compares extend to "+" when adding a multi-element list in _one_ move to another multi-element list. However, when discussing performance, the text describes adding the multi-element list in _n_ moves, where _n_ is the length of the list being added, using a for loop. There seems to be little point in using either "extend" or "+" to add one element at a time to a list. One might as well use "append"; it would make the code easier to understand.	Steven O. Ellis	Jul 07, 2024
O'Reilly learning platform	Page Chapter 2 Tab Completion	"Also, you can also complete methods and attributes on any object after typing a period:" double use of 'also'	Anonymous	Sep 05, 2024
ePub	Page https://wesmckinney.com/book/data-analysis-examples#whetting_movielens In [98]: movies["genre"] = movies.pop("genres").str.split("\|")	In [98]: movies["genre"] = movies.pop("genres").str.split("\|") should be movies["genres"] = movies.pop("genre").str.split("\|")	Anonymous	Sep 11, 2024
Other Digital Version	Creating ndarrays Fifth paragraph.	In [31]: np.empty((2, 3, 2)) Out[31]: array([[[0., 0.], [0., 0.], [0., 0.]], [[0., 0.], [0., 0.], [0., 0.]]]) The np.empty function does not initialize the array's values, so the values it displays are arbitrary and not necessarily zeros. The output shown is more consistent with np.zeros. In [46]: np.empty((2, 3, 2)) Out[46]: array([[[4.67296746e-307, 1.69121096e-306], [1.78020984e-306, 1.55762979e-307], [1.78022342e-306, 8.06635958e-308]], [[1.86921415e-306, 1.00132737e-307], [1.33508506e-307, 9.45701377e-308], [1.11257937e-307, 2.00755374e-317]]])	Gerald Juárez	Sep 14, 2024
Other Digital Version	Section 11.1 Table 11.2	In the “Open Access” HTML version, Table 11.2: datetime format specification: It says: "%j - Day of the year as a zero-padded integer (from 001 to 336)" According to the official Python document (and common sense:) ), the value range should be "from 001 to 366"	Jihang Tang	Sep 26, 2024
ePub	Page Using Code Examples First sentence.	The first sentence begins with "You can data find files", I assume it should be "You can find data files".	Adel Siddiquei	Oct 15, 2024
Other Digital Version	Chapter 3, Section 3.1 6th Subtopic	In the provided example, the description states that strings with a length of 2 or less should be filtered out. However, the code filters out strings where the length is greater than 2 (if len(x) > 2). This is inconsistent with the intended explanation. Correction: To correct this, either the description should state that strings with a length greater than 2 are included, or the code should be modified to reflect the original intention of filtering out strings with a length of 2 or less. Here’s the corrected code if the description is to remain unchanged: [x.upper() for x in strings if len(x) <= 2] This will ensure that only strings with a length of 2 or less are included and converted to uppercase, aligning with the description.	Syed Mohammad Hasan	Oct 22, 2024
ePub	Page 1 Preliminaries Installing Necessary Packages	Sorry, I don't have a massive tech background. Is there something different about python 3.12.2? Or are there permission issues that I need to get around? Latest is python 3.12.2. Got to the step where I'm running: "(base) $ conda config --set channel_priority strict" I get this in return: "Error while loading conda entry point: conda-libmamba-solver" Reason: "/miniconda3/lib/libarchive.19.dylib' (no such file)"	Anonymous	Aug 21, 2024
Other Digital Version	1.4 Installation and Setup Installing Necessary Packages	On Windows, substitute a carat ^ for the line continuation \ used on Linux and macOS. "carat" should be "caret", right?	Anonymous	May 15, 2024
ePub	Page 3.1, List Discussion of "Extend"	Please disregard the errata I just submitted. I missed that the example was a list of lists. The text makes perfect sense.	Steven O. Ellis	Jul 07, 2024
O'Reilly learning platform	Page 4 NumPy Basics: Arrays and Vectorized Computation Data Types for ndarrays	In [45]: numeric_strings = np.array(["1.25", "-9.6", "42"], dtype=np.string_) `np.string_` was removed in the NumPy 2.0 release. Use `np.bytes_` instead.	Dmitry	Aug 27, 2024
Other Digital Version	4.2 Pseudorandom Number Generation Table 4.3: NumPy random number generator methods	duplicate `uniform` function listed in the table	Ben To	Mar 09, 2024
O'Reilly learning platform	Page 4.2 Pseudorandom Number Generation Table 4.3: NumPy random number generator methods	In Table 4.3, “uniform” distribution is repeated in the third and last row.	Gao Lu	Oct 01, 2024
Other Digital Version	4.4 Array-Oriented Programming with Arrays first code listing	In [169]: points = np.arange(-5, 5, 0.01) # 100 equally spaced points But this results in "1000" points.	Ben To	Mar 11, 2024
Other Digital Version	4.6 Linear Algebra 4th code example	The qr function imported in `from numpy.linalg import inv, qr` is never used.	Doug Richardson	Aug 15, 2024
ePub	Page 5 Indexing, Selection and Filtering Using Code Examples	In the following sentence, should 'columns' be changed to 'rows'? When I test this, it prints 2 rows and all the columns. "The row selection syntax data[:2] is provided as a convenience. Passing a single element or a list to the [] operator selects columns."	Steven Mooney	Feb 21, 2024
ePub	Page 7.1.1 Filtering Out Missing Data 6th Paragraph and [38]	"Suppose you want to keep only rows containing at most a certain number of missing observations. You can indicate this with the thresh argument:" The thresh argument to pandas.DataFrame.dropna() does not govern how many NA values are allowed. Instead, it requires at least that many non-NA values to be present.	Anonymous	May 07, 2024
Other Digital Version	9 Plotting and Visualization Figure 9.27: Tipping percentage by day split by time/smoker	The code to generate figure 9.27 does not match the generated figure, as the generated figure has a hue to the bars (indicating the day) which is missing from: In [113]: sns.catplot(x="day", y="tip_pct", row="time", .....: col="smoker", .....: kind="bar", data=tips[tips.tip_pct < 1]) This can be corrected with: In [113]: sns.catplot(x="day", y="tip_pct", row="time", .....: col="smoker", hue="day", .....: kind="bar", data=tips[tips.tip_pct < 1])	Doug Richardson	Aug 19, 2024
Other Digital Version	9 Plotting and Visualization Figure 9.28: Box plot of tipping percentage by day	Figure 9.28 box plots have hues in the image, but the code to generate them does not match. In [114]: sns.catplot(x="tip_pct", y="day", kind="box", .....: data=tips[tips.tip_pct < 0.5]) should be In [114]: sns.catplot(x="tip_pct", y="day", kind="box", hue="day", .....: data=tips[tips.tip_pct < 0.5]) To match figure 9.28.	Doug Richardson	Aug 19, 2024
O'Reilly learning platform	Page 10.2 6th code box, In [72]	The code example is grouped_pct.agg([("average", "mean"), ("stdev", np.std)]). It raises a FutureWarning recommending grouped_pct.agg([("average", "mean"), ("stdev", "std")]) instead.	Jochen Schüttler	Apr 09, 2024
Other Digital Version	13.3 US Baby Names In[116] Chinese edition, page 415	According to the code block above (def~~), the table shown for In[116]: names / Out[116] may be wrong. It should be (index levels: year, sex, row number; columns: name, sex, births, year, prop): 1880 F 0: Mary F 7065 1880 0.077643; 1: Anna F 2604 1880 0.028618; 2: Emma F 2003 1880 0.022013; 3: Elizabeth F 1939 1880 0.021309; 4: Minnie F 1746 1880 0.019188; ...; 2010 M 1690779: Zymaire M 5 2010 0.000003; 1690780: Zyonne M 5 2010 0.000003; 1690781: Zyquarius M 5 2010 0.000003; 1690782: Zyran M 5 2010 0.000003; 1690783: Zzyzx M 5 2010 0.000003	Zhang yingtan	Mar 19, 2024
PDF	Page 135 Paragraphs 4 & 6	"If a DataFrame’s index and columns have their name attributes set, these will also be displayed:" The next sentence says: "Unlike Series, DataFrame does not have a name attribute." One sentence (paragraph 4) refers to a DataFrame's index and columns having their name attributes "set", while the next sentence states that a DataFrame does NOT have a name attribute. This creates confusion.	Emile Jacques Bosman	May 01, 2024
Printed, ePub	Page 147 3rd paragraph	The second sentence in the following text has the word "roles" rather than "rows": "The result of selecting a single row is a Series with an index that contains the DataFrame's column labels. To select multiple roles, creating a new DataFrame, pass a sequence of labels:"	Anonymous	Jul 31, 2024
Printed, ePub	Page 159 1st paragraph	The paragraph starts with "Here the function f, which…". Since the example function is named "f1", the paragraph should start with "Here the function f1, which…"	Anonymous	Jul 31, 2024
Printed, ePub	Page 166 3rd paragraph	"When an entire row or column contains all NA values, the sum is 0, whereas if any value is not NA, then the result is NA." This sentence should be: "When an entire row or column contains all NA values, the sum is 0, whereas if any value is not NA, then the result includes the value(s) not NA." Example: given df with columns one, two and rows a: 1.40, NaN; b: 7.10, -4.5; c: NaN, NaN; d: 0.75, -1.3, df.sum(axis="columns") returns a 1.40, b 2.60, c 0.00, d -0.55 (dtype: float64), and df.sum(axis="columns", skipna=False) returns a NaN, b 2.60, c NaN, d -0.55 (dtype: float64).	Anonymous	Jul 31, 2024
Printed	Page 169 In[285]	In[283] and In[285] look exactly the same, even though the line above says that a more concise syntax could be used.	Jude Cancellieri	Mar 09, 2024
Printed, ePub	Page 210, Section 7.2 2nd paragraph	The sentence is: Relatedly, drop_duplicates returns a DataFrame with rows where the duplicated array is False filtered out: The sentence should be: Relatedly, drop_duplicates returns a DataFrame with rows where the duplicated array is True filtered out:	Anonymous	Aug 26, 2024
Printed, ePub	Page 273 last paragraph, following subtitle 'Pivoting "Long" to "Wide" Format'	In the sentence "In this format, individual values are represented by a single row in a table rather than multiple values per row.", the text starting with "by" should be: "by a single column in a table rather than multiple values (i.e. columns) per row."	Anonymous	Aug 29, 2024
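Regarding the hash("string") submission for Section 3.1: since Python 3.3, hashes of str and bytes objects are salted with a per-process random seed (controlled by the PYTHONHASHSEED environment variable), so the value differs between interpreter sessions but stays fixed within one. Whether an object can be a dictionary key depends on whether hash raises, not on the particular value. A minimal sketch; is_hashable and hash_in_fresh_interpreter are hypothetical helper names, not from the book:

```python
import os
import subprocess
import sys


def is_hashable(obj) -> bool:
    """Return True if hash(obj) succeeds, i.e. obj can be a dict key or set element."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


def hash_in_fresh_interpreter(value: str, seed: str) -> int:
    """Hash a string in a new Python process started with a fixed PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.run(
        [sys.executable, "-c", f"print(hash({value!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())


# Within one process, the hash of a given string never changes
assert hash("string") == hash("string")
print(is_hashable("string"), is_hashable((1, 2)), is_hashable([1, 2]))  # True True False

# Fixing the seed makes the value reproducible across processes
print(hash_in_fresh_interpreter("string", "0") == hash_in_fresh_interpreter("string", "0"))  # True
```

With the default (random) seed, running hash("string") in separate sessions will usually print different numbers, which matches what the submitter observed.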
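Regarding the Page 98 Boolean-indexing submission: in the submitted snippet, data is a plain Python list of lists, and indexing a list with a Boolean ndarray raises exactly that TypeError. Boolean masks are an ndarray feature, so data needs to be created with np.array. A sketch, assuming the same names and values as the submission:

```python
import numpy as np

names = np.array(["Bob", "Joe", "Will", "Bob", "Will", "Joe", "Joe"])
data_list = [[4, 7], [0, 2], [-5, 6], [0, 0], [1, 2], [-12, -4], [3, 4]]
data = np.array(data_list)  # Boolean masks only work on ndarrays

mask = names == "Bob"  # True at positions 0 and 3

try:
    data_list[mask]  # indexing a Python list with a Boolean array fails
except TypeError as exc:
    print("list:", exc)

print(data[mask])      # rows 0 and 3
print(data[mask, 1:])  # mask combined with a slice
print(data[mask, 1])   # mask combined with an integer
```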
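Regarding the Chapter 10 groupby(..., axis="columns") submission: recent pandas releases deprecate the axis argument of DataFrame.groupby. One common workaround is to transpose, group the rows, and transpose back. The frame and mapping below are hypothetical, chosen only to illustrate the pattern:

```python
import pandas as pd

# Hypothetical data and column-to-group mapping, for illustration only
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
mapping = {"a": "red", "b": "red", "c": "blue"}

# Deprecated form: df.groupby(mapping, axis="columns").sum()
# Transpose so the columns become the index, group the rows, then transpose back
by_group = df.T.groupby(mapping).sum().T
print(by_group)
```

Each result column sums the original columns mapped to that group ("red" is a + b, "blue" is c).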
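Regarding the "Numeric types" submission: in Python 3 the / operator always returns a float, even when both operands are ints and the quotient is whole, while // performs floor division. A quick check:

```python
print(7 / 2)    # 3.5
print(6 / 2)    # 3.0, still a float
print(7 // 2)   # 3
print(-7 // 2)  # -4: floor division rounds toward negative infinity
```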
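Regarding the numpy.unique submission: with no axis argument, np.unique flattens an n-dimensional array before finding unique values; passing axis=0 returns the unique rows instead. Illustrative data:

```python
import numpy as np

arr = np.array([[1, 2], [3, 1], [1, 2]])

flat_unique = np.unique(arr)         # flattens first
row_unique = np.unique(arr, axis=0)  # unique rows, sorted lexicographically
print(flat_unique)  # [1 2 3]
print(row_unique)   # [[1 2]
                    #  [3 1]]
```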
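Regarding the Table 11.2 %j submission: %j is the day of the year, which runs from 001 to 366 in a leap year. A quick check with the standard library:

```python
from datetime import datetime

# Day 366 exists only in leap years; 2024 is one
last_day_2024 = datetime.strptime("2024 366", "%Y %j")
print(last_day_2024.date())                   # 2024-12-31
print(datetime(2023, 12, 31).strftime("%j"))  # 365
```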
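Regarding the np.string_ submission: NumPy 2.0 removed the np.string_ alias. np.bytes_ names the same fixed-width bytes dtype and is also available in NumPy 1.x, so the example can be written portably as follows (a sketch, not the book's revised text):

```python
import numpy as np

# np.bytes_ replaces the np.string_ alias removed in NumPy 2.0
numeric_strings = np.array(["1.25", "-9.6", "42"], dtype=np.bytes_)
as_float = numeric_strings.astype(float)
print(numeric_strings.dtype)  # |S4
print(as_float)
```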
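Regarding the dropna thresh submission (7.1.1): in pandas, thresh is the minimum number of non-NA values a row must have to be kept. A small illustration with made-up data, checked against an explicit count:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": [np.nan, np.nan, 3.0, 5.0],
    "c": [np.nan, 2.0, 3.0, 6.0],
})

# thresh=2 keeps rows with at least 2 non-NA values
kept = df.dropna(thresh=2)
print(kept.index.tolist())  # [2, 3]

# Equivalent explicit filter: count non-NA values per row
manual = df[df.notna().sum(axis="columns") >= 2]
```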
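Regarding the In [72] agg submission (10.2): passing the string "std" instead of np.std selects pandas' own standard deviation (sample, ddof=1) and avoids the FutureWarning. The tips-like data below is hypothetical:

```python
import pandas as pd

# Hypothetical data standing in for the book's tips dataset
df = pd.DataFrame({"day": ["Thur", "Thur", "Fri", "Fri"],
                   "tip_pct": [0.10, 0.20, 0.15, 0.25]})
grouped_pct = df.groupby("day")["tip_pct"]

# (output name, function name) tuples set the result's column names
result = grouped_pct.agg([("average", "mean"), ("stdev", "std")])
print(result)
```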
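Regarding the page 166 sum submission: the behavior can be reproduced with the same df the submitter shows. With the default skipna=True, NA values are skipped and an all-NA row sums to 0; with skipna=False, any NA in a row makes that row's sum NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
                  index=["a", "b", "c", "d"], columns=["one", "two"])

totals = df.sum(axis="columns")                # NA skipped; all-NA row "c" sums to 0.0
strict = df.sum(axis="columns", skipna=False)  # any NA in a row gives NaN
print(totals)
print(strict)
```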
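Regarding the drop_duplicates submission (page 210): DataFrame.duplicated returns True for each row that repeats an earlier row, and drop_duplicates removes exactly those rows, which supports the suggested wording. The data below is illustrative:

```python
import pandas as pd

data = pd.DataFrame({"k1": ["one", "two"] * 3 + ["two"],
                     "k2": [1, 1, 2, 3, 3, 4, 4]})

dupes = data.duplicated()         # True marks a row that repeats an earlier row
deduped = data.drop_duplicates()  # drops the rows where dupes is True
print(dupes.tolist())  # [False, False, False, False, False, False, True]
print(deduped)
```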