The errata list contains errors, and their corrections, that were found after the product was released. If an error was corrected in a later version or reprint, the date of the correction is displayed in the "Date corrected" column.
The following errata were submitted by our customers and approved as valid errors by the author or editor.
Version |
Location |
Description |
Submitted By |
Date submitted |
Date corrected |
Printed |
Page vi
United States |
The technical editor Hugh Brown is listed as Hugh White.
Not sure of the page number.
Note from the Author or Editor: Yes, many apologies. His name is Hugh Brown (and he was a great editor!)
|
Hugh Brown |
Nov 05, 2012 |
May 17, 2013 |
|
Page New 3/E textbook > Chapter 2 > Variables and argument passing
3rd paragraph |
wesmckinney.com/book/python-basics.html#semantics_references
New 3/E textbook > Chapter 2 > Variables and argument passing
This:
"In some languages, the assignment if b will cause the data..."
should probably be:
"In some languages, the assignment of b will cause the data..."
Changed 'if' to 'of'.
Note from the Author or Editor: will fix
|
Aaditya Bugga |
Apr 09, 2022 |
|
|
n/a |
In the open access version, when seaborn histplots are plotted, the kde=True argument seems to be missing: wesmckinney.com/book/plotting-and-visualization.html#fig-vis_series_kde
Note from the Author or Editor: will fix
|
Hamed |
Apr 22, 2022 |
|
|
Page chapter 9
first paragraph |
In the section "Ticks, Labels, and Legends", ax.xlim() no longer works; it has been changed to ax.set_xlim().
Note from the Author or Editor: will fix
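A minimal sketch of the Axes-level call this report refers to, assuming matplotlib is installed. The Agg backend and the plotted values are illustrative choices, not taken from the book:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 9])  # illustrative data only

# On an Axes object the limits are set and read with set_xlim/get_xlim
ax.set_xlim(0, 10)
xlim = ax.get_xlim()
print(xlim)
```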
|
Levy |
May 26, 2022 |
|
|
Page NA
4.4 Array-Oriented Programming with Arrays |
wesmckinney.com/book/numpy-basics.html
Current: In [169]: points = np.arange(-5, 5, 0.01) # 100 equally spaced points
Proposed fix: In [169]: points = np.arange(-5, 5, 0.01) # 1000 equally spaced points
Note from the Author or Editor: will fix
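The proposed comment can be checked directly. This short sketch only counts the elements produced by the call quoted above:

```python
import numpy as np

# Same call as in the quoted line: step 0.01 over the half-open interval [-5, 5)
points = np.arange(-5, 5, 0.01)
n_points = len(points)
print(n_points)
```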
|
Matt Dahlman |
Jun 07, 2022 |
|
|
Page https://wesmckinney.com/book/preliminaries.html
section 1.4 Installation and Setup |
Under section 1.4 Installation and Setup, you have the following subheadings:
Miniconda on Windows
GNU/Linux
Miniconda on macOS
I think that middle one should be "Miniconda on GNU/Linux" for consistency with the other two.
Note from the Author or Editor: will fix
|
Graeme Richardson |
Jun 07, 2022 |
|
|
Page https://wesmckinney.com/book/pandas-basics.html#pandas_summarize
section 5.3 Summarizing and Computing Descriptive Statistics |
This statement seems to be incorrect or at least unclear: "When an entire row or column contains all NA values, the sum is 0, whereas if any value is not NA then the result is NA." As seen in the examples, if any value is not NA, the result is a sum.
Note from the Author or Editor: Confirmed, I am fixing the language to be correct
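A small sketch of the behavior under discussion, using made-up Series rather than the book's data:

```python
import numpy as np
import pandas as pd

all_na = pd.Series([np.nan, np.nan])
some_na = pd.Series([1.5, np.nan, 2.5])

# By default NA values are skipped, so an all-NA Series sums to 0
all_na_sum = all_na.sum()
# With at least one non-NA value, the result is the sum of the non-NA values
some_na_sum = some_na.sum()
# Only with skipna=False does a single NA make the result NA
strict_sum = some_na.sum(skipna=False)

print(all_na_sum, some_na_sum, strict_sum)
```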
|
Graeme Richardson |
Jun 23, 2022 |
|
|
Page https://wesmckinney.com/book/data-cleaning.html#prep_dummy_vars
section on Computing Indicator/Dummy Variables |
You have the following note: "For much larger data, this method of constructing indicator variables with multiple membership is not especially speedy. It would be better to write a lower-level function that writes directly to a NumPy array, and then wrap the result in a DataFrame." What is meant by lower-level? A custom C function or a Python function that is more efficient in some way? Given that pandas is presumably written in C, it's surprising that any type of Python function could be faster than pandas. I think this note could be clarified for the reader by saying "not especially speedy because...". For example, is it because using pandas in this way will do too many memory allocations, too many data copies, etc.
Note from the Author or Editor: i'm removing this note
|
Graeme Richardson |
Jun 28, 2022 |
|
|
Page Merging on Index
3rd block of code in the section |
There is a formatting error: below the line
pd.DataFrame({"event1": pd.Series([0, 2, 4, 6, 8, 10], dtype="Int64",
everything in the code block is displayed in red, which makes it confusing to read.
Note from the Author or Editor: will reformat
|
Enrique M. Muro |
Jun 30, 2022 |
|
|
Page Section 5.2
Selection on DataFrame with loc and iloc |
It reads “To select multiple roles“ and it should read “To select multiple rows”.
Note from the Author or Editor: will fix
|
Andres Medaglia |
Jul 01, 2022 |
|
|
Page Section 8.3 Reshaping and Pivoting
Pivoting “Long” to “Wide” Format |
It reads: "Now, ldata looks like:"
I believe it should read: "Now, long_data looks like:"
Note from the Author or Editor: will fix
|
Andres Medaglia |
Jul 07, 2022 |
|
|
Page Selection on DataFrame with loc and iloc
Right before Table 5.4 3rd edition online |
Note: Resubmitted for clarity
"Boolean arrays can used with loc but not iloc:"
As per the documentation, this is not 100% accurate. A boolean ndarray may be passed for iloc:
Given:
mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
{'a': 100, 'b': 200, 'c': 300, 'd': 400},
{'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000 }]
df = pd.DataFrame(mydict)
df
a b c d
0 1 2 3 4
1 100 200 300 400
2 1000 2000 3000 4000
The following produces valid output:
df.iloc[:, df.columns.isin(['a','b'])]
However, the following does not:
df.iloc[df['c']>=300]
Note from the Author or Editor: I am clarifying that I mean for selecting rows
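The distinction can be sketched as follows, reusing the DataFrame from the report. Converting the boolean Series to a NumPy array with to_numpy() is one possible way to select rows with iloc; it is shown here as an illustration, not as the book's wording:

```python
import pandas as pd

df = pd.DataFrame([
    {"a": 1, "b": 2, "c": 3, "d": 4},
    {"a": 100, "b": 200, "c": 300, "d": 400},
    {"a": 1000, "b": 2000, "c": 3000, "d": 4000},
])

mask = df["c"] >= 300  # boolean Series aligned on the row labels

by_label = df.loc[mask]                 # loc accepts the boolean Series directly
by_position = df.iloc[mask.to_numpy()]  # iloc accepts a plain boolean ndarray

# Column selection with a boolean ndarray, as in the report
cols = df.iloc[:, df.columns.isin(["a", "b"])]
```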
|
Mauricio Ruiz |
Jul 14, 2022 |
|
|
Page Python Language Basics > Scalar Types > Strings
https://wesmckinney.com/book/python-basics.html#scalar_strings |
A mention of an additional line break is missing in the following text:
"It may surprise you that this string c actually contains four lines of text; the line breaks after """ and after lines are included in the string. We can count the new line characters with the count method on c:"
There are 3 line breaks that should be mentioned:
1) after """
2) after `that`
3) after `lines`
Line breaks 1 and 3 are correctly mentioned, but 2 was omitted and should be included.
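The count can be verified with a short sketch. The string below is assumed to match the book's example, based on the words quoted in this report:

```python
# Multiline string assumed to mirror the book's example
c = """
This is a longer string that
spans multiple lines
"""

# One newline after the opening quotes, one after "that", one after "lines"
newline_count = c.count("\n")
print(newline_count)
```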
|
Anonymous |
Jul 19, 2022 |
|
|
Page Ch 4: Fancy Indexing
The last paragraph before the 5th code chunk |
Many users (myself included) may have expected fancy indexing to return a rectangular sub-matrix.
Here is one way to get that:
Note from the Author or Editor: will reword
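The submitter's own snippet is not reproduced in this entry. As a hedged illustration only, one possible way to obtain a rectangular sub-matrix is np.ix_; the array and index lists below are assumptions chosen for the sketch:

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))  # illustrative array
rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

# Paired fancy indexing selects the elements (1, 0), (5, 3), (7, 1), (2, 2)
pairs = arr[rows, cols]

# np.ix_ builds an open mesh, so every row is combined with every column
block = arr[np.ix_(rows, cols)]

print(pairs)
print(block.shape)
```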
|
Nicholas Vence |
Jul 22, 2022 |
|
|
Page Tab Completion
Paragraph 2 |
Digital version of the book at wesmckinney.com/book/ipython.html. Word "also" repeated twice: "Also, you can also complete methods and attributes on any object after typing a period:"
Note from the Author or Editor: will fix
|
Semyon Bokhankevich |
Jul 25, 2022 |
|
|
Page Data Types for ndarrays
General Note |
Digital version of the book at wesmckinney.com/book/numpy-basics.html. In the general note, the wording goes as "A signed integer can represent both positive and negative integers, while an unsigned integer can only represent nonzero integers." Given the code example provided within the note, I think "nonnegative" was meant.
An attempt to pass a sequence with a negative number while specifying unsigned integer data type yields a peculiar result:
In [35]: np.array([-1, 0, 1], dtype="u1")
Out[35]: array([255, 0, 1], dtype=uint8)
Thank You for elaborating on this distinction!
Note from the Author or Editor: will fix
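A brief sketch of the signed versus unsigned ranges for 8-bit integers. It uses astype for the conversion because constructing the array directly with dtype="u1", as in the report, may raise an error on some NumPy versions:

```python
import numpy as np

# Representable ranges for 8-bit signed and unsigned integers
signed_range = (np.iinfo(np.int8).min, np.iinfo(np.int8).max)
unsigned_range = (np.iinfo(np.uint8).min, np.iinfo(np.uint8).max)

# Casting a negative value to an unsigned type wraps around modulo 256
wrapped = np.array([-1, 0, 1]).astype("u1")

print(signed_range, unsigned_range, wrapped)
```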
|
Semyon Bokhankevich |
Jul 27, 2022 |
|
|
Page Chapter 2 - dates and times
https://wesmckinney.com/book/python-basics.html#scalar_dates |
Question marks not replaced with actual reference:
"See ??? for a full list of format specifications."
Note from the Author or Editor: will fix, this is only on the HTML version
|
Anonymous |
Jul 29, 2022 |
|
|
Page Filling In Missing Data
3rd paragraph |
The ??? is not replaced with the actual reference link.
'The same interpolation methods available for reindexing (see ???) can be used with fillna'
Note from the Author or Editor: will fix
|
Junwei Fang |
Aug 11, 2022 |
|
|
Page Acknowledgements for the Third Edition
1st two lines |
Text reads: "It has more than a decade since I started writing the first edition of this book and more than 15 years since I originally started my journey as a Python prorammer."
Programmer is missing the 'g' and 'It has more...' should probably read 'It has BEEN more...'
Note from the Author or Editor: will fix
|
Laure Robinson |
Aug 29, 2022 |
|
|
Page https://wesmckinney.com/book/python-builtin.html#comprehensions
https://wesmckinney.com/book/python-builtin.html |
"we could filter out strings with length 2 or less and convert them to uppercase like this:"
does not tie to the code following thereafter:
[x.upper() for x in strings if len(x) > 2]
['BAT', 'CAR', 'DOVE', 'PYTHON']
Change to e.g. "filter out strings with length more than 2 and convert them..."
Note from the Author or Editor: will fix
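For reference, a sketch of what the quoted comprehension does. The input list is an assumption chosen to be consistent with the output shown in the report:

```python
# Assumed input list; the report only shows the comprehension and its output
strings = ["a", "as", "bat", "car", "dove", "python"]

# Keeps only the strings longer than 2 characters, then uppercases them
result = [x.upper() for x in strings if len(x) > 2]
print(result)
```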
|
Thomas Pfeiffer |
Sep 08, 2022 |
|
|
Page https://wesmckinney.com/book/python-builtin.html
https://wesmckinney.com/book/images/pda3_0301.png |
pda3_0301.png is apparently missing.
Note from the Author or Editor: This has been fixed
|
brian piercy |
Sep 17, 2022 |
|
|
Page Chapter 2, strings
Above output #65 |
Change "Afer this operation, the variable"
to
"After this operation, the variable"
wesmckinney.com/book/python-basics.html
Note from the Author or Editor: will fix
|
Will Beasley |
Sep 24, 2022 |
|
|
Page Table 2.1: Binary operators
Table 2.1: Binary operators |
a <= b should be in the inline code, or
`a < b, a <= b ` .
Note from the Author or Editor: will fix
|
Alen Softić |
Jan 11, 2023 |
|
|
Page Section 7.3, page 227, Table 7-3
Top |
StringDtype is missing from the table. Not sure if the table is meant to be exhaustive, but this is an important type that should be included in this table.
Note from the Author or Editor: I will add it in a subsequent printing
|
Kerrick Staley |
Jun 06, 2023 |
|
Mobi |
Page 1
On Kindle: "Location 325 of 13301" |
Sorry, don't know the proper page number (I'm on a kindle), so I entered 1.
In Chapter 1, under the numpy description, one of the bullet points has a minor grammatical error. It reads:
"Tools for integrating connecting C, C + +, and Fortran code to Python"
I assume "integrating connecting" was not intended as is.
Note from the Author or Editor: on page 4 of the print text / PDF change "integrating connecting C, C++, ..." to "integrating C, C++, ..."
|
Anonymous |
Oct 24, 2012 |
May 17, 2013 |
|
Page 1.4 Installation and Setup
Installing Necessary Packages |
In the part about setting up the environment, the package should be changed from jupyter to jupyterlab to avoid some package/dependency conflicts
change "(pydata-book) $ conda install -y pandas jupyter matplotlib" to
"(pydata-book) $ conda install -y pandas jupyterlab matplotlib"
Note from the Author or Editor: agreed, fixing
|
Luiz Henrique |
Dec 06, 2022 |
|
|
Page 2 Python Language Basics, IPython, and Jupyter Notebooks
1st paragraph |
"Now in 2022, there is now[...]" --> One "now" should be removed
Note from the Author or Editor: will fix
|
Anonymous |
Nov 30, 2022 |
|
Printed, PDF, ePub |
Page 6-8
Installation and Setup |
Dear Sirs:
I have just purchased Wes McKinney's Python for Data Analysis.
I am trying to install Python as instructed on pages 6-8 of the book, but I am running into problems.
It appears that the Python package that comes with EPDFree and the pandas library are both essential for me to use the book.
When I try to install pandas on top of EPDFree (which is now Canopy Express), I get the error message:
"Python version 2.7 required, which was not found in the registry."
I am running Windows 7 (32-bit).
The author recommends uninstalling the previous version of Python and then installing EPDFree, which has been changed to Enthought Canopy.
After I do that, Python does not appear in Add or Remove Programs anymore, but Enthought Canopy does.
The Canopy interface works, and it can run a simple script. It says that, contrary to the error message, I do have version 2.7 of Python installed.
The author recommends installing pandas-0.9.0.win32-py2.7.exe. Only version 11 is now available, so I downloaded that.
When I googled the error message, I got a suggestion to add C:\Python27; and C:\Python27\Scripts; to my system path, but that did not help.
Google also gave me a suggestion to uninstall Python (which means Canopy in this case) for all users and re-install for just me.
This also did not help.
As things now stand, I do not think I will be able to make any use of the book.
Is there a forum or an author's page that addresses this problem?
Thank you,
John Chesnut
Note from the Author or Editor: Since publishing the book Enthought have changed their Python distribution so that the directions are now incompatible.
If you run into this problem please install the free Anaconda distribution for your platform (which includes pandas) from here:
http://continuum.io/downloads
|
Anonymous |
May 28, 2013 |
Dec 12, 2014 |
|
Page 7.3 Extension Data Types
Table 7.3: pandas extension data types |
I found this error in the HTML (web) version. The link for this table points to the wrong table, so clicking the link button takes me to the wrong place.
Note from the Author or Editor: fixing in html version
|
Anonymous |
Nov 05, 2022 |
|
PDF |
Page 9
2nd paragraph |
In the OS X installation it states that we should type "gcc" at the terminal command line to see if gcc is installed. I'm running Mavericks and it is not installed. I believe it's been deprecated by Apple. Is there a workaround for this issue?
Thanks
Note from the Author or Editor: Yes, Mavericks now uses clang instead of gcc. Editors, could you add a parenthesis that states "(or clang on newer versions of OS X)"
|
scottclausen@mac.com |
Oct 23, 2013 |
Dec 12, 2014 |
|
Page 9 Plotting and Visualization
Python for Data Analysis, 3E (pre release) |
I use "jupyter notebook" and "jupyterlab". The examples in "Figures and Subplots" cannot be reproduced with jupyterlab exactly as in the book.
Suggestion
At the beginning of the book it should be pointed out that the examples could be reproduced with "jupyter notebook", minor adjustments would be necessary with jupyterlab.
Best regards, Robert
Note from the Author or Editor: I'm adding some language for JupyterLab users
|
Robert Moser |
Aug 10, 2022 |
|
|
Page 9.2 Plotting with pandas and seaborn code for figure
Code for figure 9-24 |
The code attempts to set the title of the chart with:
In [109]: ax.title("Changes in log(m1) versus log(unemp)")
This raises an exception:
TypeError: 'Text' object is not callable
Perhaps it should be:
ax.set(title="Changes in log(m1) versus log(unemp)")
Note from the Author or Editor: will fix
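Both spellings below set the title on an Axes object. This is a minimal sketch assuming matplotlib is installed; the Agg backend is an illustrative choice so that it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for the sketch
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# ax.title is a Text object, so calling it fails; set_title is the setter
ax.set_title("Changes in log(m1) versus log(unemp)")
title_from_set_title = ax.get_title()

# Equivalent keyword form suggested in the report
ax.set(title="Changes in log(m1) versus log(unemp)")
title_from_set = ax.get_title()
```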
|
Mark Meyer |
Jul 13, 2022 |
|
|
Page 13 Data Analysis Examples
"Donation Statistics by Occupation and Employer" Section |
In the web version of book:
In the below text the "f" in the map method should be "get_emp"
def get_emp(x):
# If no mapping provided, return x
return emp_mapping.get(x, x)
fec["contbr_employer"] = fec["contbr_employer"].map(f)
Note from the Author or Editor: will fix
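A sketch of the corrected call, with a hypothetical mapping and a hypothetical Series standing in for the FEC data:

```python
import pandas as pd

# Hypothetical mapping, for illustration only
emp_mapping = {
    "SELF": "SELF-EMPLOYED",
    "SELF EMPLOYED": "SELF-EMPLOYED",
}

def get_emp(x):
    # If no mapping provided, return x
    return emp_mapping.get(x, x)

# Hypothetical stand-in for fec["contbr_employer"]
employers = pd.Series(["SELF", "SELF EMPLOYED", "ACME CORP"])

# Pass the function that was actually defined, get_emp, to map
cleaned = employers.map(get_emp)
print(cleaned.tolist())
```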
|
Anonymous |
Jan 05, 2023 |
|
PDF |
Page 18
India |
the following command [json.loads(line) for line in open(path)] produces the following error:
--------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-83-b1e0b494454a> in <module>()
----> 1 records = [json.loads(line) for line in open(path)]
C:\Users\Mrinal\AppData\Local\Enthought\Canopy32\App\appdata\canopy-1.4.1.1975.win-x86\lib\json\__init__.pyc in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
336 parse_int is None and parse_float is None and
337 parse_constant is None and object_pairs_hook is None and not kw):
--> 338 return _default_decoder.decode(s)
339 if cls is None:
340 cls = JSONDecoder
C:\Users\Mrinal\AppData\Local\Enthought\Canopy32\App\appdata\canopy-1.4.1.1975.win-x86\lib\json\decoder.pyc in decode(self, s, _w)
363
364 """
--> 365 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
366 end = _w(s, end).end()
367 if end != len(s):
C:\Users\Mrinal\AppData\Local\Enthought\Canopy32\App\appdata\canopy-1.4.1.1975.win-x86\lib\json\decoder.pyc in raw_decode(self, s, idx)
379 """
380 try:
--> 381 obj, end = self.scan_once(s, idx)
382 except StopIteration:
383 raise ValueError("No JSON object could be decoded")
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 6: invalid start byte
Please help and explain the reason for the error
Note from the Author or Editor: Editors, can you please change "open(path)" to "open(path, 'rb')" ? this will fix this issue for readers using Python 3
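To illustrate the error message itself: byte 0x92 is not a valid start byte in UTF-8, but it is a right single quotation mark in Windows-1252. The sketch below uses a hypothetical line of bytes, and the cp1252 encoding is an assumption; the report does not state the file's actual encoding:

```python
import json

# Hypothetical JSON line containing the byte 0x92
raw = b'{"note": "it\x92s"}'

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # Same class of error as in the traceback above
    utf8_ok = False

# Decoding with an explicitly chosen encoding (assumed here) succeeds
record = json.loads(raw.decode("cp1252"))
print(utf8_ok, record)
```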
|
Mrinal |
Jul 05, 2014 |
Dec 12, 2014 |
PDF |
Page 23
|
For the code example following:
In [301]: tz_counts[:10].plot(kind='barh', rot=0)
The 'plot' function has no visible effect. Should be in iPython? (which also doesn't work.)
Note from the Author or Editor: There should be a note at the beginning of the chapter to run IPython in pylab mode.
Editors: please place a note at the end of the opening paragraph that says:
"To follow along with these examples, you should run IPython in Pylab mode by running <literal>ipython --pylab</literal> at the command prompt."
|
Brian Piercy |
Dec 04, 2012 |
May 17, 2013 |
Printed, PDF |
Page 23
middle of page |
In the PDF version, the url overshoots the page
Note from the Author or Editor: Editors please insert a line break like so in the console output
Out[304]: u'Mozilla/5.0 (Linux; U; Android 2.2.2; en-us; LG-P925/V10e Build/FRG83G) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1'
|
Anonymous |
Apr 18, 2013 |
May 17, 2013 |
Printed |
Page 23
1st code sample |
The 2nd and 3rd use of pd.read_table should use the ratings.dat and movies.dat file and not users.dat
Note from the Author or Editor: Thanks. This has been fixed
|
Richard White |
Mar 30, 2015 |
|
Printed |
Page 23
first code block after 2nd paragraph |
In the users.dat file downloaded from https://grouplens.org/datasets/movielens/1m/ the data for 'gender' is before 'user_id' e.g.
1::F::1::10::48067
2::M::56::16::70072
therefore unames should not be defined as :
unames = [ 'user_id', 'gender', 'age', 'occupation', 'zip']
instead should be :
unames = [ 'gender', 'user_id', 'age', 'occupation', 'zip']
Note from the Author or Editor: will review
|
Edward Hope |
Jan 18, 2018 |
|
Printed, PDF, ePub, Mobi, Other Digital Version |
Page 24
two fifths down the page |
Found same problem as CJ:
In the following line:
operating_system = np.where(cframe['a'].str.contains('Windows'), 'Windows', 'Not Windows')
np was not defined, so this line gives an error
Question: Why don't any of these known errata get confirmed/addressed by the author or staff at O'Reilly?
Note from the Author or Editor: On page 21 please change the code line In [290]: about halfway down the page from
In [290]: import pandas as pd
to
In [290]: import pandas as pd; import numpy as np
This mistake is fairly minor (all things considered) as these code examples are intended to be run in IPython in "pylab" mode (ipython --pylab) which will have imported NumPy and created the np alias. Sorry about that
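With both imports in place the quoted line runs as intended. The sketch below uses a hypothetical Series of user-agent strings in place of cframe['a']:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for cframe['a']
agents = pd.Series([
    "Mozilla/5.0 (Windows NT 6.1)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X)",
    "GoogleMaps/RochesterNY",
])

# np must be defined for this line to work, hence the numpy import above
operating_system = np.where(agents.str.contains("Windows"),
                            "Windows", "Not Windows")
print(operating_system)
```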
|
Moritz Heukamp |
May 11, 2013 |
May 17, 2013 |
PDF |
Page 29
2nd paragraph |
totals should be titles:
"This produced another DataFrame containing mean ratings with movie totals as row
labels and gender as column labels. "
should read
"This produced another DataFrame containing mean ratings with movie titles as row
labels and gender as column labels. "
Note from the Author or Editor: Good catch. Editors, please make the indicated change. Thanks
|
vrajmohan |
Sep 26, 2013 |
Dec 12, 2014 |
Printed |
Page 33
middle |
I get a ValueError: array dimensions must agree except for d_0 when I run line 371:
names1880.groupby('sex').births.sum().
names1880.groupby('sex')['births'].sum() works.
Note from the Author or Editor: We have addressed this (I believe) in a review of the code examples. Will follow up with editors to verify that it is fixed
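A small sketch comparing the two spellings on a made-up frame (the rows are sample values, not the full names1880 data). In current pandas both forms are expected to give the same result; the bracket form is simply the more explicit one:

```python
import pandas as pd

# Sample rows for illustration only
names1880 = pd.DataFrame({
    "name": ["Mary", "Anna", "John", "William"],
    "sex": ["F", "F", "M", "M"],
    "births": [7065, 2604, 9655, 9532],
})

by_attr = names1880.groupby("sex").births.sum()    # attribute access
by_key = names1880.groupby("sex")["births"].sum()  # explicit column selection
print(by_key)
```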
|
Allen Long |
Nov 03, 2013 |
Dec 12, 2014 |
PDF |
Page 38
Code on bottom of page 38 and top of page 39 |
searchsorted() is a method available for NumPy arrays, not Pandas Series. So to get the code in the book to work, I needed to first convert the Series to a NumPy array with array().
In final code, the get_quantile_count() function is as follows:
# Get number of distinct names in the top 50% of births using clever NumPy hack
def get_quantile_count(group, q=0.5):
group = group.sort_index(by='prop', ascending=False)
return array(group.prop.cumsum()).searchsorted(q) + 1
Note from the Author or Editor: Ah, this is a casualty of some API changes in pandas:
Editors, could you change the indicated line to be instead:
group.prop.cumsum().values.searchsorted(q) + 1
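A hedged sketch of the function for current pandas, using made-up proportions. It uses sort_values because sort_index(by=...) is no longer available, and to_numpy() in place of .values; both substitutions are choices made for this sketch rather than the author's wording:

```python
import pandas as pd

# Made-up group with proportions that sum to 1
group = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "prop": [0.10, 0.30, 0.20, 0.25, 0.15],
})

def get_quantile_count(group, q=0.5):
    # Sort by proportion, largest first
    group = group.sort_values("prop", ascending=False)
    # searchsorted is called on the underlying NumPy array of cumulative sums
    return group["prop"].cumsum().to_numpy().searchsorted(q) + 1

count = get_quantile_count(group)
print(count)
```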
|
Todd Leonhardt |
Sep 14, 2013 |
Dec 12, 2014 |
Printed |
Page 38
United States |
After defining the array prop_cumsum you want to call the method searchsorted to search for the 50th percentile. The code supplied is prop_cumsum.searchsorted(0.5), which throws the error Series object has no Attribute searchsorted
I got this to work sort of: numpy.searchsorted(prop_cumsum,0.5), the only problem is the output is every line number in the array followed by the index position. Can you shed any light on the code as written in the text and the code I got to work?
Thanks
Note from the Author or Editor: This is caused by API changes in pandas. We have fixed the code example in an overall review of the examples, so this will be addressed in the next printing.
|
Anonymous |
Jun 25, 2014 |
Dec 12, 2014 |
PDF |
Page 40
in [3] |
While executing the code from the book:
In [3]: data = {i : randn() for i in range(7)}
I got the following error: NameError: global name 'randn' is not defined.
I solved it by using "from scipy import randn".
(Perhaps included packages depend on ipython configuration.)
Note from the Author or Editor: Page 46 in the printed text, please insert the line
In [541]: import numpy as np
right above the In [542]: ...
and make sure there is a blank line for consistent formatting
|
Anonymous |
Aug 15, 2012 |
May 17, 2013 |
PDF |
Page 43
United States |
filename ml-1m/users.dat should be movielens/users.dat
Note from the Author or Editor: Correct -- editors, could you make the indicated change (replace ml-1m with movielens)?
|
Anonymous |
Dec 07, 2013 |
Dec 12, 2014 |
ePub |
Page 46
printed text, |
Code from Safari:
In [541]: import numpy as np
In [542]: data = {i : randn() for i in range(7)}
This causes an error:
NameError: global name 'randn' is not defined
This works
data = {i : np.random.randn() for i in range(7)}
Appears there is a problem with the 'import numpy as np' being incomplete.
Note from the Author or Editor: Good catch, and I believe we tried to correct this error in the last revision.
Editors, could you replace the indicated randn with np.random.randn ? thanks
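The corrected line, shown as a self-contained sketch:

```python
import numpy as np

# randn lives in numpy.random, so it is reached through the np alias
data = {i: np.random.randn() for i in range(7)}
print(data)
```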
|
Anonymous |
Jun 24, 2013 |
Dec 12, 2014 |
PDF |
Page 52
top |
the two ways of computing top1000 give different results
Note from the Author or Editor: I have made a note to look into this since we have made a full review of the book's code examples. There might be a bug in pandas, in which case I will report upstream to the dev team
|
Anonymous |
Dec 07, 2013 |
Dec 12, 2014 |
PDF |
Page 53
Table 3-1 |
Commands are given as 'Ctrl-P', 'CTRL-A', etc. with the letter in UPPERCASE, which is potentially confusing, since the keys are to be pressed without the shift key (except 'Ctrl-Shift-v'). In fact, without the example containing a 'Shift', I would not be sure this is an error.
Note from the Author or Editor: A fair point.
Editors: Please change the single letters in the command shortcuts in Table 3-1 to lowercase. E.g.
Ctrl-Shift-V
should be
Ctrl-Shift-v
and Ctrl-B should be Ctrl-b
Thanks
|
Steven Pav |
Dec 27, 2012 |
May 17, 2013 |
Printed |
Page 54
2nd paragraph |
... designed to faciliate common tasks ...
Note from the Author or Editor: Please fix facilitate typo
|
Frans Koning |
Nov 22, 2012 |
May 17, 2013 |
PDF |
Page 54
Code example at bottom of page |
When I try to do 'a' in _ip.user_ns it throws a NameError exception and says "name '_ip' is not defined.
I can use the IPython magic %who to see if the variable is in memory or not.
Note from the Author or Editor: I should have known better than to use a private IPython API.
editors, could we remove this altogether:
In [8]: 'a' in _ip.user_ns
Out[8]: True
change the line number of the subsequent prompt to 8 (instead of 9)
then, remove the following lines:
In [1]: 'a' in _ip.user_ns
Out[1]: False
and add these lines in its place:
In [10]: a
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-10-60b725f10c9c> in <module>()
----> 1 a
NameError: name 'a' is not defined
thanks
|
Todd Leonhardt |
Sep 15, 2013 |
Dec 12, 2014 |
Printed |
Page 65
Paragraph 1 |
what is referred to as Table 3-3 in the text is actually displayed as Table 3-4
Note from the Author or Editor: Confirmed. Please fix reference to Table 3-4
|
Anonymous |
Apr 18, 2013 |
May 17, 2013 |
Printed |
Page 67
Last sentence of third paragraph |
Text reads "Here is a simple list of 700,000 strings ..." but the sample code produces 600,000 strings.
Note from the Author or Editor: Good catch. Editors, could you change the copy to say 600,000 instead of 700,000?
|
James Williamson |
May 26, 2013 |
Dec 12, 2014 |
Printed |
Page 69
Paragraph 4, last sentence |
'while' should be 'whole'
Note from the Author or Editor: Confirmed, thanks
|
Anonymous |
Apr 18, 2013 |
May 17, 2013 |
|
Page 72
clean_strings(states) |
On page 72, after applying clean_strings(states), "South Carolina" has unwanted space in between, I do believe this is a print error. Sorry if a false flag, just trying to help.
Note from the Author or Editor: will improve the example
|
Mauricio Ruiz |
Jun 23, 2022 |
|
Printed |
Page 75
paragraph 2, sentence 2 |
'willl' should be 'will'
Note from the Author or Editor: Confirmed. thanks
|
Anonymous |
Apr 18, 2013 |
May 17, 2013 |
Printed |
Page 77
Top bullet points |
The third bullet point in the sample configuration changes is unnecessary: it repeats the first clause of the second bullet point.
Note from the Author or Editor: good catch. Editors, could you remove the 3rd bullet point?
|
jworeilly |
May 26, 2013 |
Dec 12, 2014 |
Printed |
Page 83
Last line in table 4-2 on this page |
"float64, float128" should read "float64" only. "float128" already correctly appears on the next line in the table (on page 84).
Note from the Author or Editor: Correct. Please delete the ", float128" there
|
Dan Grossman |
Jan 25, 2013 |
May 17, 2013 |
Printed |
Page 86
Final paragraph, first sentence. |
"... especially if they have used ..."
should read
"... especially if you have used ..."
Note from the Author or Editor: Thanks, please correct typo as described
|
Dan Grossman |
Jan 25, 2013 |
May 17, 2013 |
PDF |
Page 89
In [84]: |
As randn is a function in the numpy.random module, the line should read:
data = np.random.randn(7, 4)
Note from the Author or Editor: yes: editors, please make the indicated change
|
vrajmohan |
Sep 17, 2013 |
Dec 12, 2014 |
Printed |
Page 90
paragraph 1, sentence 2 |
par 1, sentence 2 is a fragment
Note from the Author or Editor: Change the first two sentences of that paragraph to
Suppose each name corresponds to a row in the <literal>data</literal> array, and we wanted to select all the rows with corresponding name <literal>'Bob'</literal>.
|
Anonymous |
Apr 18, 2013 |
May 17, 2013 |
Printed |
Page 95
In [123]: and In [124]: |
As in "In [84]:" on page 89, `randn()' should read `np.random.randn()' ...
Note from the Author or Editor: Editors: can you please make the indicated change?
Replace
randn()
with
np.random.randn()
|
Kazuyoshi Furutaka |
Jun 11, 2014 |
Dec 12, 2014 |
Printed, PDF |
Page 99
Second to last paragraph |
"scalers" should be "scalars"
|
Wes McKinney |
May 13, 2013 |
May 17, 2013 |
Printed, PDF |
Page 100
United States |
1 * cond1 + 2 * cond2 + 3 * -(cond1 | cond2)
is not equivalent to the two other code examples offered. In particular, if cond1 and cond2 are both True, the result is 3, not 0.
Note from the Author or Editor: Oops.
Please change that line of code to
1 * (cond1 & -cond2) + 2 * (cond2 & -cond1) + 3 * -(cond1 | cond2)
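The expressions can be compared over all four combinations of the two conditions. The nested np.where form below is assumed to be one of the "other code examples" the report mentions, and ~ is used instead of unary minus because current NumPy does not support - on boolean arrays:

```python
import numpy as np

# All four combinations of the two conditions
cond1 = np.array([True, True, False, False])
cond2 = np.array([True, False, True, False])

# Nested np.where form, assumed to be the reference behavior
nested = np.where(cond1 & cond2, 0,
                  np.where(cond1, 1,
                           np.where(cond2, 2, 3)))

# Originally printed expression, with ~ in place of unary minus
original = 1 * cond1 + 2 * cond2 + 3 * ~(cond1 | cond2)

# Corrected expression from the note, with ~ in place of unary minus
corrected = 1 * (cond1 & ~cond2) + 2 * (cond2 & ~cond1) + 3 * ~(cond1 | cond2)

print(nested, original, corrected)
```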
|
Aaron Schumacher |
Apr 07, 2013 |
May 17, 2013 |
Printed |
Page 103
3rd paragraph |
In 1st release of 2nd edition print copy, section on numpy fancy indexing (page 103, 3rd paragraph) says, “...the result of fancy indexing is always one-dimensional.” However, there are example outputs in this section with more than one dimension. Is that because some of the examples in the section are not fancy indexing? If that’s the case, it’s unclear where the section is building up to a fancy indexing example as opposed to every example being fancy indexing. The number of dimensions in the output seems to be number of array dimensions minus number of index dimensions, unless index can also have more dimensions than the array.
Note from the Author or Editor: will clarify
|
Stephen Frost |
Feb 08, 2018 |
|
Printed, PDF |
Page 106
Table 4-7 |
For pinv description remove the word "square" (this function does not require that the matrices be square)
|
Wes McKinney |
May 13, 2013 |
May 17, 2013 |
Printed, PDF |
Page 106
Table 4-7 |
In description of lstsq, replace "y = Xb" with the more commonly used "Ax = b"
|
Wes McKinney |
May 13, 2013 |
May 17, 2013 |
|
Page 107
Table 4-8 |
Table 4-8: the description for binomial should read 'Draw samples
from a binomial distribution'
Note from the Author or Editor: Please fix as described. thanks!
|
Anonymous |
Apr 18, 2013 |
May 17, 2013 |
Printed, PDF, ePub, Mobi, Other Digital Version |
Page 107
Middle of page |
Change "See table Table 4-8..." to "See Table 4-8..."
|
Wes McKinney |
May 12, 2013 |
May 17, 2013 |
PDF |
Page 113
In [17] |
In [17]: np.exp(obj2)
numpy needs to be imported before this code. There should be a line of code before this code:
import numpy as np
Note from the Author or Editor: Will add missing import
|
Dan Yuan |
Sep 13, 2017 |
|
|
Page 114
Bottom half |
The text (pdf page 114, book pages 134-135) illustrates the creation of a DataFrame from a dict. First, the dict creation is shown:
pop = {'Nevada': {2001: 2.4, 2002: 2.9},
'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
then, the data frame is created:
frame3 = pd.DataFrame(pop)
That's all fine thus far. However, the display of the DataFrame after creation isn't correct, in that the index order as shown isn't what occurs. That is, the book and pdf show:
Nevada Ohio
2000 NaN. 1.5
2001 2.4 1.7
2002 2.9 3.6
But in reality, the DataFrame is displayed thus, if one follows along with the text as shown:
Nevada Ohio
2001 2.4 1.7
2002 2.9 3.6
2000 NaN 1.5
In other words, the indices are displayed 2001, 2002, 2000, rather than 2000, 2001, 2002. This matters, because the examples that follow immediately (which involve transposing and also index slicing using that first DataFrame) then won't work as shown. The problem lies with the original dict creation. If the order of the "Nevada" and "Ohio" dicts are swapped, with Ohio being first, then the indices will appear in the desired order (i.e., 2000, 2001, 2002). (However, note that the columns in the resulting DataFrame will also then be swapped (with Ohio appearing as the first column, and Nevada second)).
The bottom line is that the whole set of examples doesn't work as shown, unfortunately, and there is a cascading effect - the first example is off, and thus so are the following examples based upon the first.
Note from the Author or Editor: this was fixed in the 3rd edition
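Independently of how the 3rd edition resolved this, one way to get a deterministic row order is to sort the index explicitly. This is a sketch of that alternative, not the author's fix:

```python
import pandas as pd

pop = {"Nevada": {2001: 2.4, 2002: 2.9},
       "Ohio": {2000: 1.5, 2001: 1.7, 2002: 3.6}}

# The row order of the raw result can depend on the pandas version,
# so sort the index to get 2000, 2001, 2002 regardless
frame3 = pd.DataFrame(pop).sort_index()
print(frame3)
```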
|
Andrew Boudreau |
May 30, 2021 |
|
PDF |
Page 119
Table 5-5 |
The description of the copy option for reindex in table 5-5 of the current (as of 8/2/12) preprint version may be wrong. It says that copy is "Do not copy underlying data if new index is equivalent to old index."
I believe this is the opposite of copy's behavior, and the words "Do not" should be removed.
Note from the Author or Editor: Change text to
If True, always copy underlying data even if new index is equivalent to old index. Otherwise, do not copy the data when the indexes are equivalent.
|
Dan Becker |
Aug 02, 2012 |
May 17, 2013 |
PDF |
Page 123
Table 5-6, 2nd row |
"Selects single row of subset of rows from the DataFrame."
should probably be
"Selects single row or subset of rows from the DataFrame."
Note from the Author or Editor: Confirmed typo as described
|
Guan Yang |
Aug 16, 2012 |
May 17, 2013 |
Printed |
Page 124
table 5.5 |
Description for argument copy is self contradictory. Appears to say copy true means don't copy
Note from the Author or Editor: The text could be clearer. Editors, could you change "Otherwise" to read "If False" (use fixed width font for the False) in the table?
|
gwideman |
Jul 03, 2013 |
Dec 12, 2014 |
Printed |
Page 125
Last sentence |
last sentence: should read 'Here are some examples of this:'
Note from the Author or Editor: please fix as described. thanks!
|
Anonymous |
Apr 18, 2013 |
May 17, 2013 |
Printed |
Page 145
Bottom of page & continuing |
"...if you have an axis containing integers, data selection will always be label-oriented."
But earlier, on p. 141: "Slicing with labels...the endpoint is inclusive"
So why at bottom of p. 145 does ser[:1] not include the endpoint of the slice (only first row returned)? Shouldn't label-oriented slicing of this "axis containing integers" make ser[:1] return the same two rows as ser.loc[:1]? Shouldn't it be the case that only ser.iloc[:1] is not label-oriented, and therefore only it excludes the endpoint of the slice?
Note from the Author or Editor: will review
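The distinction the reporter draws can be checked directly. The sketch below is an illustration only, assuming a Series with the default integer index like the book's `ser`. It contrasts just the two explicit indexers, whose behavior is unambiguous; how bare `ser[:1]` is treated is the open question in the report and is deliberately not asserted here.

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(3.0))  # default integer index 0, 1, 2

by_label = ser.loc[:1]      # label-based slice: the endpoint label 1 is included
by_position = ser.iloc[:1]  # position-based slice: the endpoint position 1 is excluded

print(len(by_label), len(by_position))
```

Using `loc` or `iloc` explicitly sidesteps the ambiguity whenever the index holds integers.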
|
Stephen Frost |
Feb 08, 2018 |
|
Printed |
Page 150
Bottom of 2nd code block |
df1 - df2 should use '+' operator instead if adding lists. '-' operator still produces the same result.
|
Shivan Sivakumaran |
Oct 08, 2020 |
|
Printed, PDF |
Page 152
Final code block |
The line currently is:
frame = DataFrame(np.arange(6).reshape(3, 2)), index=[2, 0, 1])
It should instead be:
frame = DataFrame(np.arange(6).reshape(3, 2), index=[2, 0, 1])
Note from the Author or Editor: Confirmed. please change as described
|
Joshua Lande |
Mar 14, 2013 |
May 17, 2013 |
Printed |
Page 152
Second paragraph |
Duplicate colons introduce the second example code block.
Note from the Author or Editor: Please remove the unnecessary colon
|
jworeilly |
Jun 07, 2013 |
Dec 12, 2014 |
Printed |
Page 152
Middle |
For line [294] of the iget_value code example, the second ")" after the call to reshape(3, 2) is incorrect.
Note from the Author or Editor: I believe this is already fixed in the second printing
|
jworeilly |
Jun 07, 2013 |
Dec 12, 2014 |
Printed |
Page 153
bottom of page |
pdata.ix['Adj Close', '5/22/2012':, :] refers to Adj Close.
The table below that shows the Close, not the Adj Close.
Note from the Author or Editor: Very strange. Editors, can you please change the indicated line of code to:
pdata.ix['Adj Close', '5/22/2012':, :]
See also revised code examples for an alternative replacement.
|
Arie Ellerbrak |
Aug 01, 2013 |
Dec 12, 2014 |
PDF |
Page 160
United States |
keep_date_col description is inconsistent with the pandas documentation. Should be:
If joining columns to parse date, keep the joined columns. Default False
Note from the Author or Editor: Confirmed. Please change as described
|
Thomas Maloney |
Jan 04, 2013 |
May 17, 2013 |
Printed |
Page 162
Middle of the page |
In order for data.to_csv(sys.stdout, sep='|') to work you must
import sys
first
Note from the Author or Editor: Editors, find this text on the page
(writing to sys.stdout so it just prints the text result)
change it to
(writing to sys.stdout so it just prints the text result; make sure to import sys)
use fixed width font for "import sys"
|
Arie Ellerbrak |
Aug 01, 2013 |
Dec 12, 2014 |
PDF |
Page 170
Middle |
The Output of perf = DataFrame(data) is not correct. As printed:
In [928]: perf
Out[928]:
Empty DataFrame
Columns: array([], dtype=int64)
Index: array([], dtype=int64)
But should be:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 648 entries, 0 to 647
Data columns:
AGENCY_NAME 648 non-null values
CATEGORY 648 non-null values
DESCRIPTION 648 non-null values
FREQUENCY 648 non-null values
INDICATOR_NAME 648 non-null values
INDICATOR_UNIT 648 non-null values
MONTHLY_ACTUAL 648 non-null values
MONTHLY_TARGET 648 non-null values
PERIOD_MONTH 648 non-null values
PERIOD_YEAR 648 non-null values
YTD_ACTUAL 648 non-null values
YTD_TARGET 648 non-null values
dtypes: int64(2), object(10)
Note from the Author or Editor: Confirmed. Please change the text of Out[928]: to
<class 'pandas.core.frame.DataFrame'>
Int64Index: 648 entries, 0 to 647
Data columns:
AGENCY_NAME 648 non-null values
CATEGORY 648 non-null values
DESCRIPTION 648 non-null values
FREQUENCY 648 non-null values
INDICATOR_NAME 648 non-null values
INDICATOR_UNIT 648 non-null values
MONTHLY_ACTUAL 648 non-null values
MONTHLY_TARGET 648 non-null values
PERIOD_MONTH 648 non-null values
PERIOD_YEAR 648 non-null values
YTD_ACTUAL 648 non-null values
YTD_TARGET 648 non-null values
dtypes: int64(2), object(10)
|
Thomas Maloney |
Jan 04, 2013 |
May 17, 2013 |
Printed |
Page 172
Last paragraph, 2nd sentence |
Interally -> Internally
Note from the Author or Editor: Confirmed typo
|
Arie Ellerbrak |
Aug 02, 2013 |
Dec 12, 2014 |
Printed |
Page 175
United States |
Current text "...pandas has a read_frame function in its pandas.io.sql module that simplifies the process."
Warnings when running code:
1. "read_frame is depreciated, use read_sql "
2. "Reading a table with read_sql is not supported"
"for a DBIAPI2 connection. Use a SQLAlchemy"
"engine or specify a SQL query"
This apparently changed with pandas release v0.14.0 (May 31, 2014). Essentially, the SQL function names changed and the engine object replaced the connection object.
The SQL changes are documented in:
http://pandas.pydata.org/pandas-docs/stable/pandas.pdf
page 8 "SQL interfaces updated to use sqlalchemy, "
page 18 "The SQL reading and writing functions now support more database flavors through SQLAlchemy...
The new functions read_sql_query() and read_sql_table() are introduced. The function read_sql()
is kept as a convenience wrapper around the other two and will delegate to specific function depending on the provided
input (database table name or sql query).
In practice, you have to provide a SQLAlchemy engine to the sql functions. To connect with SQLAlchemy you use
the create_engine() function to create an engine object from database URI. You only need to create the engine
once per database you are connecting to. For an in-memory sqlite database:
In [43]: from sqlalchemy import create_engine
# Create your connection.
In [44]: engine = create_engine('sqlite:///:memory:')
This engine can then be used to write or read data to/from this database:
In [45]: df = pd.DataFrame({'A': [1,2,3], 'B': ['a', 'b', 'c']})
In [46]: df.to_sql('db_table', engine, index=False)
You can read data from a database by specifying the table name:
In [47]: pd.read_sql_table('db_table', engine)
Out[47]:
A B
0 1 a
1 2 b
2 3 c
or by specifying a sql query:
In [48]: pd.read_sql_query('SELECT * FROM db_table', engine)
Out[48]:
A B
0 1 a
1 2 b
2 3 c"
Note from the Author or Editor: We are fixing this in the code example review. Will be fixed in next printing
|
Jim Callahan |
Jul 31, 2014 |
Dec 12, 2014 |
Printed |
Page 175
top |
Due to change to SQLAlchemy the conn object is replaced by an engine object.
The line,
conn = sqlite3.connect(':memory:')
should be replaced by
To use a SQLite :memory: database, specify an empty URL:
engine = create_engine('sqlite://')
Notice that 'sqlite' is in lowercase and without a '3' suffix.
For a relative file path, this requires three slashes:
engine = create_engine('sqlite:///foo.db')
And for an absolute file path, four slashes are used:
engine = create_engine('sqlite:////absolute/path/to/foo.db')
source:
http://docs.sqlalchemy.org/en/rel_0_9/core/engines.html#sqlite
Note from the Author or Editor: Editors: We are addressing this in the code example review.
Reporter: This will be fixed in the next printing
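As a hedged aside for readers without SQLAlchemy installed: pandas has continued to accept a plain standard-library `sqlite3` connection for SQL queries, so the engine object is mainly needed for other database drivers. A minimal sketch, reusing the `db_table` name and columns from the documentation excerpt quoted in the preceding report (the table is built by hand here, which is an assumption of this example):

```python
import sqlite3

import pandas as pd

# In-memory SQLite database through the standard-library driver
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE db_table (A INTEGER, B TEXT)')
conn.executemany('INSERT INTO db_table VALUES (?, ?)',
                 [(1, 'a'), (2, 'b'), (3, 'c')])
conn.commit()

# read_sql_query takes the SQL text and the open connection
df = pd.read_sql_query('SELECT * FROM db_table', conn)
conn.close()
print(df)
```

Reading a whole table by name with `read_sql_table` still requires SQLAlchemy, as the quoted warning says.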
|
Jim Callahan |
Jul 31, 2014 |
Dec 12, 2014 |
PDF, ePub, Mobi |
Page 192
Beginning of section Pivoting "long" to "wide" Format |
The section begins:
A common way to store multiple time series in databases and CSV is in so-called long or stacked format:
In [116]: ldata[:10]
However, the variable ldata has not been defined or initialized previously (or later) in the book.
Note from the Author or Editor: Yeah, I left the code to make that DataFrame out as it was derived in a mungy way from the macrodata used earlier.
Editors: please put a note in parentheses after "stacked format" that says
"... or stacked format (code to create this DataFrame omitted for brevity):" or something. pretty trivial for the user to type this in
|
David Kimery |
Apr 17, 2013 |
May 17, 2013 |
PDF, ePub |
Page 192
out 116 and out 118 |
In chapter 7, in the subsection entitled "Pivoting "long" to "wide" Format" . . .
On further examination -- the ldata output in out 116 is only for part of ldata, as in ldata[:10]. This omits five rows of data that should be in ldata based on the rest of the examples in this section:
10 1959-12-31 00:00:00 infl 0.270
11 1959-12-31 00:00:00 unemp 5.600
12 1960-03-31 00:00:00 realgdp 2847.699
13 1960-03-31 00:00:00 infl 2.310
14 1960-03-31 00:00:00 unemp 5.200
Note from the Author or Editor: I need to look into this, but I am going to try to add the code to generate the ldata table. I replied to your other question, but I didn't realize until further examination that the code was omitted. I made a note to myself and will address separately with the editors
|
Doug McCaleb |
Aug 15, 2013 |
Dec 12, 2014 |
Printed |
Page 192
Belgique |
A reader posted earlier the following comment:
"The section begins:
A common way to store multiple time series in databases and CSV is in so-called long or stacked format:
In [116]: ldata[:10]
However, the variable ldata has not been defined or initialized previously (or later) in the book. "
Perhaps it would be helpful to alter the example slightly so that readers of the book can test it immediately:
from pandas.core.reshape import melt, pivot
df = pd.read_csv('ch07/macrodata.csv') # original format
data = df.ix[:,['year', 'quarter', 'realgdp', 'infl', 'unemp']] # selection of variables
data['date'] = 10*data['year']+data['quarter'] # some quick identificator for the 'date' instead of separate year and quarter variables
del data['year']
del data['quarter']
ldata = melt(data, id_vars = ['date']) # long format
pivoted = ldata.pivot('date', 'variable', 'value'); pivoted.head()
# Note: 'item' becomes 'variable' in the rest of the example
Note from the Author or Editor: OK, sounds good.
Editors, could you remove this text: (code to create this DataFrame omitted for brevity)
then, after the first code example (ldata[:10]), could you put a code block with this code used to create the example:
data = pd.read_csv('ch07/macrodata.csv')
periods = pd.PeriodIndex(year=data.year, quarter=data.quarter, name='date')
data = DataFrame(data.to_records(),
columns=pd.Index(['realgdp', 'infl', 'unemp'], name='item'),
index=periods.to_timestamp('D', 'end'))
ldata = data.stack().reset_index().rename(columns={0: 'value'})
|
Patrick Jeuniaux |
Oct 14, 2013 |
Dec 12, 2014 |
PDF |
Page 194
3rd paragraph under "Removing Duplicates" |
"Relatedly, drop_duplicates returns a DataFrame where the duplicated array is True:"
The index values from `data.drop_duplicates()` suggest that drop_duplicates returns rows where the duplicated() array is False.
Note from the Author or Editor: Nice catch, will fix in the upcoming printing.
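The relationship between the two methods can be checked on a small frame. This is a sketch with illustrative data; `dup_mask` and `kept` are names introduced here, not taken from the book.

```python
import pandas as pd

data = pd.DataFrame({'k1': ['one'] * 3 + ['two'] * 4,
                     'k2': [1, 1, 2, 3, 3, 4, 4]})

dup_mask = data.duplicated()    # True for rows that repeat an earlier row
kept = data.drop_duplicates()   # keeps the rows where the mask is False

print(dup_mask.tolist())
print(kept.index.tolist())
```

In other words, `data.drop_duplicates()` matches `data[~data.duplicated()]`, which is the corrected reading of the sentence.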
|
Chapman |
Nov 17, 2014 |
Dec 12, 2014 |
Printed |
Page 194
3rd paragraph |
On the 3rd paragraph of "Removing Duplicates" sub-section: the drop_duplicates function returns where it is FALSE although the book says where it is TRUE.
"Relatedly, drop_duplicates returns a DataFrame where the duplicated array is True:
In [129]: data.drop_duplicates()"
So, the 'True' should be replaced by 'False'.
Thanks.
Simone.
Note from the Author or Editor: This has been fixed in the 2nd edition
|
Simone Occulate |
Dec 15, 2014 |
|
Printed |
Page 199
Top of page. |
The bins are divided into 18 to 25, 26 to 35, 35 to 60 and 60 and older.
Should be 18 to 26, 26 to 35, 35 to 60, 60 and older
or 18 to 25, 25 to 35, 35 to 60, 60 and older.
Note from the Author or Editor: editors, can you please change the copy to:
18 to 25, 26 to 35, 36 to 60, and finally 61 and older
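The corrected wording follows from how `pd.cut` closes its intervals on the right by default. A minimal sketch, assuming whole-number ages and the bin edges named in the report (the `ages` values are illustrative):

```python
import pandas as pd

ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]  # illustrative sample
bins = [18, 25, 35, 60, 100]

# Default right=True gives intervals (18, 25], (25, 35], (35, 60], (60, 100]
cats = pd.cut(ages, bins)
print(cats.categories)
print(cats.codes.tolist())
```

An age of exactly 25 lands in the first bin and 26 in the second, which is why "18 to 25, 26 to 35, 36 to 60, 61 and older" describes integer ages accurately.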
|
Arie Ellerbrak |
Aug 02, 2013 |
Dec 12, 2014 |
PDF |
Page 203
Middle of the page |
Splitting the categories from the movie dataset can be achieved by using:
movies.genres.str.get_dummies('|')
Note from the Author or Editor: Awesome. I'll use this feature in the next iteration of the book
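A short sketch of the suggested call, using a hand-written Series in place of the MovieLens `movies.genres` column (the three genre strings are illustrative, not read from the dataset):

```python
import pandas as pd

# Stand-in for movies.genres; values are illustrative
genres = pd.Series(["Animation|Children's|Comedy",
                    "Adventure|Children's|Fantasy",
                    "Comedy|Romance"])

# One indicator column per distinct genre, split on the '|' separator
dummies = genres.str.get_dummies('|')
print(dummies)
```

Each row has a 1 in the columns for its genres and 0 elsewhere, which replaces the manual loop over genres.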
|
Kristof |
Sep 05, 2015 |
|
PDF |
Page 204
somewhere |
ch07/movies.dat is not there (is in ch02/movielens)
Note from the Author or Editor: Thanks.
please change 'ch07/movies.dat' to 'ch02/movielens/movies.dat' in the code
|
Miki Tebeka |
Nov 09, 2012 |
May 17, 2013 |
Printed |
Page 217
Caption for Figure 7.1 |
Figure 7-1 displays values by *food* group, not by nutrient group (Zinc is the nutrient in the example). Its caption should hence read something along the lines of "Median Zinc values by food group".
Note from the Author or Editor: Confirmed the caption is wrong. Will fix
|
David Garcia Quintas |
Sep 07, 2015 |
|
|
Page 223
Table 8-1 |
Table 8-1: the description for 'subplot_kw' is cut off
Note from the Author or Editor: Please change the description for subplot_kw to
Dict of keywords passed to <literal>add_subplot</literal> call used to create each subplot.
|
Anonymous |
Apr 18, 2013 |
May 17, 2013 |
|
Page 235
paragraph1, sentence 1 |
par 1 sentence 1: should read '... is as simple as ...'
Note from the Author or Editor: Please fix typo as described. thanks!
|
Anonymous |
Apr 18, 2013 |
May 17, 2013 |
PDF |
Page 241
somewhere |
scatter_matrix(trans_data, diagonal='kde', color='k', alpha=0.3)
should be
pd.scatter_matrix(trans_data, diagonal='kde', color='k', alpha=0.3)
Note from the Author or Editor: Thanks. Please change code as described (add pd. to start of statement)
|
Miki Tebeka |
Nov 09, 2012 |
May 17, 2013 |
Printed, PDF, ePub, Mobi, Other Digital Version |
Page 241-242
Fig 8-23 |
Fig 8-23 appears to be identical to Fig 8-22
Note from the Author or Editor: Not sure what happened here, 8-23 is supposed to be a different figure if you read the text closely.
Here is a figure to replace 8-23 (should just be a drop-in replacement), editors please contact me if you need any changes to this:
https://www.dropbox.com/s/annqtoank0snrwu/scatter_matrix_fix_20130512.pdf
|
Anonymous |
Apr 18, 2013 |
May 17, 2013 |
PDF |
Page 246
Example code |
The example code on page 246 (Plotting Maps: Visualizing Haiti Earthquake Crisis Data) no longer works due to a change in pandas as of v0.13.0, released on 31 Dec 2013.
To make it work,
x, y = m(cat_data.LONGITUDE, cat_data.LATITUDE)
should be
x, y = m(cat_data.LONGITUDE.values, cat_data.LATITUDE.values)
You may find details at http://stackoverflow.com/questions/23136159
Apart from this, it would also be great to add the following line at the end of the same example code to show the resulting plot.
plt.show()
Note from the Author or Editor: Editors: please verify that this has been fixed in the overall code example review.
|
Younghoon Rhiu |
Jun 21, 2014 |
Dec 12, 2014 |
Printed |
Page 266
Top half |
demeaned.groupby(key).mean() does not work for me; that is, it yields non-zero values (and not just due to rounding).
I think the issue is that the people DataFrame gets reorganized internally with rows in different order. This doesn't seem to affect the alignment of key within people. But it does affect demean, so the values of key no longer line up with their original position.
import pandas as pd
from pandas import DataFrame
import numpy as np
def demean(arr):
    return arr - arr.mean()
# This doesn't work.
people = DataFrame(np.random.randn(5, 5),
                   columns=['a', 'b', 'c', 'd', 'e'],
                   index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
key = ['one', 'two', 'one', 'two', 'one']
demeaned = people.groupby(key).transform(demean)
print demeaned
print demeaned.groupby(key).mean()
produces
a b c d e
Jim 0.223861 -2.072542 0.973977 -0.021754 -1.019689
Joe 0.326119 0.671576 0.487932 -0.404353 1.219755
Steve -0.223861 2.072542 -0.973977 0.021754 1.019689
Travis 0.204880 -0.422467 -1.024938 -0.555061 -0.563228
Wes -0.530999 -0.249109 0.537006 0.959414 -0.656527
a b c d e
one -0.177000 -0.083036 0.179002 0.319805 -0.218842
two 0.265499 0.124555 -0.268503 -0.479707 0.328264
Note from the Author or Editor: This appears to be a bug in pandas unfortunately. I have reported it to the dev team here -- the appropriate action here is to fix the bug rather than changing the book text:
https://github.com/pydata/pandas/issues/8046
|
Ian Gow |
Jul 06, 2013 |
Dec 12, 2014 |
Printed |
Page 266
top half |
This is in reference to an issue that Ian Gow also pointed out above (Jul 06, 2013). A possible solution to the problem is given below.
Define people as in the book. The values are different since 'randn' gives different numbers.
>>> people
a b c d e
joe 2.011219 0.139871 -0.169945 1.801018 0.560313
steve -0.878164 0.121969 -0.174672 -1.500867 1.548067
wes -0.460175 -0.449552 1.213917 1.250151 0.191200
jim 2.286116 -1.253508 -0.567102 -0.802946 1.432807
travis -0.506323 0.807026 0.960450 -1.266392 0.567154
Define key as in the book:
>>> key
['one', 'two', 'one', 'two', 'one']
However, the error is that the following does not give zero mean:
demeaned = people.groupby(mapc,axis=0).transform(demean)
demeaned.groupby(mapc,axis=0).mean()
>>> demeaned = p.groupby(key).transform(demean)
>>> demeaned.groupby(key).mean()
a b c d e
one -0.269472 -0.205111 0.181926 0.218409 -0.082785
two 0.404208 0.307667 -0.272888 -0.327613 0.124178
A possible solution is to do the following. Define mapc as:
mapc = {'joe':'one', 'steve':'two', 'wes':'one', 'jim':'two', 'travis':'one'}
and now the following produces zero mean:
>>> demeaned = p.groupby(mapc).transform(demean)
>>> demeaned.groupby(mapc).mean()
a b c d e
one 7.401487e-17 0 3.700743e-17 3.700743e-17 -4.625929e-17
two 0.000000e+00 0 -1.387779e-17 5.551115e-17 0.000000e+00
Note from the Author or Editor: We are working to address this in pandas:
https://github.com/pydata/pandas/issues/8046
|
Qasim Iqbal |
Oct 25, 2013 |
Dec 12, 2014 |
Printed, PDF, ePub |
Page 271
bottom |
This statement
from shapelib import ShapeFile
requires the shapelib library. I tried to install shapelib and pyshapelib (the binding), but it gave an error
shapelibc.so: undefined symbol: SASetupDefaultHooks
Judging from the fact that pyshapelib was last updated in 2007, we wonder whether it is still compatible with newer versions of shapelib. Could you recommend another shapelib binding that will work with the examples in the book?
Note from the Author or Editor: We may need to remove this example; I know there are various issues with basemap as well. I've made a note and I will follow up with O'Reilly editors
|
Anonymous |
Sep 09, 2013 |
Dec 12, 2014 |
PDF |
Page 282
somewhere |
Should be return totals.order(ascending=False)[:n] (was [-n:])
Note from the Author or Editor: Correct. Please fix code typo as described (replace [-n:] with [:n])
|
Miki Tebeka |
Nov 09, 2012 |
May 17, 2013 |
Printed |
Page 287
1st line of example "In [108]" |
Update: seaborn now requires the keyword arguments "x=" and "y=" for the first two arguments in the referenced example.
|
Dennis Gonzales |
Apr 23, 2021 |
|
Printed |
Page 308
middle of page |
Out[470] should be 'Period('2007-06', 'M')'
Note from the Author or Editor: Confirmed, please make change as described
There is also a formatting mistake right before "Out [470]:" , please fix that also
|
Anonymous |
Apr 18, 2013 |
May 17, 2013 |
Printed, PDF, ePub, Mobi, Other Digital Version |
Page 324
bottom of page |
In[570]: spx_px has not been defined in the chapter yet
Note from the Author or Editor: Please add code line just above In [570]:
In [569]: spx_px = close_px_all['SPX']
Make sure there is a blank line between that code line and the next one to keep the styling consistent
|
Anonymous |
Apr 18, 2013 |
May 17, 2013 |
Printed |
Page 324
First paragraph of Exponentially-weighted functions |
The formula for the moving average is written as
ma_t = a * ma_{t-1} + (a-1) * x_{-t}
with a the decay factor.
It should be:
ma_t = a * ma_{t-1} + (1-a) * x_{t}
Note from the Author or Editor: Good catch, please make this change
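One way to see why the weights must be a and (1-a) is that they sum to 1, so a constant series stays constant. The pure-Python sketch below implements the corrected recursion; it is an illustration under the assumption that the recursion is seeded with the first observation, and it is not meant to reproduce pandas' own exponentially weighted functions.

```python
def ewma(values, a):
    """Exponentially weighted moving average with decay factor `a` (0 <= a < 1)."""
    ma = []
    for t, x in enumerate(values):
        if t == 0:
            ma.append(x)  # assumption: seed the recursion with the first observation
        else:
            # ma_t = a * ma_{t-1} + (1 - a) * x_t
            ma.append(a * ma[-1] + (1 - a) * x)
    return ma

print(ewma([1.0, 2.0, 3.0], a=0.5))
```

With the misprinted (a-1) weight, the second term would be negative and a constant input would drift away from its value.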
|
Bertrand Haut |
Mar 06, 2014 |
Dec 12, 2014 |
Printed |
Page 344
1st paragraph, body of the "to_index" function |
The given definition of to_index:
def to_index(rets):
    index = (1 + rets).cumprod()
    first_loc = max(index.notnull().argmax() - 1, 0)
    index.values[first_loc] = 1
    return index
doesn't seem to work with pandas 0.14.1, firstly because of "index.notnull().argmax() - 1", where index.notnull().argmax() is now a Timestamp without an offset, from which one can't subtract an int. Moreover, one can't compare it against an int as part of the max() function.
The following version works:
def to_index(rets):
    index = (1 + rets).cumprod()
    first_loc = index.notnull().argmax()
    index[first_loc] = 1
    return index
Note from the Author or Editor: Good catch will fix in the upcoming printing.
|
David Garcia Quintas |
Oct 04, 2014 |
Dec 12, 2014 |
PDF |
Page 345
Signal Frontier Analysis section |
The example refers to a mean-reverting strategy and not a momentum portfolio, because we rank returns in descending order. E.g., the highest return gets rank 1, which translates into a lower portfolio weight after demeaning and normalizing.
So either we change the text or, if we really want to provide an example of a momentum portfolio, we change the function calc_mom and use ascending=True, i.e.
ranks = mom_ret.rank(axis=1, ascending=True)
There is another small error in the function strat_sr on page 346. When we compute the portfolio we use a lag value of 1, meaning that for the portfolio at day t we use only information from day t-1 and earlier. This is OK; however, when we then compute the total cumulative returns there is no need to shift the portfolio by one day again, as this implies that we just throw away one day of information, so the line:
port = port.shift(1).resample(freq, how='first')
should be:
port = port.resample(freq, how='first')
Note from the Author or Editor: You're right about the momentum portfolio.
Editors, on page 345 can you replace the two usages of "momentum" with "mean reversion" and on Page 347, in the Figure 11-3 caption can you also make the same substitution.
The second note about the strat_sr function is not errata because the portfolio weights are the portfolio weights: they have to be shifted forward to compute the portfolio returns in the next period, so no changes needed there.
|
Anonymous |
Jul 01, 2014 |
Dec 12, 2014 |
Printed |
Page 351
Table 11-5. Resample method arguments |
The 'freq' argument seems wrong, when trying it explicitly, the following error message is returned:
In [22]: ts2 = ts.resample(freq='M')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-22-ad32b3871b3e> in <module>()
----> 1 ts2 = ts.resample(freq='M')
TypeError: resample() got an unexpected keyword argument 'freq'
Indeed, in the docs (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.resample.html) it is listed as 'rule' argument, which works:
In [25]: ts2 = ts.resample(rule='M').mean()
In [26]: ts2
Out[26]:
2000-01-31 -0.123505
2000-02-29 0.011267
2000-03-31 0.180698
2000-04-30 0.007794
Note from the Author or Editor: confirmed, will fix
|
Yonathan Mizrahi |
Oct 08, 2018 |
|
Printed |
Page 357
Second example "In [221]:" |
<ipython-input-216-793d385fe06a>:1: FutureWarning: 'loffset' in .resample() and in Grouper() is deprecated.
>>> df.resample(freq="3s", loffset="8H")
becomes:
>>> from pandas.tseries.frequencies import to_offset
>>> df = df.resample(freq="3s").mean()
>>> df.index = df.index.to_timestamp() + to_offset("8H")
ts.resample('5min', closed='right', label='right', loffset='-1s').sum()
Note from the Author or Editor: will fix
|
Dennis Gonzales |
May 16, 2021 |
|
Printed, PDF, ePub, Mobi, Other Digital Version |
Page 358
In Figure 12-3 |
arr.reshape((3,4), order=?)
should read
arr.reshape((4,3), order=?)
Note from the Author or Editor: Correct, please fix figure text as described. Surprised this one evaded me but it's obvious once you see it =)
|
Dan Grossman |
Jan 25, 2013 |
May 17, 2013 |
Printed |
Page 363
Bottom of page |
In box, The Broadcasting Ru should be The Broadcasting Rule
|
Wes McKinney |
May 12, 2013 |
May 17, 2013 |
PDF |
Page 365
image |
Quote from page 364:
"See Figure 12-6 for another illustration, this time subtracting a two-dimensional array from a three-dimensional one across axis 0."
Figure 12-6 does not show subtraction, nor do the numbers representing the NumPy data make any sense.
Note from the Author or Editor: The figure and text need fixing.
The text: change "subtracting... from ..." to "adding...to..."
In Figure 12-6, change the numbers in the result to be double what they are, so instead of 0, 1, 2, 3, 4, 5, 6, 7, make them, in the corresponding order, double that: 0, 2, 4, 6, ...
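The corrected figure can be reproduced numerically. In the sketch below the shapes (3, 4, 2) and (4, 2) are assumptions chosen so that the two-dimensional array equals the first slice of the three-dimensional one, which makes the first slice of the result exactly double, as the author's note describes.

```python
import numpy as np

arr3d = np.arange(24).reshape((3, 4, 2))  # assumed shape (3, 4, 2)
arr2d = np.arange(8).reshape((4, 2))      # assumed shape (4, 2)

# The trailing dimensions match, so arr2d is broadcast across axis 0
result = arr3d + arr2d

print(result.shape)
print(result[0].ravel())
```

Every slice `result[k]` equals `arr3d[k] + arr2d`; for `k = 0` that is 0, 2, 4, 6, and so on.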
|
klo |
Oct 31, 2012 |
May 17, 2013 |
Printed, PDF |
Page 373
10th Line |
Error in code: In [77]: g = df.groupby('key').value
Correction: g = df.groupby('key')
.value after a groupby call would lead to an error, as ".value" is not an aggregation function. Given the context, I think this should be just
g = df.groupby('key')
|
Bharath Reddy |
Mar 11, 2020 |
|
Printed |
Page 378
index for as_ordered |
as_ordered methdo, 378 --> as_ordered method
|
E G |
Feb 18, 2020 |
|
Printed |
Page 378
n/a |
Missing index entry and examples for the merge_asof() function, which has existed for a while and seems useful for financial time series.
That said, is there a specific reason it has been omitted? Or can one easily implement it with some of the documented functions, etc.?
Note from the Author or Editor: will document in 3rd edition
|
E G |
Feb 18, 2020 |
|
PDF |
Page 390
Next to paw prints at the top |
"Assignment is also referred to as binding, as we are binding a name to an object. Variables names that have been assigned may occasionally be referred to as bound variables."
At the beginning of the second sentence, I think either 'variables' should be singular or the word 'names' should be removed. :-)
Note from the Author or Editor: Editors: on Page 390, "Variables names" should be "Variable names"
|
Nick Carchedi |
Jun 05, 2014 |
Dec 12, 2014 |
Printed |
Page 400
middle of page |
The text currently says:
"When aggregating of otherwise grouping time series data, ..."
It probably should say
"When aggregating or otherwise grouping time series data"
Note from the Author or Editor: Please fix typo as described, thanks
|
Anonymous |
Apr 15, 2013 |
May 17, 2013 |
|
Page 403
example line 'In [85]' |
FutureWarning:
statsmodels.tsa.AR has been deprecated in favor of statsmodels.tsa.AutoReg and
statsmodels.tsa.SARIMAX.
AutoReg adds the ability to specify exogenous variables, include time trends,
and add seasonal dummies. The AutoReg API differs from AR since the model is
treated as immutable, and so the entire specification including the lag
length must be specified when creating the model. This change is too
substantial to incorporate into the existing AR api. The function
ar_select_order performs lag length selection for AutoReg models.
AutoReg only estimates parameters using conditional MLE (OLS). Use SARIMAX to
estimate ARX and related models using full MLE via the Kalman Filter.
Note from the Author or Editor: this is fixed in the 3rd edition
|
Dennis Gonzales |
May 30, 2021 |
|
Printed |
Page 405
first snippet in page |
The code snippet about the "xrange" function needs correction.
Replace "x" with "i" in the following example:
sum = 0
for i in xrange(10000):
    # % is the modulo operator:
    if x % 3 == 0 or x % 5 == 0:
        sum += i
The right code should be:
sum = 0
for i in xrange(10000):
    # % is the modulo operator:
    if i % 3 == 0 or i % 5 == 0:
        sum += i
Note from the Author or Editor: Good catch. Editors, please change "x" to "i" in the indicated code example as written by the errata reporter
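For readers on Python 3, where `xrange` no longer exists, the corrected loop can be written as in the sketch below. The variable is renamed to `total` here, a choice made for this illustration so that the built-in `sum()` is not shadowed.

```python
total = 0  # renamed from `sum` in the book to avoid shadowing the built-in
for i in range(10000):
    # % is the modulo operator
    if i % 3 == 0 or i % 5 == 0:
        total += i

print(total)
```

The same value comes out of `sum(i for i in range(10000) if i % 3 == 0 or i % 5 == 0)`, which is a quick cross-check.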
|
Gaston |
Apr 15, 2014 |
Dec 12, 2014 |
PDF |
Page 413
3rd IPython display: In[432], Out[434] and Out[435] |
The example is correct but you may as well get the names correct too, seeing as the names are those of real people.
On the 2nd line of In[432]:
('Schilling', 'Curt') should be ('Curt', 'Schilling')
The output for Out[434] and Out[435] will then be corrected accordingly, to:
Out[434]: ('Nolan', 'Roger', 'Curt')
Out[435]: ('Ryan', 'Clemens', 'Schilling')
Note from the Author or Editor: I've fixed this in the book source materials
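With the name order corrected, the "unzip" idiom from the example would look like the following sketch (the list literal is reconstructed from the outputs quoted in the report):

```python
pitchers = [('Nolan', 'Ryan'), ('Roger', 'Clemens'), ('Curt', 'Schilling')]

# zip(*rows) regroups a list of (first, last) rows into columns
first_names, last_names = zip(*pitchers)

print(first_names)
print(last_names)
```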
|
Anonymous |
Sep 07, 2015 |
|
PDF |
Page 416
defaultdict examples at the top of the page |
The two examples illustrating the usage of defaultdict, don't quite work as described in Python 3 (at least not in v. 3.4.3). For the first example, one cannot see the result in the same form as by the techniques on the previous page, by just typing by_letter; one must type dict(by_letter). Next, it is not clear what the example,
counts = defaultdict(lambda: 4)
is supposed to produce. Typing counts at the prompt (in IDLE), simply yields
defaultdict(<function <lambda> at 0x02DD3B70>, {})
while typing dict(counts), yields
{}
It is not clear how one could incorporate this construction into the previous example or for a new example, to see how 4 gets used.
Note from the Author or Editor: I will confirm that this behaves in the expected way in the 3rd ed
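The default value only becomes visible once a missing key is looked up, which may be why typing `counts` alone shows an empty mapping. A minimal sketch (the keys `'apples'` and `'pears'` are made up for illustration):

```python
from collections import defaultdict

counts = defaultdict(lambda: 4)

# Looking up a missing key calls the factory and stores its result
print(counts['apples'])   # the key was absent, so the default is inserted and returned
counts['apples'] += 1
counts['pears'] += 1      # also starts from the default
print(dict(counts))
```

Wrapping the result in `dict(...)` is only needed for display; a `defaultdict` otherwise behaves like an ordinary dict.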
|
Anonymous |
Sep 08, 2015 |
|
Printed, PDF, ePub, Mobi, Other Digital Version |
Page 418
last line |
IT IS:
loc_mapping = dict((val, idx) for idx, val in enumerate(strings)}
SHOULD BE:
loc_mapping = dict((val, idx) for idx, val in enumerate(strings))
NOTE: The last character of the code line should be ) and not }, probably from a wrong copy and paste of the previous code line. It's obvious, but I checked this with IPython.
Note from the Author or Editor: Please fix typo as submitter described (replace curly brace with parenthesis)
Thanks!
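A quick check of the corrected line, next to the equivalent dict comprehension. The contents of `strings` are an assumption made for this sketch, and `loc_mapping_comp` is a name introduced here.

```python
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']  # illustrative input

# Generator expression passed to dict(), as in the corrected line
loc_mapping = dict((val, idx) for idx, val in enumerate(strings))

# Equivalent dict comprehension; note the braces wrap the whole expression
loc_mapping_comp = {val: idx for idx, val in enumerate(strings)}

print(loc_mapping)
```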
|
Jose Manuel Martà|
May 09, 2013 |
May 17, 2013 |
Printed |
Page 419
entire example |
The example lacks a function to remove extra whitespace in string "south carolina##". Either the output should be altered at top and bottom of page 419 (i.e. "Out[15]" and "Out[22]") or a function should be added to normalize the whitespace between tokens. E.g. value = ' '.join(value.split())
Note from the Author or Editor: Will fix example and clarify text, since it only strips whitespace from the start and end of the tokens
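The reporter's suggestion can be folded into a cleaning function along these lines. This is a sketch only: `clean_string` is a hypothetical name, and the punctuation pattern is an assumption about what the example strips.

```python
import re

def clean_string(value):
    """Strip, drop punctuation, collapse internal whitespace, then title-case."""
    value = value.strip()
    value = re.sub('[!#?]', '', value)   # assumed punctuation set
    value = ' '.join(value.split())      # normalize runs of whitespace between tokens
    return value.title()

print(clean_string('south   carolina##'))
```

`str.split()` with no argument splits on any run of whitespace, so joining with a single space collapses the gap between tokens.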
|
Craig Murray |
Feb 15, 2016 |
|
PDF |
Page 420
Bottom third |
The main restriction on function arguments it that the keyword arguments must follow the positional arguments (if any).
'it' should be 'is'
Note from the Author or Editor: Editors: please change to "The main restriction on function arguments is that"
|
Nick Carchedi |
Jun 06, 2014 |
Dec 12, 2014 |
Printed, PDF, ePub, Mobi, Other Digital Version |
Page 427
Last code example in section "Currying: Partial Argument Application" |
In the code comment:
# Take the 60-day moving average of of all time series in data
"of" is repeated.
Note from the Author or Editor: Please fix typo as described (remove duplicate "of")
|
Jose Manuel Martà|
May 09, 2013 |
May 17, 2013 |
Printed, PDF, ePub, Mobi, Other Digital Version |
Page 432
Last line in Table A-6 |
IS:
True is the file is closed.
SHOULD BE:
True if the file is closed.
Note from the Author or Editor: Please make change as submitter described (replace is with if)
|
Jose Manuel Martà|
May 10, 2013 |
May 17, 2013 |
ePub |
Page 712
1st code example, list comprehension for enough_es within for loop |
In the first code example for the Nested list comprehensions section, the "if name.count('e') > 2" within the list comprehension should have a ">=" instead of a ">".
Note from the Author or Editor: You're right. Editors, could you please make the indicated change?
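The effect of the operator is easy to see on sample data. In the sketch below the name lists are illustrative, and `strict` is a name introduced here to show what the uncorrected condition selects.

```python
all_data = [['Tom', 'Billy', 'Jefferson', 'Andrew', 'Wesley', 'Steven', 'Joe'],
            ['Susie', 'Casey', 'Jill', 'Ana', 'Eva', 'Jennifer', 'Stephanie']]

# Names containing two or more lowercase 'e' characters
result = [name for names in all_data for name in names
          if name.count('e') >= 2]

# With the original '>' the condition asks for three or more
strict = [name for names in all_data for name in names
          if name.count('e') > 2]

print(result)
print(strict)
```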
|
Todd Leonhardt |
Sep 14, 2013 |
Dec 12, 2014 |
ePub |
Page 727
Top of page, 1st code example |
For the output to work as intended in the example, the print statement within def squares() needs to be outside the for loop within that generator function.
The way the code is written, the 'Generating squares....' print will occur each time a new number is generated. But if you move the print outside the for, it will print exactly once.
Note from the Author or Editor: Good catch. Authors could you change the code cited to look like this (mind the 4-space indents):
def squares(n=10):
    print 'Generating squares from 1 to %d' % (n ** 2)
    for i in xrange(1, n + 1):
        yield i ** 2
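A Python 3 rendering of the same generator, offered as a sketch for readers on current interpreters (`print` becomes a function and `xrange` becomes `range`; `gen` and `values` are names introduced here):

```python
def squares(n=10):
    # Runs once, when the first value is requested, because it sits outside the loop
    print('Generating squares from 1 to %d' % (n ** 2))
    for i in range(1, n + 1):
        yield i ** 2

gen = squares(3)     # nothing is printed yet: the generator body has not started
values = list(gen)   # the message is printed once, then the squares are produced
print(values)
```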
|
Todd Leonhardt |
Sep 14, 2013 |
Dec 12, 2014 |
Mobi |
Page 5385
Example code |
The location is based on the location information provided by my Kindle reader using the mobi format. I believe this would be page 229 in the physical edition.
In the example code in section "Annotation and Drawing on Subplot," the first element of each tuple in the crisis_data list is of type datetime.datetime. These elements are used as an argument to pandas.asof(). However, this method takes a DateTimeIndex as an argument. Therefore, this date value needs to be converted using pandas.to_datetime() before making the call to asof().
Note from the Author or Editor: will review for 3e
|
Patton Bradford |
Feb 13, 2016 |
|