Errata for High Performance Python

The errata list is a list of errors and their corrections that were found after the product was released. If the error was corrected in a later version or reprint, the date of the correction will be displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.

Version Location Description Submitted By Date submitted Date corrected
Page Chapter 2
py-spy command

It seems that there should be a top command between py-spy and --pid. I tested with the most recent version.

Note from the Author or Editor:
A single word change is needed. At the top of page 55 in console font I see
"$ sudo env "PATH=$PATH" py-spy --pid 15953"
and it should have "top" added to read:
"$ sudo env "PATH=$PATH" py-spy top --pid 15953"

chen yuanyuan  Jul 12, 2020  Jan 27, 2023
Page Chapter 11
Regarding the Wikipedia texts

I get the following error:

FileNotFoundError: [Errno 2] No such file or directory: 'all_unique_words_wikipedia_via_gensim.txt'

This happens when trying to run text_example_clean_list_wikipedia_gensim.py and text_example.py.

I can't find 'all_unique_words_wikipedia_via_gensim.txt'.

I don't know where to download it, and it's not in the supplemental files. Can you clarify whether we need to create it ourselves, or whether we are simply missing the file?

Note from the Author or Editor:
This code currently isn't in the supplemental GitHub repository. I will try to address this and add it in the near future!

Tye Lokka  Feb 09, 2022  Jan 27, 2023
Page 18, Chapter 1
Last sentence in the 10th and last paragraph on the page.

Sentence reads: Both are sensible solutions and are significantly better using the operating system's global Python environment!

Does this sentence mean "... better [when] using ..." or does it mean "... better [than] using ..."?

Note from the Author or Editor:
Please add THAN in the indicated place:
"Both are sensible solutions and are significantly better THAN using the operating system's global Python environment!"

Jackson Smith  Jun 12, 2023 
I
chapter 1

"less than two cores" -> "fewer than two cores"

Zachary Kneupper  Jan 06, 2020  Apr 30, 2020
I
I

In chapter 1, the text describing the check_prime(number) function refers to a variable called "number_float", but the "number_float" variable does not appear in the function definition.

Zachary Kneupper  Jan 08, 2020  Apr 30, 2020
I
Chapter 1, "So Why Use Python?"

"Continuum’s Anaconda, a scientifically focused environment" should be edited, since Continuum Analytics has changed their name to Anaconda, Inc.

See:
https://www.anaconda.com/continuum-analytics-officially-becomes-anaconda/

Zachary Kneupper  Jan 09, 2020  Apr 30, 2020
Page 33
paragraph below time command run

"If you try time --verbose quick-and dirty get an error ..."
seems to be missing "and" before "get"

Note from the Author or Editor:
The text currently reads "...If you try `time --verbose` quick-and-direct get an error..." and it should read "...If you try `time --verbose` and you get an error...".

Gregory Sherman  Apr 30, 2021  Jan 27, 2023
Page 37-39
cProfile command and results

$ python -m cProfile -o profile.stats julia1.py

The resulting information obtained in IPython is from profiling a different program, julia1_nopil.py. (Both are present in the downloaded code for Chapter 2.)

Also, there is a draw_output argument to calc_pure_python() in julia1_nopil.py, but it is not used in the function.

Note from the Author or Editor:
On page 37, the code line "$ python -m cProfile -o profile.stats julia1.py" should now read
"$ python -m cProfile -o profile.stats julia1_nopil.py"

I have modified the julia1_nopil.py code in the public GitHub repository to remove the redundant keyword, so the rest of the printed code samples will work as expected.

Gregory Sherman  Apr 30, 2021  Jan 27, 2023
Page 74
Example 3-5


M = (N >> 3) + (3 if N < 9 else 6)

N 0 1-4 5-8 ...
M 0 4 8 ...
=================================================
I believe that both the equation and N to M table are incorrect.
The equation and M values resemble what I found at bit.ly/3xIEJ2L

The equation in the text would produce nonsensical results in some cases, such as M=3 if N=4.
Likewise, the table sometimes has M equal to N, when it is clearly stated on pg. 73 that "M > N"


Note from the Author or Editor:
Good eye! The equation should read:

M = N + (N >> 3) + (3 if N < 9 else 6)

The way it is now, M is simply the amount of over-allocation, not the total re-allocated size.

You can see this equation in action here:

https://github.com/python/cpython/blob/3.7/Objects/listobject.c#L59
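
As a rough illustration of the corrected equation, the sketch below simulates repeated appends and records the allocated size M each time the list has to grow. The function names `overallocate` and `growth_sequence` are hypothetical, and the rule mirrors the CPython 3.7 source linked above; other CPython versions use a different growth pattern.

```python
def overallocate(n):
    """Total slots allocated when a list must grow to hold n items (CPython 3.7 rule)."""
    if n == 0:
        return 0  # an empty list allocates nothing
    return n + (n >> 3) + (3 if n < 9 else 6)


def growth_sequence(max_items):
    """Allocated sizes seen while appending max_items elements one at a time."""
    allocated = 0
    sizes = [allocated]
    for n in range(1, max_items + 1):
        if n > allocated:  # a resize only happens once the spare capacity is used up
            allocated = overallocate(n)
            sizes.append(allocated)
    return sizes


print(growth_sequence(60))  # [0, 4, 8, 16, 25, 35, 46, 58, 72]
```

The first few values match the N-to-M table: M stays at 4 while N is between 1 and 4, then jumps to 8 for N between 5 and 8, and so on.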

Gregory Sherman  May 02, 2021  Jan 27, 2023
Page 75
End of page and the beginning of the next page

The example on this page appends i * 2, but it should append i * i. As written, the result on the next page contradicts the conclusion drawn there.

Note from the Author or Editor:
Good catch. For consistency, Example 3-7 should use "i * 2" everywhere instead of "i*i". This choice is arbitrary and could have gone the other way; however, if we let i get very big, i*i has a chance to overflow!

Fu Chen  May 18, 2022  Jan 27, 2023
Page 77
sentence above Example 3-8

"... instantiating a list can be 5.1x slower than instantiating a tuple ..."

In the example, 95 ns / 12.5 ns = 7.6

Also worth noting is that on my Windows 10 PC, the factor varies widely between tests
from about 2.7 to 3.9

Note from the Author or Editor:
Good eye! Indeed the text should say "7.6x slower".

The exact number, however, should be taken with a grain of salt given that we are doing micro-profiling. It does, however, give the general sense of what to expect for larger lists/tuples.
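
Readers who want to check the ratio on their own machine could use something like the following sketch. It assumes the comparison is between a ten-element list literal and a ten-element tuple literal; the repetition count `N_RUNS` is an arbitrary choice, and the measured ratio will vary between machines, operating systems, and Python versions.

```python
import timeit

# Number of repetitions is an arbitrary choice for this sketch.
N_RUNS = 1_000_000

list_time = timeit.timeit("l = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]", number=N_RUNS)
tuple_time = timeit.timeit("t = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)", number=N_RUNS)

# Convert total seconds into nanoseconds per instantiation.
list_ns = list_time / N_RUNS * 1e9
tuple_ns = tuple_time / N_RUNS * 1e9

print(f"list:  {list_ns:.1f} ns")
print(f"tuple: {tuple_ns:.1f} ns")
print(f"ratio: {list_ns / tuple_ns:.1f}x")
```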

Gregory Sherman  May 02, 2021  Jan 27, 2023
Page 97
Bottom of page

Misleading statement: when you explain where generators come into play and how they avoid the creation of all elements at once, the reader might take away that range is a generator, which is not the case (see e.g. question 13092267 on stackoverflow, since I can't post URLs here). You might want to talk about laziness instead.

Note from the Author or Editor:
This is a subtlety that we will address in the next edition. We suggest no changes to this edition, as the truth is close enough (for most people) and a longer explanation is needed for fine clarification.
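
For readers curious about the distinction, here is a minimal sketch using only the standard library. It compares a `range` object with a generator expression; the variable names are illustrative and not taken from the book.

```python
from collections.abc import Generator, Iterator

numbers = range(5)
squares = (n * n for n in numbers)  # a generator expression, for comparison

# A range is a lazy sequence: it supports len() and indexing
# without storing every element in memory.
print(len(numbers), numbers[3])  # 5 3

# A range is iterable, but it is not an iterator, so it is not a generator either.
print(isinstance(numbers, Iterator), isinstance(numbers, Generator))  # False False
print(isinstance(squares, Iterator), isinstance(squares, Generator))  # True True

# A range can be iterated repeatedly; a generator is exhausted after one pass.
first_pass, second_pass = list(squares), list(squares)
print(list(numbers), list(numbers))
print(first_pass, second_pass)  # [0, 1, 4, 9, 16] []
```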

Anthony Labarre  Jun 29, 2023 
Page 102
code at top of page

The first Fibonacci number, according to the modern definition, is zero,
so "yield j" should be replaced with "yield i"

Note from the Author or Editor:
Good eye! Indeed, "yield j" should be "yield i" in the code snippet spanning pages 101-102.
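
With that change applied, the generator might look like the sketch below. This is a reconstruction that assumes the surrounding code starts from `i, j = 0, 1`; it is not copied from the book.

```python
from itertools import islice


def fibonacci():
    """Infinite Fibonacci generator that starts from 0."""
    i, j = 0, 1
    while True:
        yield i  # corrected: "yield j" would skip the leading 0
        i, j = j, i + j


print(list(islice(fibonacci(), 10)))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```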

Gregory Sherman  May 04, 2021  Jan 27, 2023
Page 102
code in middle of page

'... answer the question "How many Fibonacci numbers below 5,000 are odd?" in multiple ways:'


fibonacci_naive() generates 1 as the first Fibonacci number and does not strictly test for "below", so should be:
while i < 5000:
    if i % 2:

fibonacci_transform() also has the problem with "below", so should be:
if f >= 5000:

as does fibonacci_succinct(); the fix is:
first_5000 = takewhile(lambda x: x < 5000, fibonacci())


Note from the Author or Editor:
Yes, your changes do indeed fix the various off-by-one mistakes in this snippet.
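
Putting the three fixes together gives something like the sketch below. The function bodies are reconstructions built around the corrected lines quoted above, so details other than those lines are assumptions.

```python
from itertools import takewhile


def fibonacci():
    """Infinite Fibonacci generator that starts from 0."""
    i, j = 0, 1
    while True:
        yield i
        i, j = j, i + j


def fibonacci_naive():
    i, j = 0, 1
    count = 0
    while i < 5000:  # strictly below 5,000
        if i % 2:
            count += 1
        i, j = j, i + j
    return count


def fibonacci_transform():
    count = 0
    for f in fibonacci():
        if f >= 5000:  # stop at the first value that is not below 5,000
            break
        if f % 2:
            count += 1
    return count


def fibonacci_succinct():
    first_5000 = takewhile(lambda x: x < 5000, fibonacci())
    return sum(1 for x in first_5000 if x % 2)


print(fibonacci_naive(), fibonacci_transform(), fibonacci_succinct())  # 13 13 13
```

All three approaches agree that 13 of the Fibonacci numbers below 5,000 are odd.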

Gregory Sherman  May 04, 2021  Jan 27, 2023
Page 104
Example 5-2, definition of function `read_fake_data`

Given the description of Example 5-4, it seems like the function `read_fake_data` in Example 5-2 should contain a "!=" rather than a "==" in its definition. Specifically, in Example 5-4 the anomaly detection is set up to find values that don't fit a normal distribution, but as written `read_fake_data` primarily produces data with the constant value 100, which does not fit a normal distribution.

I think the function definition should read:

def read_fake_data(filename):
    for timestamp in count():
        # We insert an anomalous data point approximately once a week
        if randint(0, 7*60*60*24 - 1) != 1:
            value = normalvariate(0, 1)
        else:
            value = 100
        yield datetime.fromtimestamp(timestamp), value

Note from the Author or Editor:
Good catch! Indeed, the "==" should be a "!=" because we want the anomalous value of 100 to be infrequent. In fact, to make this more readable I think a better if statement would be:

        if randint(0, 7 * 60 * 60 * 24 - 1) == 1:
            value = 100
        else:
            value = normalvariate(0, 1)
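
For reference, a self-contained version of the corrected function could look like the following sketch. The imports and the `SECONDS_PER_WEEK` constant are additions made here so the snippet runs on its own, and the `filename` argument is unused, as in the version quoted above.

```python
from datetime import datetime
from itertools import count, islice
from random import normalvariate, randint

SECONDS_PER_WEEK = 7 * 60 * 60 * 24  # helper constant introduced for this sketch


def read_fake_data(filename):
    # `filename` is unused; it is kept only to mirror the signature in the example.
    for timestamp in count():
        # Insert an anomalous data point approximately once a week.
        if randint(0, SECONDS_PER_WEEK - 1) == 1:
            value = 100
        else:
            value = normalvariate(0, 1)
        yield datetime.fromtimestamp(timestamp), value


# The generator is infinite, so take only a few items to inspect it.
sample = list(islice(read_fake_data("fake_filename"), 5))
for when, value in sample:
    print(when, value)
```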

Anonymous  Oct 01, 2020  Jan 27, 2023
Page 106
Middle

"continue retrieving anomalous data This is called"

Missing period.

Anonymous  Aug 28, 2020  Jan 27, 2023
Page 106
Lazy Generator Evaluation

Please update the example as shown in your github example here: github.com/mynameisfiber/high_performance_python_2e/blob/master/05_iterators/lazy_data_analysis.py

Because:

- data should be created using read_fake_data not read_data

Thank you

Note from the Author or Editor:
Good eye. Example 5-5 should read:

data = read_fake_data("fake_filename")

instead of:

data = read_data(filename)

Ali  Jan 24, 2021  Jan 27, 2023
Page 131
3rd paragraph

"We will continue on the track of removing necessary functionality in favor of performance ..."
I believe it means to say "unnecessary" instead of "necessary," especially given the larger context. Thanks! :)

Note from the Author or Editor:
Agreed, please replace "We will continue on the track of removing necessary functionality..." with "We will continue on the track of removing unnecessary functionality..."

Alex Dvorak  Dec 07, 2022  Jan 27, 2023
Page 227
main function located with "5" circle

For this function to work, the call should be:

result = asyncio.run(run_func())

The current call is:

result = asyncio.run(run_func)

This raises the error: "run_func is not a coroutine"

Note from the Author or Editor:
This is 100% correct. Good catch!
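
The difference can be seen in the small sketch below. The body of `run_func` here is a hypothetical stand-in, not the coroutine from the book, and the exact exception type and message for the incorrect call may differ between Python versions.

```python
import asyncio


async def run_func():
    # Hypothetical stand-in for the coroutine used in the book.
    await asyncio.sleep(0)
    return "done"


# Calling run_func() creates the coroutine object that asyncio.run() expects.
result = asyncio.run(run_func())
print(result)

# Passing the function itself, without calling it, is rejected.
try:
    asyncio.run(run_func)
except (TypeError, ValueError) as exc:
    error_name = type(exc).__name__
    print(f"{error_name}: {exc}")
```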

Harry Ritchie  Jan 15, 2021  Jan 27, 2023
Page 254
1st paragraph

We create a list containing nbr_estimates divided by the number of workers.

nbr_estimates -> nbr_samples_in_total ?

Note from the Author or Editor:
The text currently reads
"We create a list containing `nbr_estimates` divided by the number of workers."
and it should read:
"We create a list containing `nbr_samples_in_total` divided by the number of workers."

Evan Lai  Aug 26, 2020  Jan 27, 2023
Page 265
Figure 9-8 and text below

"Using processes ... A second CPU doubles the speed, and using four CPUs quadruples the speed."

==============================================

This conflicts with the graph above, which shows times of about 2.5 for 1 worker, 1.6 for 2, and 0.9 for 4.

Note from the Author or Editor:
The text currently reads "Using processes gives us a predictable speedup, just as it did in the pure Python example. A second CPU doubles the speed, and using four CPUs quadruples the speed." and should be replaced with:
"Using processes gives us a predictable speedup, just as it did in the pure Python example. A second CPU nearly doubles the speed, and when using four CPUs the speed is nearly quadrupled. We rarely achieve a pure doubling or quadrupling of execution speeds due to other overheads."

Gregory Sherman  Aug 02, 2021  Jan 27, 2023