Errata

Errata for High Performance Python, Second Edition

The errata list records errors, and their corrections, that were found after the product was released. If an error was corrected in a later version or reprint, the date of the correction is shown in the "Date corrected" column.

The following errata were submitted by our customers and approved as valid errors by the author or editor.

Location	Description	Submitted By	Date submitted	Date corrected
Page Chapter 2 py-spy command	It seems that there should be a top command between py-spy and --pid. I tested with the most recent version. Note from the Author or Editor: A single word change is needed. At the top of page 55 in console font I see "$ sudo env "PATH=$PATH" py-spy --pid 15953" and it should have "top" added to read: "$ sudo env "PATH=$PATH" py-spy top --pid 15953"	chen yuanyuan	Jul 12, 2020	Jan 27, 2023
Page Chapter 11 Regarding the Wikipedia texts	I get the error: FileNotFoundError: [Errno 2] No such file or directory: 'all_unique_words_wikipedia_via_gensim.txt' When trying to run text_example_clean_list_wikipedia_gensim.py and text_example.py, I can't find 'all_unique_words_wikipedia_via_gensim.txt'. I don't know where to download it, and it's not in the supplemental files. Can you clarify whether we need to create it ourselves, or whether the file is simply missing? Note from the Author or Editor: This code currently isn't in the supplemental github repository. I will try to address this and add it in the near future!	Tye Lokka	Feb 09, 2022	Jan 27, 2023
Page 18, Chapter 1 Last sentence in the 10th and last paragraph on the page.	Sentence reads: Both are sensible solutions and are significantly better using the operating system's global Python environment! Does this sentence mean "... better [when] using ..." or does it mean "... better [than] using ..."? Note from the Author or Editor: Please add THAN in the indicated place: "Both are sensible solutions and are significantly better THAN using the operating system's global Python environment!"	Jackson Smith	Jun 12, 2023
I chapter 1	"less than two cores" -> "fewer than two cores"	Zachary Kneupper	Jan 06, 2020	Apr 30, 2020
I I	In chapter 1, the text describing the check_prime(number) function refers to a variable called "number_float", but the "number_float" variable does not appear in the function definition.	Zachary Kneupper	Jan 08, 2020	Apr 30, 2020
I Chapter 1, "So Why Use Python?"	"Continuum’s Anaconda, a scientifically focused environment" should be edited, since Continuum Analytics has changed their name to Anaconda, Inc. See: https://www.anaconda.com/continuum-analytics-officially-becomes-anaconda/	Zachary Kneupper	Jan 09, 2020	Apr 30, 2020
Page 33 paragraph below time command run	"If you try time --verbose quick-and dirty get an error ..." seems to be missing "and" before "get" Note from the Author or Editor: The text currently reads "...If you try `time --verbose` quick-and-direct get an error..." and it should read "...If you try `time --verbose` and you get an error...".	Gregory Sherman	Apr 30, 2021	Jan 27, 2023
Page 37-39 cProfile command and results	$ python -m cProfile -o profile.stats julia1.py The resulting information obtained in IPython is from profiling a different program - julia1_nopil.py (Both are present in the downloaded code for Chapter 2) Also, there is a draw_output argument to calc_pure_python() in julia1_nopil.py, but it is not used in the function. Note from the Author or Editor: Page 37 the code line "$ python -m cProfile -o profile.stats julia1.py" should now read "$ python -m cProfile -o profile.stats julia1_nopil.py" I have modified the julia1_nopil.py code in the public github repository to remove the redundant keyword, so the rest of the printed code samples will work as expected.	Gregory Sherman	Apr 30, 2021	Jan 27, 2023
Page 74 Example 3-5	M = (N >> 3) + (3 if N < 9 else 6) N 0 1-4 5-8 ... M 0 4 8 ... ================================================= I believe that both the equation and N to M table are incorrect. The equation and M values resemble what I found at bit.ly/3xIEJ2L The equation in the text would produce nonsensical results in some cases, such as M=3 if N=4. Likewise, the table sometimes has M equal to N, when it is clearly stated on pg. 73 that "M > N" Note from the Author or Editor: Good eye! The equation should read: M = N + (N >> 3) + (3 if N < 9 else 6) The way it is now, M is simply the amount of over-allocation, not the total re-allocated size. You can see this equation in action here: https://github.com/python/cpython/blob/3.7/Objects/listobject.c#L59	Gregory Sherman	May 02, 2021	Jan 27, 2023
Page 75 End of page and the beginning of the next page	The example on the page appends i * 2. It should be i * i. As printed, the result on the next page contradicts the conclusion on the next page. Note from the Author or Editor: Good catch. For consistency, Example 3-7 should use "i * 2" everywhere instead of "i * i". This choice is arbitrary and it could have been the other way, however if we let i get very big i * i has a chance to overflow!	Fu Chen	May 18, 2022	Jan 27, 2023
Page 77 sentence above Example 3-8	"... instantiating a list can be 5.1x slower than instantiating a tuple ..." In the example, 95 ns / 12.5 ns = 7.6 Also worth noting is that on my Windows 10 PC, the factor varies widely between tests from about 2.7 to 3.9 Note from the Author or Editor: Good eye! Indeed the text should say "7.6x slower". The exact number, however, should be taken with a grain of salt given that we are doing micro-profiling. It does, however, give the general sense of what to expect for larger lists/tuples.	Gregory Sherman	May 02, 2021	Jan 27, 2023
Page 97 Bottom of page	Misleading statement: when you explain where generators come into play and how they avoid the creation of all elements at once, the reader might take away that range is a generator, which is not the case (see e.g. question 13092267 on stackoverflow, since I can't post URLs here). You might want to talk about laziness instead. Note from the Author or Editor: This is a subtlety that we will address in the next edition, we suggest no changes to this edition as the truth is close enough (for most people) and a longer explanation is needed for fine clarification	Anthony Labarre	Jun 29, 2023
Page 102 code at top of page	The first Fibonacci number - according to the modern definition - is zero, so "yield j" should be replaced with "yield i" Note from the Author or Editor: Good eye! Indeed, "yield j" should be "yield i" in the code snippet spanning pages 101-102.	Gregory Sherman	May 04, 2021	Jan 27, 2023
Page 102 code in middle of page	'... answer the question "How many Fibonacci numbers below 5,000 are odd?" in multiple ways:' fibonacci_naive() generates 1 as the first Fibonacci number and does not strictly test for "below", so should be: while i < 5000: if i % 2: fibonacci_transform() also has the problem with "below", so should be: if f >= 5000: as does fibonacci_succinct(); the fix is: first_5000 = takewhile(lambda x: x < 5000, fibonacci()) Note from the Author or Editor: Yes, your changes do indeed fix the various off-by-one mistakes in this snippet.	Gregory Sherman	May 04, 2021	Jan 27, 2023
Page 104 Example 5-2, definition of function `read_fake_data`	Given the description of Example 5-4, it seems like the function `read_fake_data` in Example 5-2 should contain a "!=" rather than a "==" in its definition. Specifically, in Example 5-4 the anomaly detection is set up to find values that don't fit a normal distribution, but as written `read_fake_data` primarily produces data with the constant value 100, which does not fit a normal distribution. I think the function definition should read: def read_fake_data(filename): for timestamp in count(): # We insert an anomalous data point approximately once a week if randint(0, 7 * 60 * 60 * 24 - 1) != 1: value = normalvariate(0, 1) else: value = 100 yield datetime.fromtimestamp(timestamp), value Note from the Author or Editor: Good catch! Indeed, the "==" should be a "!=" because we want the anomalous value of 100 to be infrequent. In fact, to make this more readable I think a better if statement would be: if randint(0, 7 * 60 * 60 * 24 - 1) == 1: value = 100 else: value = normalvariate(0, 1)	Anonymous	Oct 01, 2020	Jan 27, 2023
Page 106 Middle	"continue retrieving anomalous data This is called" Missing period.	Anonymous	Aug 28, 2020	Jan 27, 2023
Page 106 Lazy Generator Evaluation	Please update the example as shown in your github example here: github.com/mynameisfiber/high_performance_python_2e/blob/master/05_iterators/lazy_data_analysis.py Because: - data should be created using read_fake_data not read_data Thank you Note from the Author or Editor: Good eye. Example 5-5 should read: data = read_fake_data("fake_filename") instead of: data = read_data(filename)	Ali	Jan 24, 2021	Jan 27, 2023
Page 131 3rd paragraph	"We will continue on the track of removing necessary functionality in favor of performance ..." I believe it means to say "unnecessary" instead of "necessary," especially given the larger context. Thanks! :) Note from the Author or Editor: Agreed, please replace "We will continue on the track of removing necessary functionality..." with "We will continue on the track of removing unnecessary functionality..."	Alex Dvorak	Dec 07, 2022	Jan 27, 2023
Page 227 main function located with "5" circle	for this function to work the correction should be: result = asyncio.run(run_func()) current function is: result = asyncio.run(run_func) this returns the error: "run_func is not a coroutine" Note from the Author or Editor: This is 100% correct. Good catch!	Harry Ritchie	Jan 15, 2021	Jan 27, 2023
Page 254 1st paragraph	We create a list containing nbr_estimates divided by the number of workers. nbr_estimates -> nbr_samples_in_total ? Note from the Author or Editor: The text currently reads "We create a list containing `nbr_estimates` divided by the number of workers." and it should read: "We create a list containing `nbr_samples_in_total` divided by the number of workers."	Evan Lai	Aug 26, 2020	Jan 27, 2023
Page 265 Figure 9-8 and text below	"Using processes ... A second CPU doubles the speed, and using four CPUs quadruples the speed." ============================================== This conflicts with the graph above, which shows times of about 2.5 for 1 worker, 1.6 for 2, and 0.9 for 4. Note from the Author or Editor: The text currently reads "Using processes gives us a predictable speedup, just as it did in the pure Python example. A second CPU doubles the speed, and using four CPUs quadruples the speed." and should be replaced with: "Using processes gives us a predictable speedup, just as it did in the pure Python example. A second CPU nearly doubles the speed, and when using four CPUs the speed is nearly quadrupled. We rarely achieve a pure doubling or quadrupling of execution speeds due to other overheads."	Gregory Sherman	Aug 02, 2021	Jan 27, 2023
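
As an illustration of the Page 74 (Example 3-5) correction above, the following is a minimal sketch comparing the formula as printed with the corrected one. It assumes the CPython 3.7 growth rule linked in that erratum; other CPython versions may use a different rule. The function names are hypothetical and do not come from the book.

```python
def allocated_size(n):
    """Total slots M reserved when a list must grow to hold n items.

    Uses the corrected formula from the erratum. The n == 0 case is
    handled separately so that N = 0 gives M = 0, as in the book's table.
    """
    if n == 0:
        return 0
    return n + (n >> 3) + (3 if n < 9 else 6)


def overallocation_only(n):
    """The formula as originally printed: only the extra slots, not the total."""
    return (n >> 3) + (3 if n < 9 else 6)


def growth_sequence(max_len):
    """Capacities reached while appending one item at a time up to max_len."""
    capacity = 0
    capacities = []
    for length in range(1, max_len + 1):
        if length > capacity:  # the list is full, so a resize is triggered
            capacity = allocated_size(length)
            capacities.append(capacity)
    return capacities


if __name__ == "__main__":
    # The printed formula gives M = 3 for N = 4, which violates M > N.
    print(overallocation_only(4))
    # The corrected formula yields the familiar capacity steps.
    print(growth_sequence(73))
```

With the corrected formula, a list grown one append at a time first resizes at N = 1 and next at N = 5, which is why the table groups N = 1-4 under M = 4 and N = 5-8 under M = 8.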