The errata list is a list of errors and their corrections that were found after the product was released. If an error was corrected in a later version or reprint, the date of the correction is noted in the entry.

The following errata were submitted by our customers and approved as valid errors by the author or editor.
**Page p.158, l. -2**

'optimal' should be 'optional'.

Note from the Author or Editor: Correct, it should be 'optional'.

Submitted by HIDEMOTO NAKADA on Aug 23, 2025.
**Page p.343, Table 15-4**

The description of the `include_keys` parameter seems to be wrong. The documentation says: "Include the columns used to partition the DataFrame in the output."

Note from the Author or Editor: This method was updated after the book was published. Three descriptions need to be updated:

- `maintain_order`: Ensure that the order of the groups is consistent with the input data. This is slower than a default partition-by operation.
- `include_key`: Include the columns used to partition the DataFrame in the output.
- `as_dict`: Return a dictionary instead of a list. The dictionary keys are tuples of the distinct group values that identify each group.

Submitted by HIDEMOTO NAKADA on Aug 23, 2025.
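The corrected parameter descriptions can be illustrated with a small pure-Python sketch. The `partition_by` helper below is hypothetical: it only emulates the documented `include_key`/`as_dict` semantics on a list of dicts, and is not the Polars implementation.

```python
from collections import defaultdict

def partition_by(rows, keys, include_key=True, as_dict=False):
    # Hypothetical sketch of the documented behavior: group rows by the
    # key columns, optionally dropping the key columns from the output.
    groups = defaultdict(list)
    for row in rows:
        k = tuple(row[key] for key in keys)
        out = row if include_key else {c: v for c, v in row.items() if c not in keys}
        groups[k].append(out)
    # as_dict=True: keys are tuples of the distinct group values.
    return dict(groups) if as_dict else list(groups.values())

rows = [{"grp": "a", "x": 1}, {"grp": "b", "x": 2}, {"grp": "a", "x": 3}]
parts = partition_by(rows, ["grp"], as_dict=True)
assert set(parts) == {("a",), ("b",)}
```

Note that, per the corrected description, the dictionary keys are tuples even when partitioning by a single column.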
**Page p.421, l. 8**

"This means that in order to match data across two languages, you first have to deserialize the data from one format, then serialize it to the format of the other language." I guess we need to serialize first, and then deserialize.

Note from the Author or Editor: Correct, it should be: "When transferring data between two programming languages, you first serialize the data in the source language into a common format, then deserialize it in the target language to reconstruct the data in memory."

Submitted by HIDEMOTO NAKADA on Sep 02, 2025.
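The corrected ordering (serialize at the source, then deserialize at the target) is the familiar round-trip through a common format; a minimal stdlib sketch using JSON as that common format:

```python
import json

# Serialize in the "source" side: in-memory object -> common text format.
record = {"name": "citibike", "rides": 3}
wire = json.dumps(record)

# Deserialize in the "target" side: text -> reconstructed in-memory object.
rebuilt = json.loads(wire)
assert rebuilt == record
```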
**Page p.421, l. -5**

"a computer with multiple computation cores can process the same instruction on multiple data points at the same time." SIMD instructions do not require multiple cores; 'a computer with SIMD-capable processing units' might be better.

Note from the Author or Editor: Correct. Instead of: "By lining up your data in memory, or vectorizing it, a computer with multiple computation cores can process the same instruction on multiple data points at the same time." it should be: "By laying out data contiguously in memory, or vectorizing it, the processor's SIMD-capable units can apply the same instruction to multiple data points simultaneously, greatly improving performance."

Submitted by HIDEMOTO NAKADA on Sep 02, 2025.
**Page p.11, Part IV**

> Chapter 15 shows how to reshape data, through (un)pivoting, stacking, and extending.

Stacking and extending are not explained in Chapter 15, but in Chapter 14.

Note from the Author or Editor: "Chapter 14 explains how to combine different DataFrames using joins and concatenations. Chapter 15 shows how to reshape data, through (un)pivoting, stacking, and extending." should be: "Chapter 14 explains how to combine different DataFrames using joins and concatenations. Chapter 15 shows how to reshape data, through (un)pivoting, transposing, and exploding."

Submitted by HIDEMOTO NAKADA on Sep 09, 2025.
**Page p.325, Takeaways, first item**

> Combine DataFrames with exact matches in their join columns using df.join(). You can fine-tune the join with the tolerance and by arguments and by selecting the appropriate strategy.

The second sentence is explaining `join_asof`, not `join`.

Note from the Author or Editor: "Combine DataFrames with exact matches in their join columns using df.join(). You can fine-tune the join with the tolerance and by arguments and by selecting the appropriate strategy. • Combine numerical or temporal columns in DataFrames on their nearest values using df.join_asof()." should be: "Combine DataFrames with exact matches in their join columns using df.join(). • Combine numerical or temporal columns in DataFrames on their nearest values using df.join_asof(). You can fine-tune the join with the `tolerance` and `by` arguments and by selecting the appropriate strategy."

Submitted by HIDEMOTO NAKADA on Sep 09, 2025.
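The nearest-value matching that `df.join_asof()` performs (here the "backward" strategy, with a `tolerance` cutoff) can be sketched in pure Python. The helper below is hypothetical and illustrative only; Polars does this natively on sorted columns.

```python
import bisect

def join_asof_backward(left_times, right_times, tolerance=None):
    # Hypothetical sketch: for each left value, match the nearest
    # earlier-or-equal right value, rejecting matches farther than
    # `tolerance`. Both inputs must be sorted ascending.
    matches = []
    for t in left_times:
        i = bisect.bisect_right(right_times, t) - 1  # nearest <= t
        if i < 0 or (tolerance is not None and t - right_times[i] > tolerance):
            matches.append(None)  # no match within tolerance
        else:
            matches.append(right_times[i])
    return matches

assert join_asof_backward([1, 5, 10], [0, 4, 9], tolerance=2) == [0, 4, 9]
assert join_asof_backward([1, 5, 10], [0, 4, 6], tolerance=2) == [0, 4, None]
```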
**Page p.448, Table A-1**

The table includes Dask, DuckDB, and PySpark, while they are not mentioned in this chapter. I think it would be better to remove them from the table.

Note from the Author or Editor: Agreed. While they were relevant for the general benchmarking we did, we only compare GPU packages in the appendix. The table should only feature:

- cuDF
- pandas
- Polars
- Polars on GPU

Submitted by HIDEMOTO NAKADA on Sep 09, 2025.
**Chapter 11, p.240: Slicing, second bullet point**

The example implies the regular Python convention of `df.slice(start, stop)`, but the second argument is actually the length of the slice. This is particularly important to remember when using `pl.col().list.slice()`, as the error can easily go unnoticed.

Note from the Author or Editor: Correct. Should be: "For example, keep from the third to the seventh row with df.slice(2, 4). Here the first argument is the starting index, and the second argument is the length of the slice."

Submitted by Ben Hardcastle on Sep 09, 2025.
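The offset-plus-length reading of `df.slice()` can be contrasted with Python's start/stop convention on a plain list; `slice_like_polars` is a hypothetical helper for illustration only:

```python
def slice_like_polars(seq, offset, length):
    # The second argument is the *length* of the slice, not a stop index.
    return seq[offset:offset + length]

rows = list(range(10))
assert slice_like_polars(rows, 2, 4) == [2, 3, 4, 5]  # four rows from index 2
assert rows[2:4] == [2, 3]  # Python's start:stop convention yields only two
```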
**Page p.76, Table 4-1**

> Int8: 8-bit signed integer type. -128 to 128

The range should be -128 to 127.

Note from the Author or Editor: Correct, the range is indeed -128 to 127 (inclusive).

Submitted by HIDEMOTO NAKADA on Sep 14, 2025.
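The corrected range follows from two's-complement arithmetic: an n-bit signed integer spans -2^(n-1) through 2^(n-1) - 1, so the positive side has one fewer value than the negative side:

```python
def signed_range(bits):
    # Inclusive range of an n-bit two's-complement signed integer.
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

assert signed_range(8) == (-128, 127)       # Int8
assert signed_range(16) == (-32768, 32767)  # Int16
```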
**Page p.335, Table 15-2**

Arguments `id_vars` and `value_vars` should be `index` and `on`, respectively. I think the code outside the table is correct; I guess these were changes in the API while the book was being written. I think the Lazy Pivot box on p.334 is also out of date.

Note from the Author or Editor: The API has indeed changed since the book was published.

Submitted by Ian Gow on Mar 13, 2026.
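The renamed arguments can be illustrated with a pure-Python sketch of unpivoting. The `unpivot` helper below is hypothetical: it only emulates the `index`/`on` semantics (formerly `id_vars`/`value_vars`) on a list of dicts, not the Polars implementation.

```python
def unpivot(rows, index, on, variable_name="variable", value_name="value"):
    # Hypothetical sketch: keep the `index` columns, and turn each `on`
    # column into a (variable, value) pair on its own output row.
    out = []
    for row in rows:
        for col in on:
            rec = {k: row[k] for k in index}
            rec[variable_name] = col
            rec[value_name] = row[col]
            out.append(rec)
    return out

rows = [{"id": 1, "a": 10, "b": 20}]
assert unpivot(rows, index=["id"], on=["a", "b"]) == [
    {"id": 1, "variable": "a", "value": 10},
    {"id": 1, "variable": "b", "value": 20},
]
```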
**Page 13, first code block, below second paragraph**

The end of the URL for the Citi Bike data on S3 should read `202403-citibike-tripdata.zip`. The print and notebook versions contain a `.csv` suffix: `202403-citibike-tripdata.csv.zip`.

Note from the Author or Editor: Good find! This has been fixed in the repo that comes with the book, and we'll fix the print in the next edition.

Submitted by Andrew Campbell on Jul 06, 2025.
**Page 134, second key note**

The second note states "method is one the many" but should state "method is one of the many".

Note from the Author or Editor: Good find, it should be "The `Expr.str.ends_with()` method is one of the many String methods...".

Submitted by Thomas Hefferman on Aug 18, 2025.
**Page 330, Table 15-1**

Argument `columns` should be `on`.

Note from the Author or Editor: The API has changed since the book was published.

Submitted by Ian Gow on Mar 13, 2026.