Errata for Python Polars: The Definitive Guide

This errata list contains errors and their corrections that were found after the product was released. If an error was corrected in a later version or reprint, the date of the correction is displayed in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.


Page p.158
l. -2

'optimal' should be 'optional'

Note from the Author or Editor:
Correct, should be 'optional'

HIDEMOTO NAKADA  Aug 23, 2025 
Page p.343
in table 15-4

The description of the `include_keys` parameter seems to be wrong.

The documentation says:
Include the columns used to partition the DataFrame in the output.

Note from the Author or Editor:
This method was updated after the book was published. Three descriptions need to be updated:

maintain_order
Ensure that the order of the groups is consistent with the input data. This is slower than a default partition by operation.

include_key
Include the columns used to partition the DataFrame in the output.

as_dict
Return a dictionary instead of a list. The dictionary keys are tuples of the distinct group values that identify each group.

HIDEMOTO NAKADA  Aug 23, 2025 
Page p.421
l.8

"This means that in order to match data across two languages,
you first have to deserialize the data from one format,
then serialize it to the format of the other language."

I guess we need to serialize first, and then deserialize it.

Note from the Author or Editor:
Correct, it should be:
"When transferring data between two programming languages, you first serialize the data in the source language into a common format, then deserialize it in the target language to reconstruct the data in memory."

HIDEMOTO NAKADA  Sep 02, 2025 
Page p.421
l. -5

a computer with multiple computation cores
can process the same instruction on multiple data points at the same time.

SIMD instructions do not require multiple cores.
'a computer with SIMD capable processing units' might be better.

Note from the Author or Editor:
Correct.
Instead of:
"By lining up your data in memory, or vectorizing it, a computer with multiple computation cores can process the same instruction on multiple data points at the same time."
it should be:
"By laying out data contiguously in memory, or vectorizing it, the processor’s SIMD-capable units can apply the same instruction to multiple data points simultaneously, greatly improving performance."

HIDEMOTO NAKADA  Sep 02, 2025 
Page p.11
Part IV

> Chapter 15 shows how to reshape data, through (un)pivoting, stacking, and
extending.

stacking and extending are not explained in chapter 15, but in chapter 14.

Note from the Author or Editor:
"Chapter 14 explains how to combine different DataFrames using joins and concatenations. Chapter 15 shows how to reshape data, through (un)pivoting, stacking, and
extending."

Should be:

"Chapter 14 explains how to combine different DataFrames using joins and concatenations. Chapter 15 shows how to reshape data, through (un)pivoting, transposing, and exploding."

HIDEMOTO NAKADA  Sep 09, 2025 
Page p.325
Takeaways, first item.

> Combine DataFrames with exact matches in their join columns using df.join().
You can fine-tune the join with the tolerance and by arguments and by selecting
the appropriate strategy.

The second sentence is explaining join_asof, not join.

Note from the Author or Editor:
"Combine DataFrames with exact matches in their join columns using df.join(). You can fine-tune the join with the tolerance and by arguments and by selecting the appropriate strategy.
• Combine numerical or temporal columns in DataFrames on their nearest values using df.join_asof()."
Should be:
"Combine DataFrames with exact matches in their join columns using df.join().
• Combine numerical or temporal columns in DataFrames on their nearest values using df.join_asof(). You can fine-tune the join with the `tolerance` and `by` arguments and by selecting the appropriate strategy."

HIDEMOTO NAKADA  Sep 09, 2025 
Page p.448
Table A-1

The table includes Dask, DuckDB, and PySpark, even though they are not mentioned in this chapter. I think it would be better to remove them from the table.

Note from the Author or Editor:
Agreed. While they were relevant for the general benchmarking we did, we only compare GPU packages in the appendix.
The table should only feature:
- cuDF
- pandas
- Polars
- Polars on GPU

HIDEMOTO NAKADA  Sep 09, 2025 
Page Chapter 11, p240: Slicing
second bullet point

The example implies regular python convention of `df.slice(start, stop)`, but the second argument is actually the length of the slice. This is particularly important to remember when using `pl.col().list.slice()` as the error can easily go unnoticed.

Note from the Author or Editor:
Correct.
Should be:
"For example, keep from the third to the sixth row with df.slice(2, 4). Here the first argument is the starting index, and the second argument is the length of the slice."

Ben Hardcastle  Sep 09, 2025 
Page p.76
table 4-1

> Int8 8-bit signed Integer type. -128 to 128

the range should be -128 to 127.

Note from the Author or Editor:
Correct, the range is indeed -128 to 127 (inclusive)

HIDEMOTO NAKADA  Sep 14, 2025 
Page p. 335
Table 15-2

Arguments `id_vars` and `value_vars` should be `index` and `on` respectively.

I think the code outside the table is correct; I guess the API changed while the book was being written.

I think the Lazy Pivot box on p. 334 is out of date.

Note from the Author or Editor:
The API has indeed changed since the book was published.

Ian Gow  Mar 13, 2026 
Page 13
First code block, below second paragraph

The end of the URL for the citibike data on S3 should read:

202403-citibike-tripdata.zip

The print and notebook versions contain a .csv suffix:

202403-citibike-tripdata.csv.zip

Note from the Author or Editor:
Good find! This has been fixed in the repo that comes with the book, and we'll fix the print in the next edition

Andrew Campbell  Jul 06, 2025 
Page 134
2nd key note

The 2nd note states " method is one the many" but should state "method is one of the many"

Note from the Author or Editor:
Good find, it should be "The `Expr.str.ends_with()` method is one of the many String methods..."

Thomas Hefferman  Aug 18, 2025 
Page 330
Table 15-1

Argument `columns` should be `on`.

Note from the Author or Editor:
API has changed since the book was published.

Ian Gow  Mar 13, 2026