Python Polars: The Definitive Guide

Book description

Want to speed up your data analysis and work with larger-than-memory datasets? Python Polars offers a blazingly fast, multithreaded, and elegant API for data loading, manipulation, and processing. With this hands-on guide, you'll walk through every aspect of Polars and learn how to tackle practical use cases using real-world datasets.

Jeroen Janssens and Thijs Nieuwdorp from Xomnia in Amsterdam show you how this superfast DataFrame library is perfect for efficient data wrangling, ETL pipelines, and so much more. This book helps you quickly learn the syntax and understand Polars' underlying concepts. You don't need to have experience with pandas or Spark, but if you do, this book will help you make a smooth transition.

With this definitive guide at your side, you'll be able to:

  • Process larger-than-memory datasets at record speed
  • Apply the eager, lazy, and streaming APIs of Polars and decide when to use them
  • Transition smoothly from pandas or Spark to Polars
  • Integrate Polars into your existing code base
  • Work with Arrow and Parquet to efficiently read and write data
  • Translate complex ETL tasks into efficient and elegant queries

Table of contents

  1. First Steps
    1. Overview
    2. Installing Polars
    3. Compiling Polars from Scratch
      1. Edge Case: Very Large Datasets
      2. Edge Case: Processors Lacking AVX support
    4. Configuring Polars
      1. Temporary Configuration Using a Context Manager
      2. Local Configuration Using a Decorator
    5. Downloading Datasets and Code Examples
    6. Crash Course JupyterLab
      1. Keyboard Shortcuts
    7. Using Polars in a Docker Container
    8. Conclusion
  2. Moving from Pandas to Polars
    1. Animals
    2. Similarities to Recognize
    3. Appearances to Appreciate
      1. Differences in Code
      2. Differences in Display
    4. Concepts to Unlearn
      1. Index
      2. Axes
      3. Indexing and Slicing
      4. Eagerness
      5. Relaxedness
    5. Syntax to Forget
      1. Common Operations Side By Side
    6. Takeaways
  3. Data Types and Data Structures
    1. Arrow Data Types
      1. Nested Data Types
      2. Missing Values
    2. Series, DataFrames, and LazyFrames
    3. Data Type Conversion
    4. Conclusion
  4. Eager and Lazy APIs
    1. Eager API: DataFrame
    2. Lazy API: LazyFrame
      1. LazyFrame Scan Level Optimizations
      2. Other Optimizations
    3. Performance Differences
    4. Functionality Differences
      1. Aggregations
      2. Attributes
      3. Computation
      4. Descriptive
      5. GroupBy
      6. Exporting
      7. Manipulation and Selection
      8. Miscellaneous
    5. Out-of-Core Computation with Lazy API’s Streaming Mode
    6. Tips and Tricks
      1. Going from LazyFrame to DataFrame and Vice Versa
      2. Joining a DataFrame and a LazyFrame
      3. Caching Intermediate Stages
    7. Conclusion
  5. Reading and Writing Data
    1. Reading CSV Files
    2. Parsing Missing Values Correctly
    3. Reading Files with Encodings Other than UTF-8
    4. Reading Excel Spreadsheets
    5. Working with Multiple Files
    6. Reading Parquet
    7. Reading JSON and NDJSON
      1. JSON
      2. NDJSON
    8. Other File Formats
    9. Querying Databases
    10. Writing Data
      1. CSV Format
      2. Excel Format
      3. Parquet Format
      4. Other Considerations
    11. Conclusion
  6. Beginning Expressions
    1. Methods and Namespaces
    2. Expressions by Example
      1. Selecting Columns with Expressions
      2. Creating New Columns with Expressions
      3. Filtering Rows with Expressions
      4. Aggregating with Expressions
      5. Sorting Rows with Expressions
    3. What Exactly Is an Expression?
      1. Properties of Expressions
    4. Creating Expressions
      1. From Existing Columns
      2. From Literal Values
      3. From Ranges
      4. Other Functions to Create Expressions
    5. Renaming Expressions
    6. Expressions Are Idiomatic
    7. Conclusion
  7. Continuing Expressions
    1. Types of Operations
      1. Example A: Element-Wise Operations
      2. Example B: Operations that Summarize to One
      3. Example C: Operations that Summarize to One or More
      4. Example D: Operations that Extend
    2. Element-Wise Operations
      1. Operations That Perform Mathematical Transformations
      2. Operations Related to Trigonometry
      3. Operations That Round and Categorize
      4. Operations for Missing or Infinite Values
      5. Other Operations
    3. Nonreducing Series-Wise Operations
      1. Operations That Accumulate
      2. Operations That Fill and Shift
      3. Operations Related to Duplicate Values
      4. Operations That Compute Rolling Statistics
      5. Operations That Sort
      6. Other Operations
    4. Series-Wise Operations that Summarize to One
      1. Operations That Are Quantifiers
      2. Operations That Compute Statistics
      3. Operations That Count
      4. Other Operations
    5. Series-Wise Operations that Summarize to One or More
      1. Operations Related to Unique Values
      2. Operations That Select
      3. Operations That Drop Missing Values
      4. Other Operations
    6. Series-Wise Operations that Extend
    7. Conclusion
  8. Combining Expressions
    1. Inline Operators Versus Methods
    2. Arithmetic Operations
    3. Comparison Operations
    4. Boolean Algebra Operations
    5. Bitwise Operations
    6. Using Functions
    7. Conclusion
  9. Selecting and Creating Columns
    1. Selecting Columns
      1. Introducing Selectors
      2. Selecting Based on Name
      3. Selecting Based on Data Type
      4. Selecting Based on Position
      5. Combining Selectors
    2. Creating Columns
    3. Related Column Operations
    4. Takeaways
  10. Filtering and Sorting Rows
    1. Filtering Rows
      1. Filtering Based on Expressions
      2. Filtering Based on Column Names
      3. Filtering Based on Constraints
    2. Sorting Rows
      1. Sorting Based On a Single Column
      2. Sorting in Reverse
      3. Sorting Based on Multiple Columns
      4. Sorting Based on Expressions
      5. Sorting Nested Data Types
    3. Related Row Operations
    4. Takeaways
  11. Working with Special Data Types
    1. Strings
      1. Methods
      2. Examples
    2. Categoricals
      1. Methods
      2. Examples
      3. Enum
    3. Temporal Data
      1. Methods
      2. Examples
    4. List
      1. Methods
      2. Examples
    5. Array
      1. Methods
      2. Examples
    6. Structs
      1. Methods
      2. Examples
    7. Conclusion
  12. Summarizing and Aggregating
    1. Group by Context
    2. The Descriptives
    3. The Advanced
    4. User-Defined Functions
    5. Row-wise Aggregations with reduce and fold
    6. over() Expressions in Selection Context
    7. Dynamic Grouping with group_by_dynamic
    8. Rolling Aggregations with rolling
    9. Conclusion
  13. Joining and Concatenating
    1. Joining
      1. Join Strategies
      2. Joining on Multiple Columns
      3. Validation
    2. Inexact Joining
      1. join_asof Strategies
      2. Additional Finetuning with tolerance and by
      3. Use Case: Marketing Campaign Attribution
    3. Vertical and Horizontal Concatenation
    4. Conclusion
  14. Reshaping
    1. Wide Versus Long DataFrames
    2. Pivot to Wider DataFrame
    3. Melt to Longer DataFrame
    4. Transposing
    5. Exploding
    6. Partition into Multiple DataFrames
    7. Conclusion
  15. Visualizing Data
    1. NYC Bike Trips
    2. Built-in Plotting with hvPlot
      1. A First Plot
      2. Methods in the Plot Namespace
      3. Getting Help for a Method
      4. Pandas as Backup
      5. Manual Transformations
      6. Changing the Plotting Backend
      7. Plotting Points on a Map
      8. Composing Plots
      9. Adding Interactive Widgets
      10. Common Customizations
    3. Alternative Packages
      1. Plotnine
      2. Great Tables
    4. Takeaways
  16. Extending Polars
    1. User Defined Functions in Python
    2. Registering Your Own Namespace
    3. Polars Plug-Ins in Rust
      1. Prerequisites
      2. The Anatomy of a Plug-in Project
      3. The Plug-in
      4. Compiling the Plug-in
      5. Performance Benchmark
      6. Register Arguments
      7. Using a Rust Crate
      8. Use Case: Geo
    4. Conclusion
  17. Polars Internals
    1. Arrow
    2. Multi-Threaded Computations and SIMD Operations
    3. The String Data Type in Memory
    4. ChunkedArrays in Series
    5. Query Optimization
      1. LazyFrame Scan Level Optimizations
      2. Other Optimizations
    6. Profiling Polars
    7. Tests in Polars
      1. Comparing DataFrames and Series
    8. Common Anti-patterns
      1. Using Brackets for Column Selection
      2. Misusing collect()
      3. Using Python Code in your Polars Queries
    9. Takeaways
  18. About the Authors

Product information

  • Title: Python Polars: The Definitive Guide
  • Author(s): Jeroen Janssens, Thijs Nieuwdorp
  • Release date: February 2025
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781098156084