Book description
DuckDB is an open source in-process database created for OLAP workloads. It provides key advantages that separate this database from more mainstream OLAP solutions, including embeddability, compatibility with SQL, optimization for fast and efficient analytics, and integration with Python. This practical book shows you how DuckDB leverages Python libraries and tools for data analytics, machine learning, and AI.
Author Wei-Meng Lee shows developers, data engineers, data analysts, and data scientists how to get started. You'll learn the primary features and functions of DuckDB, explore use cases and best practices, and examine practical examples of how DuckDB can be used for a variety of data analytics tasks. You'll also dive into specific topics including how to import data into DuckDB, work with tables, perform exploratory data analysis, visualize DuckDB data, perform spatial analysis, and use DuckDB with JSON files, Polars, and JupySQL.
You'll also explore:
- The purpose of DuckDB and its main functions
- How to conduct data analytics tasks using DuckDB
- Methods for integrating DuckDB with pandas, Polars, and JupySQL
- How to use DuckDB to query your data
- Ways to perform spatial analytics using DuckDB's spatial extension
- How to work with a diverse range of data including Parquet, CSV, and JSON
Wei-Meng Lee is a technologist and founder of Developer Learning Solutions, a company that provides hands-on training on the latest technologies.
Table of contents
- Brief Table of Contents (Not Yet Final)
- 1. Getting Started with DuckDB
- 2. Importing Data into DuckDB
- 3. A Primer on SQL
- 4. Using DuckDB with Polars
- 5. Performing EDA with DuckDB
  - Our Dataset – The 2015 Flights Delay Dataset
  - Geospatial Analytics
  - Performing Descriptive Analytics
    - Finding the Airports for Each State and City
    - Aggregating the Total Number of Airports in Each State
    - Obtaining the Flight Counts for Each Pair of Origin and Destination Airports
    - Getting the Cancelled Flights from Airlines
    - Getting the Flight Count for Each Day of Week
    - Finding the Most Common Timeslot for Flight Delays
    - Finding the Airlines with the Most and Least Delays
  - Summary
- 6. Using DuckDB with JSON Files
- 7. Using DuckDB with JupySQL
- 8. Accessing Remote Data using DuckDB
- About the Author
Product information
- Title: DuckDB: Up and Running
- Author(s): Wei-Meng Lee
- Release date: January 2025
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781098159696