Chapter 2. Importing Data into DuckDB

In Chapter 1, you saw how to create a simple DuckDB database and load tables into it. In the real world, your data often comes from different data sources and file formats, such as CSV, Excel, Parquet, or database servers. In this chapter, you’ll first learn the different ways to create DuckDB databases, and then learn how to load data into them from various data sources. By the end of this chapter, you’ll have a clear idea of how to work with each data source, as well as tips and tricks for dealing with them.

Creating DuckDB Databases

In this section, we will dive into the different ways you can create DuckDB databases and provide suggestions on which methods may suit your purposes.

The simplest way to create a DuckDB database is to use the connect() function in the duckdb module:

import duckdb

conn = duckdb.connect()

The connect() function returns a DuckDBPyConnection object. Called with no arguments, it opens a modifiable in-memory database, which is equivalent to passing ':memory:' explicitly:

conn = duckdb.connect(':memory:')

If you wish to create a DuckDB database that is persisted on storage, set the database argument to the name of a database, for example, mydb.duckdb (you can use any extension you wish for the filename):

conn = duckdb.connect(database='mydb.duckdb', read_only=False)

Note

The first time you run this statement, the mydb.duckdb database file will be created in the current working directory, which is usually the folder containing your code (for example, your Jupyter notebook). You can set the read_only argument to ...
