Chapter 2. Importing Data into DuckDB
In Chapter 1, you saw how you can create a simple DuckDB database and load tables into it. In the real world, your data often comes from different data sources and file formats, such as CSV, Excel, Parquet, or database servers. In this chapter, you’ll first learn the different ways to create your DuckDB databases, and then learn how to load data into them from various sources. By the end of this chapter, you’ll have a clear idea of how to work with each data source, as well as tips and tricks for dealing with them.
Creating DuckDB Databases
In this section, we will dive into the different ways you can create DuckDB databases and provide suggestions on which methods may suit your purposes.
The simplest way to create a DuckDB database is to use the connect() function in the duckdb module:
import duckdb
conn = duckdb.connect()
The connect() function returns a DuckDBPyConnection object. By default, this statement opens a modifiable in-memory database, which is equivalent to passing ':memory:' explicitly:
conn = duckdb.connect(':memory:')
If you wish to create a DuckDB database that is persisted on storage, set the database argument to the name of a database file, for example, mydb.duckdb (you can use any extension you wish for the filename):
conn = duckdb.connect(database='mydb.duckdb', read_only=False)
Note
The first time you run this statement, the mydb.duckdb database file will be created in the same folder as your code (for example, alongside your Jupyter Notebook). You can set the read_only argument to ...