Chapter 5. Data Ingestion: Loading Data

In Chapter 4, you extracted data from your desired source system. Now it’s time to complete the data ingestion by loading the data into your Redshift data warehouse. How you load the data depends on the format of your extraction output. In this section I will describe how to load data extracted into CSV files whose values correspond to the columns of a table, as well as extraction output containing change data capture (CDC) formatted data.

Configuring an Amazon Redshift Warehouse as a Destination

If you’re using Amazon Redshift for your data warehouse, loading extracted data from S3 is quite simple. The first step is to create an IAM role for loading data if you don’t already have one.
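To make the S3 integration concrete, here is a minimal sketch of the kind of Redshift COPY statement that such a role makes possible. It only builds the SQL text and does not connect to a cluster. The helper name build_copy_statement, the table public.orders, the bucket example-bucket, and the role ARN are all hypothetical placeholders, not values from this chapter.

```python
def build_copy_statement(table, s3_uri, iam_role_arn, csv=True):
    """Return the text of a Redshift COPY statement that loads a file from S3.

    All arguments are placeholders supplied by the caller. The values are
    interpolated as-is, so only pass trusted, hard-coded names here.
    """
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with 's3://'")

    statement = f"COPY {table} FROM '{s3_uri}' IAM_ROLE '{iam_role_arn}'"
    if csv:
        # The CSV keyword tells COPY to parse the file as comma-separated values.
        statement += " CSV"
    return statement + ";"


# Hypothetical usage: every name below is an assumption for illustration.
copy_sql = build_copy_statement(
    table="public.orders",
    s3_uri="s3://example-bucket/order_extract.csv",
    iam_role_arn="arn:aws:iam::123456789012:role/RedshiftLoadRole",
)
```

The resulting string would then be executed against the cluster with whatever SQL client you already use. The role named in IAM_ROLE must be able to read the S3 object, which is why the role comes first.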

Note

For instructions on setting up an Amazon Redshift cluster, check the latest AWS documentation. The pricing pages there also describe any free trial that is currently offered.

To create the role, follow these instructions or check the AWS documentation for the latest details:

  1. Under the Services menu in the AWS console (or top navigation bar), navigate to IAM.

  2. On the left navigation menu, select Roles, and then click the “Create role” button.
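As a rough programmatic counterpart to creating a role in the console, the sketch below uses the boto3 IAM client calls create_role and attach_role_policy. It rests on several assumptions that are not taken from the steps above: the role name RedshiftLoadRole, a trust policy that lets the Redshift service assume the role, and the AWS managed AmazonS3ReadOnlyAccess policy as the permission set. Treat it as one possible setup and confirm the details against the current AWS documentation.

```python
import json

# AWS managed policy granting read-only access to S3 (an assumed choice here).
S3_READ_ONLY_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"


def redshift_trust_policy():
    """Trust policy that allows the Redshift service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "redshift.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_load_role(iam_client, role_name="RedshiftLoadRole"):
    """Create the role, attach S3 read access, and return the role ARN.

    iam_client is expected to behave like boto3.client("iam").
    The default role_name is a hypothetical placeholder.
    """
    response = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(redshift_trust_policy()),
        Description="Lets Redshift read extracted files from S3",
    )
    iam_client.attach_role_policy(
        RoleName=role_name,
        PolicyArn=S3_READ_ONLY_POLICY_ARN,
    )
    return response["Role"]["Arn"]


# Hypothetical usage (requires boto3 and AWS credentials with IAM permissions):
#   import boto3
#   role_arn = create_load_role(boto3.client("iam"))
```

Passing the client in as an argument keeps the function easy to exercise with a stand-in object before pointing it at a real AWS account.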
