Chapter 2. A Modern Data Infrastructure

Before deciding on products and design for building pipelines, it’s worth understanding what makes up a modern data stack. As with most things in technology, there’s no single right way to design your analytics ecosystem or choose products and vendors. Regardless, there are some key needs and concepts that have become industry standard and set the stage for best practices in implementing pipelines.

Let’s take a look at the key components of such an infrastructure as displayed in Figure 2-1. Future chapters explore how each component factors into the design and implementation of data pipelines.

Diversity of Data Sources

The majority of organizations have dozens, if not hundreds, of data sources that feed their analytics endeavors. These sources vary along several dimensions, each of which is covered in this section.

Figure 2-1. The key components of a modern data infrastructure.

Source System Ownership

It’s typical for an analytics team to ingest data from source systems that are built and owned by the organization as well as from third-party tools and vendors. For example, an ecommerce company might store data from their shopping cart in a PostgreSQL (also known as Postgres) database behind their web app. They may also use a third-party web analytics tool such as Google Analytics to track usage on their website. The combination of the two data sources (illustrated ...
