Book description
Unlocking the value of modern data is critical for data-driven companies. This report provides a concise, practical guide to building a data architecture that efficiently delivers big, complex, and streaming data to both internal users and customers.
Authors Ori Rafael, Roy Hasson, and Rick Bilodeau from Upsolver examine how modern data pipelines can improve business outcomes. Tech leaders and data engineers will explore the role these pipelines play in the data architecture and learn how to intelligently consider tradeoffs between different data architecture patterns and data pipeline development approaches.
You will:
- Examine how recent changes in data, data management systems, and data consumption patterns have made data pipelines challenging to engineer
- Learn how three data architecture patterns (event sourcing, stateful streaming, and declarative data pipelines) can help you upgrade your practices to address modern data
- Compare five approaches for building modern data pipelines, including pure data replication, ELT over a data warehouse, Apache Spark over data lakes, declarative pipelines over data lakes, and declarative data lake staging to a data warehouse
Table of contents
- Introduction
- 1. The Modern Data Landscape and Its Impact on Data Engineering
- 2. Emerging Architecture Patterns
- 3. Modern Data Pipeline Alternatives
  - Criteria for Evaluating Approaches to Data Pipelines
  - Option 1: Pure Data Replication
  - Option 2: ELT over a Data Warehouse
  - Option 3: Apache Spark (Hadoop) over Data Lakes
  - Option 4: Declarative Pipelines over Data Lakes
  - Option 5: Declarative Data Lake Staging to a Data Warehouse or Other Analytics Systems
  - Choosing the Best Approach for You
- Conclusion
- About the Authors
Product information
- Title: Unlock Complex and Streaming Data with Declarative Data Pipelines
- Author(s): Ori Rafael, Roy Hasson, Rick Bilodeau
- Release date: July 2022
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781098135829