Book description
Data engineering has grown rapidly in the past decade, leaving many software engineers, data scientists, and analysts looking for a comprehensive view of this practice. With this practical book, you'll learn how to plan and build systems to serve the needs of your organization and customers by evaluating the best technologies available through the framework of the data engineering lifecycle.
Authors Joe Reis and Matt Housley walk you through the data engineering lifecycle and show you how to stitch together a variety of cloud technologies to serve the needs of downstream data consumers. You'll understand how to apply the concepts of data generation, ingestion, orchestration, transformation, storage, and governance that are critical in any data environment regardless of the underlying technology.
This book will help you:
- Get a concise overview of the entire data engineering landscape
- Assess data engineering problems using an end-to-end framework of best practices
- Cut through marketing hype when choosing data technologies, architecture, and processes
- Use the data engineering lifecycle to design and build a robust architecture
- Incorporate data governance and security across the data engineering lifecycle
Table of contents
- Preface
- I. Foundation and Building Blocks
- 1. Data Engineering Described
- 2. The Data Engineering Lifecycle
- 3. Designing Good Data Architecture
- What Is Data Architecture?
- Principles of Good Data Architecture
- Principle 1: Choose Common Components Wisely
- Principle 2: Plan for Failure
- Principle 3: Architect for Scalability
- Principle 4: Architecture Is Leadership
- Principle 5: Always Be Architecting
- Principle 6: Build Loosely Coupled Systems
- Principle 7: Make Reversible Decisions
- Principle 8: Prioritize Security
- Principle 9: Embrace FinOps
- Major Architecture Concepts
- Examples and Types of Data Architecture
- Who’s Involved with Designing a Data Architecture?
- Conclusion
- Additional Resources
- 4. Choosing Technologies Across the Data Engineering Lifecycle
- Team Size and Capabilities
- Speed to Market
- Interoperability
- Cost Optimization and Business Value
- Today Versus the Future: Immutable Versus Transitory Technologies
- Location
- Build Versus Buy
- Monolith Versus Modular
- Serverless Versus Servers
- Optimization, Performance, and the Benchmark Wars
- Undercurrents and Their Impacts on Choosing Technologies
- Conclusion
- Additional Resources
- II. The Data Engineering Lifecycle in Depth
- 5. Data Generation in Source Systems
- 6. Storage
- 7. Ingestion
- What Is Data Ingestion?
- Key Engineering Considerations for the Ingestion Phase
- Batch Ingestion Considerations
- Message and Stream Ingestion Considerations
- Ways to Ingest Data
- Direct Database Connection
- Change Data Capture
- APIs
- Message Queues and Event-Streaming Platforms
- Managed Data Connectors
- Moving Data with Object Storage
- EDI
- Databases and File Export
- Practical Issues with Common File Formats
- Shell
- SSH
- SFTP and SCP
- Webhooks
- Web Interface
- Web Scraping
- Transfer Appliances for Data Migration
- Data Sharing
- Whom You’ll Work With
- Undercurrents
- Conclusion
- Additional Resources
- 8. Queries, Modeling, and Transformation
- 9. Serving Data for Analytics, Machine Learning, and Reverse ETL
- III. Security, Privacy, and the Future of Data Engineering
- 10. Security and Privacy
- 11. The Future of Data Engineering
- The Data Engineering Lifecycle Isn’t Going Away
- The Decline of Complexity and the Rise of Easy-to-Use Data Tools
- The Cloud-Scale Data OS and Improved Interoperability
- “Enterprisey” Data Engineering
- Titles and Responsibilities Will Morph...
- Moving Beyond the Modern Data Stack, Toward the Live Data Stack
- Conclusion
- A. Serialization and Compression Technical Details
- B. Cloud Networking
- Index
- About the Authors
Product information
- Title: Fundamentals of Data Engineering
- Author(s): Joe Reis, Matt Housley
- Release date: June 2022
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781098108304