Book description
Do your product dashboards look funky? Are your quarterly reports stale? Is the data set you're using broken or just plain wrong? If you answered yes to any of these questions, this book is for you. These problems affect almost every team, yet they're usually addressed on an ad hoc basis and in a reactive manner.
Many data engineering teams today face the "good pipelines, bad data" problem. It doesn't matter how advanced your data infrastructure is if the data you're piping is bad. In this book, Barr Moses, Lior Gavish, and Molly Vorwerck, from the data observability company Monte Carlo, explain how to tackle data quality and trust at scale by leveraging best practices and technologies used by some of the world's most innovative companies.
- Build more trustworthy and reliable data pipelines
- Write scripts to make data checks and identify broken pipelines with data observability
- Learn how to set and maintain data SLAs, SLIs, and SLOs
- Develop and lead data quality initiatives at your company
- Learn how to treat data services and systems with the diligence of production software
- Automate data lineage graphs across your data ecosystem
- Build anomaly detectors for your critical data assets
Table of contents
- Preface
- 1. Why Data Quality Deserves Attention—Now
- 2. Assembling the Building Blocks of a Reliable Data System
- 3. Collecting, Cleaning, Transforming, and Testing Data
- 4. Monitoring and Anomaly Detection for Your Data Pipelines
- Knowing Your Known Unknowns and Unknown Unknowns
- Building an Anomaly Detection Algorithm
- Building Monitors for Schema and Lineage
- Scaling Anomaly Detection with Python and Machine Learning
- Beyond the Surface: Other Useful Anomaly Detection Approaches
- Designing Data Quality Monitors for Warehouses Versus Lakes
- Summary
- 5. Architecting for Data Reliability
- 6. Fixing Data Quality Issues at Scale
- 7. Building End-to-End Lineage
- 8. Democratizing Data Quality
- Treating Your “Data” Like a Product
- Perspectives on Treating Data Like a Product
- Building Trust in Your Data Platform
- Assigning Ownership for Data Quality
- Creating Accountability for Data Quality
- Balancing Data Accessibility with Trust
- Certifying Your Data
- Seven Steps to Implementing a Data Certification Program
- Case Study: Toast’s Journey to Finding the Right Structure for Their Data Team
- Increasing Data Literacy
- Prioritizing Data Governance and Compliance
- Building a Data Quality Strategy
- Summary
- 9. Data Quality in the Real World: Conversations and Case Studies
- Building a Data Mesh for Greater Data Quality
- Why Implement a Data Mesh?
- A Conversation with Zhamak Dehghani: The Role of Data Quality Across the Data Mesh
- Can You Build a Data Mesh from a Single Solution?
- Is Data Mesh Another Word for Data Virtualization?
- Does Each Data Product Team Manage Their Own Separate Data Stores?
- Is a Self-Serve Data Platform the Same Thing as a Decentralized Data Mesh?
- Is the Data Mesh Right for All Data Teams?
- Does One Person on Your Team “Own” the Data Mesh?
- Does the Data Mesh Cause Friction Between Data Engineers and Data Analysts?
- Case Study: Kolibri Games’ Data Stack Journey
- Making Metadata Work for the Business
- Unlocking the Value of Metadata with Data Discovery
- Deciding When to Get Started with Data Quality at Your Company
- You’ve Recently Migrated to the Cloud
- Your Data Stack Is Scaling with More Data Sources, More Tables, and More Complexity
- Your Data Team Is Growing
- Your Team Is Spending at Least 30% of Their Time Firefighting Data Quality Issues
- Your Team Has More Data Consumers Than They Did One Year Ago
- Your Company Is Moving to a Self-Service Analytics Model
- Data Is a Key Part of the Customer Value Proposition
- Data Quality Starts with Trust
- Summary
- 10. Pioneering the Future of Reliable Data Systems
- Index
- About the Authors
Product information
- Title: Data Quality Fundamentals
- Author(s): Barr Moses, Lior Gavish, Molly Vorwerck
- Release date: September 2022
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781098112042