Driving Data Quality with Data Contracts

Book description

Everything you need to know to apply data contracts and build a truly data-driven organization that harnesses quality data to deliver tangible business value. Purchase of the print or Kindle book includes a free PDF eBook.

Key Features

  • Understand data contracts and their power to resolve the problems in contemporary data platforms
  • Learn how to design and implement a cutting-edge data platform powered by data contracts
  • Access practical guidance from the pioneer of data contracts and get expert insights on using them effectively

Book Description

Despite the passage of time and the evolution of technology and architecture, the challenges we face in building data platforms persist. Our data often remains unreliable, is not trusted, and fails to deliver the promised value.

With Driving Data Quality with Data Contracts, you’ll discover the potential of data contracts to transform how you build your data platforms, finally overcoming these enduring problems. You’ll learn how establishing contracts as the interface allows you to explicitly assign responsibility and accountability for the data to those who know it best, the data generators, and give them the autonomy to generate and manage data as required. The book will show you how data contracts ensure that consumers get quality data with clearly defined expectations, enabling them to build on that data with confidence to deliver valuable analytics, performant ML models, and trusted data-driven products.

By the end of this book, you’ll have gained a comprehensive understanding of how data contracts can revolutionize your organization’s data culture and provide a competitive advantage by unlocking the real value within your data.

What you will learn

  • Gain insights into the intricacies and shortcomings of today's data architectures
  • Understand exactly how data contracts can solve prevalent data challenges
  • Drive a fundamental transformation of your data culture by implementing data contracts
  • Discover what goes into a data contract and why it's important
  • Design a modern data architecture that leverages the power of data contracts
  • Explore sample implementations to get practical knowledge of using data contracts
  • Embrace best practices for the successful deployment of data contracts

Who this book is for

If you’re a data engineer, data leader, architect, or practitioner thinking about your data architecture and looking to design one that enables your organization to get the most value from your data, this book is for you. Staff engineers, product managers, and software engineering leaders and executives will also find valuable insights.

Table of contents

  1. Driving Data Quality with Data Contracts
  2. Foreword
  3. Contributors
  4. About the author
  5. About the reviewers
  6. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Conventions used
    6. Get in touch
    7. Share Your Thoughts
    8. Download a free PDF copy of this book
  7. Part 1: Why Data Contracts?
  8. Chapter 1: A Brief History of Data Platforms
    1. The enterprise data warehouse
    2. The big data platform
    3. The modern data stack
    4. The state of today’s data platforms
      1. The lack of expectations
      2. The lack of reliability
      3. The lack of autonomy
    5. The ever-increasing use of data in business-critical applications
    6. Summary
    7. Further reading
  9. Chapter 2: Introducing Data Contracts
    1. What is a data contract?
      1. An agreed interface between the generators of data, and its consumers
      2. Setting expectations around that data
      3. Defining how the data should be governed
      4. Facilitating the explicit generation of quality data
      5. The four principles of data contracts
    2. When to use data contracts
    3. Data contracts and the data mesh
      1. Domain ownership
      2. Data as a product
      3. Self-serve data platform
      4. Federated computational governance
      5. Data contracts enable a data mesh
    4. Summary
    5. Further reading
  10. Part 2: Driving Data Culture Change with Data Contracts
  11. Chapter 3: How to Get Adoption in Your Organization
    1. Using data contracts to change an organization
    2. Articulating the value of your data
    3. Building data products
      1. What is a data product?
      2. Adopting a data product mindset
      3. Designing a data product
    4. Walking through an example of a data product
    5. Summary
    6. Further reading
  12. Chapter 4: Bringing Data Consumers and Generators Closer Together
    1. Who is a consumer, and who is a generator?
      1. Data consumers
      2. Data generators
    2. Assigning responsibility and accountability
    3. Feeding data back to the product teams
    4. Managing the evolution of data
    5. Summary
    6. Further reading
  13. Chapter 5: Embedding Data Governance
    1. Why we need data governance
      1. The requirements of data governance
      2. How data governance programs are typically applied
    2. Promoting data governance through data contracts
    3. Assigning responsibility for data governance
      1. Responsibilities of the data generators
      2. Introducing the data architecture council
      3. Working together to implement federated data governance
    4. Summary
    5. Further reading
  14. Part 3: Designing and Implementing a Data Architecture Based on Data Contracts
  15. Chapter 6: What Makes Up a Data Contract
    1. The schema of a data contract
      1. Defining a schema
      2. Using a schema registry as the source of truth
    2. Evolving your data over time
      1. Evolving your schemas
      2. Migrating your consumers
    3. Defining the governance and controls
    4. Summary
    5. Further reading
  16. Chapter 7: A Contract-Driven Data Architecture
    1. A step-change in building data platforms
      1. Building generic data tooling
      2. Introducing a data infrastructure team
      3. A case study from GoCardless in promoting autonomy
      4. Promoting autonomy through decentralization
    2. Introducing the principles of a contract-driven data architecture
      1. Automation
      2. Guidelines and guardrails
      3. Consistency
    3. Providing self-served data infrastructure
    4. Summary
    5. Further reading
  17. Chapter 8: A Sample Implementation
    1. Technical requirements
    2. Creating a data contract
    3. Providing the interfaces to the data
      1. Introducing IaC
      2. Creating the interfaces from the data contract
    4. Creating libraries for data generators
    5. Populating a central schema registry
      1. Registering a schema with the Confluent schema registry
      2. Managing schema evolution
    6. Implementing contract-driven tooling
    7. Summary
    8. Further reading
  18. Chapter 9: Implementing Data Contracts in Your Organization
    1. Getting started with data contracts
      1. The ability to define a data contract
      2. The ability to provision an interface for the data for consumers to query
      3. The ability of generators to write data to the interface
    2. Migrating to data contracts
    3. Discovering data contracts
      1. What is a data catalog?
      2. Why are data catalogs important for discovering data contracts?
      3. What is data lineage?
      4. Why is data lineage important for data contracts?
    4. Building a mature data contracts-backed data culture
    5. Summary
    6. Further reading
  19. Chapter 10: Data Contracts in Practice
    1. Designing a data contract
      1. Identifying the purpose
      2. Considering the trade-offs
      3. Defining the data contract
      4. Deploying the data contract
    2. Monitoring and enforcing data contracts
      1. The data contract’s definition
      2. The quality of the data
      3. The performance and dependability of the data
    3. Data contract publishing patterns
      1. Writing directly to the interface
      2. Materialized views on CDC
      3. The transactional outbox pattern
      4. The listen-to-yourself pattern
    4. Summary
    5. Further reading
  20. Index
    1. Why subscribe?
  21. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Share Your Thoughts
    3. Download a free PDF copy of this book

Product information

  • Title: Driving Data Quality with Data Contracts
  • Author(s): Andrew Jones
  • Release date: June 2023
  • Publisher(s): Packt Publishing
  • ISBN: 9781837635009