Data Mesh in Action

Book description

Revolutionize the way your organization approaches data with a data mesh! This new decentralized architecture outpaces monolithic data lakes and warehouses and can work for a company of any size.

In Data Mesh in Action you will learn how to:

  • Implement a data mesh in your organization
  • Turn data into a data product
  • Move from your current data architecture to a data mesh
  • Identify data domains and decompose an organization into smaller, manageable domains
  • Set up the central governance and local governance levels over data
  • Balance responsibilities between the two levels of governance
  • Establish a platform that allows efficient connection of distributed data products and automated governance

Data Mesh in Action reveals how this groundbreaking architecture looks for both startups and large enterprises. You won’t need any new technology—this book shows you how to start implementing a data mesh with flexible processes and organizational change. You’ll explore both an extended case study and real-world examples. As you go, you’ll be expertly guided through discussions around Socio-Technical Architecture and Domain-Driven Design with the goal of building a sleek data-as-a-product system. Plus, dozens of workshop techniques for both in-person and remote meetings help you onboard colleagues and drive a successful transition.

About the Technology
Business increasingly relies on efficiently storing and accessing large volumes of data. The data mesh is a new way to decentralize data management that radically improves security and discoverability. A well-designed data mesh simplifies self-service data consumption and reduces the bottlenecks created by monolithic data architectures.

About the Book
Data Mesh in Action teaches you pragmatic ways to decentralize your data and organize it into an effective data mesh. You’ll start by building a minimum viable data product, which you’ll expand into a self-service data platform, chapter by chapter. You’ll love the book’s unique “sliders” that adjust the mesh to meet your specific needs. You’ll also learn processes and leadership techniques that will change the way you and your colleagues think about data.

What's Inside
  • Decompose an organization into manageable domains
  • Turn data into a data product
  • Set up central and local governance levels
  • Build a fit-for-purpose data platform
  • Improve management, initiation, and support techniques


About the Reader
For data professionals. Requires no specific programming stack or data platform.

About the Authors
Jacek Majchrzak is a hands-on lead data architect. Dr. Sven Balnojan manages data products and teams. Dr. Marian Siwiak is a data scientist and a management consultant for IT, scientific, and technical projects.

Quotes
This book teleports you into the seat of the chief architect on a data mesh project.
- From the Foreword by Jean-Georges Perrin, PayPal

A must-read for anyone who works in data.
- Prukalpa Sankar, Co-Founder of Atlan

Satisfies all those ‘what’, ‘why’, and ‘how’ questions. A unique blend of process and technology, and an excellent, example-driven resource.
- Shiroshica Kulatilake, WSO2

The starting point for your journey in the new generation of data platforms.
- Arnaud Castelltort, University of Montpellier

Table of contents

  1. inside front cover
  2. Data Mesh in Action
  3. Copyright
  4. brief contents
  5. contents
  6. front matter
    1. foreword
    2. preface
    3. acknowledgments
    4. about this book
      1. Who should read this book?
      2. How this book is organized: A road map
      3. How to use this book
      4. The Messflix case study
      5. liveBook discussion forum
    5. about the authors
    6. about the cover illustration
  7. Part 1. Foundations
  8. 1 The what and why of the data mesh
    1. 1.1 Data mesh 101
    2. 1.2 Why the data mesh?
      1. 1.2.1 Alternatives
      2. 1.2.2 Data warehouses and data lakes inside the data mesh
      3. 1.2.3 Data mesh benefits
    3. 1.3 Use case: A snow-shoveling business
    4. 1.4 Data mesh principles
      1. 1.4.1 Domain-oriented decentralized data ownership and architecture
      2. 1.4.2 Data as a product
      3. 1.4.3 Federated computational governance
      4. 1.4.4 Self-serve data infrastructure as a platform
    5. 1.5 Back to snow shoveling
    6. 1.6 Socio-technical architecture
      1. 1.6.1 Conway’s law
      2. 1.6.2 Team topologies
      3. 1.6.3 Cognitive load
    7. 1.7 Data mesh challenges
      1. 1.7.1 Technological challenges
      2. 1.7.2 Data management challenges
      3. 1.7.3 Organizational challenges
    8. Summary
  9. 2 Is a data mesh right for you?
    1. 2.1 Analyzing data mesh drivers
      1. 2.1.1 Business drivers
      2. 2.1.2 Organizational drivers
      3. 2.1.3 Domain-data drivers
      4. 2.1.4 Minor organizational drivers
      5. 2.1.5 Is a data mesh a good fit for me?
    2. 2.2 Data mesh alternatives and complementary solutions
      1. 2.2.1 Enterprise data warehouse
      2. 2.2.2 Data lake
      3. 2.2.3 Data lakehouse
      4. 2.2.4 Data fabric
      5. 2.2.5 Data mesh vs. the rest of the world
    3. 2.3 Understanding a data mesh implementation effort
      1. 2.3.1 The data mesh development cycle
      2. 2.3.2 Development cycle in the shoveling example
      3. 2.3.3 Enabling the team
      4. 2.3.4 Development cycle in detail
    4. Summary
  10. 3 Kickstart your data mesh MVP in a month
    1. 3.1 Getting the lay of the land
      1. 3.1.1 Drawing a system landscape diagram
      2. 3.1.2 Performing stakeholder analysis
    2. 3.2 Identifying candidates for the MVP implementation team
      1. 3.2.1 Choosing development teams
      2. 3.2.2 Choosing the cooperation model
      3. 3.2.3 Choosing a data governance team
    3. 3.3 Setting up MVP governance
      1. 3.3.1 Defining data mesh value statement(s)
      2. 3.3.2 Defining data governance policies
      3. 3.3.3 Federating data governance
    4. 3.4 Developing minimal data products
      1. 3.4.1 Identifying domain-oriented datasets
      2. 3.4.2 Choosing data product owners
      3. 3.4.3 Deciding on the minimum viable data product description
      4. 3.4.4 Developing the simplest tools to expose your data
    5. 3.5 Setting up the minimal platform
      1. 3.5.1 Ensuring platform-forced governability
      2. 3.5.2 Ensuring platform security
    6. Summary
  11. Part 2. The four principles in practice
  12. 4 Domain ownership
    1. 4.1 Capturing and analyzing domains
      1. 4.1.1 Domain-driven design 101
      2. 4.1.2 Invite the right people
      3. 4.1.3 Choose the correct workshop technique
    2. 4.2 Applying ownership using domain decomposition
      1. 4.2.1 Domain, subdomain, and business capability
      2. 4.2.2 Decompose domains using business capability modeling
      3. 4.2.3 How are domains and business capabilities related to data?
      4. 4.2.4 Assign responsibilities to the data-product-owning team
      5. 4.2.5 Choose the right team to own data
    3. 4.3 Applying ownership using data use cases
      1. 4.3.1 Data use cases
      2. 4.3.2 Model and bounded context
      3. 4.3.3 Set up boundaries of use-case-driven data products
      4. 4.3.4 Choose the right team to own data
    4. 4.4 Applying ownership using design heuristics
      1. 4.4.1 What is a heuristic?
      2. 4.4.2 Using design heuristics
      3. 4.4.3 Designing heuristics and possible boundaries
    5. 4.5 Final landscape: The mesh of interconnected data products
      1. 4.5.1 Messflix data mesh
      2. 4.5.2 Data products form a mesh
      3. 4.5.3 Is it already a data mesh?
    6. Summary
  13. 5 Data as a product
    1. 5.1 Applying product thinking
      1. 5.1.1 Product thinking analysis
      2. 5.1.2 Data product canvas
    2. 5.2 What is a data product?
      1. 5.2.1 Data product definition
      2. 5.2.2 Product, not project
      3. 5.2.3 What can be a data product?
    3. 5.3 Data product ownership
      1. 5.3.1 Data product owner
      2. 5.3.2 Data product owner responsibilities
      3. 5.3.3 An Agile DevOps team as a base for data product dev team
      4. 5.3.4 Data product owner and product owner
    4. 5.4 Conceptual architecture of a data product
      1. 5.4.1 External architecture view
      2. 5.4.2 Internal architecture view
    5. 5.5 Data product fundamental characteristics
      1. 5.5.1 Self-described data product
      2. 5.5.2 Introduction to metadata
      3. 5.5.3 Metadata as code
      4. 5.5.4 Data product metadata
      5. 5.5.5 Domain dataset metadata
      6. 5.5.6 Other kinds of metadata
    6. 5.6 Additional data product characteristics: FAIR and immutability
      1. 5.6.1 Findability
      2. 5.6.2 Accessibility
      3. 5.6.3 Interoperable
      4. 5.6.4 Reusable
      5. 5.6.5 Immutable
    7. 5.7 Data contracts and sharing agreements inside the data mesh
      1. 5.7.1 Data contracts and sharing agreements
      2. 5.7.2 Implementing data contracts and sharing agreements
    8. Summary
  14. 6 Federated computational governance
    1. 6.1 Data governance in a nutshell
    2. 6.2 Benefits of data governance
      1. 6.2.1 Business value perspective
      2. 6.2.2 Data usability perspective
      3. 6.2.3 Data control perspective
    3. 6.3 Planning data governance outcomes
      1. 6.3.1 Hierarchy of data governance outcomes
      2. 6.3.2 Strategic-level outcomes
      3. 6.3.3 Tactical-level outcomes
      4. 6.3.4 Implementation-level outcomes
    4. 6.4 Federating data governance
      1. 6.4.1 Thinking of data governance in terms of “sliders”
      2. 6.4.2 Extreme ends of data governance models
      3. 6.4.3 Federated data governance model
      4. 6.4.4 Setting-up governance team operations
    5. 6.5 Making data governance computational
      1. 6.5.1 Making policies computational
      2. 6.5.2 Automating policy checks
    6. Summary
  15. 7 The self-serve data platform
    1. 7.1 The MVP platform
      1. 7.1.1 Platform definition
      2. 7.1.2 Platform thinking
    2. 7.2 Improvements with X as a service
      1. 7.2.1 X as a service explained
      2. 7.2.2 X as a service applied
    3. 7.3 Improvements with platform architecture
      1. 7.3.1 Platform architecture explained
      2. 7.3.2 Platform architecture applied
    4. 7.4 Improvements for the data producers
    5. Summary
  16. Part 3. Infrastructure and technical architecture
  17. 8 Comparing self-serve data platforms
    1. 8.1 Data mesh on Google Cloud Platform
      1. 8.1.1 Self-serve data platform architecture
      2. 8.1.2 Identifying the components of the platform
      3. 8.1.3 Identifying the components of the data product
      4. 8.1.4 Workflows
      5. 8.1.5 Variations
      6. 8.1.6 Relation to data mesh ideas
      7. 8.1.7 GCP architecture summary
    2. 8.2 Data mesh on AWS
      1. 8.2.1 Self-serve data platform architecture
      2. 8.2.2 Identifying the components of the platform
      3. 8.2.3 Identifying the components of the data products
      4. 8.2.4 Workflows
      5. 8.2.5 Relation to data mesh ideas
      6. 8.2.6 Variations
      7. 8.2.7 AWS architecture summary
    3. 8.3 Data mesh on Databricks
      1. 8.3.1 Self-serve data platform architecture
      2. 8.3.2 Identifying the components of the platform
      3. 8.3.3 Identifying the components of the data product
      4. 8.3.4 Workflow considerations
      5. 8.3.5 Variations
      6. 8.3.6 Databricks architecture summary
    4. 8.4 Data mesh on Kafka
      1. 8.4.1 Self-serve data platform architecture
      2. 8.4.2 Identifying the components
      3. 8.4.3 Considerations
      4. 8.4.4 Kafka architecture summary
    5. Summary
  18. 9 Solution architecture design
    1. 9.1 Capturing and understanding the current state
      1. 9.1.1 What is software architecture?
      2. 9.1.2 How to document architecture: The C4 model
    2. 9.2 Understanding architectural drivers of a data product design
      1. 9.2.1 Architectural drivers
      2. 9.2.2 Capturing architectural drivers for a data-product design
    3. 9.3 Designing the future architecture of a data product and related systems
      1. 9.3.1 Design session
      2. 9.3.2 File-based data product: Spreadsheet
      3. 9.3.3 From monolith and microservice to a data product
      4. 9.3.4 Exposing data for stream processing and batch processing
    4. Summary
  19. Appendix A.
  20. Appendix B.
  21. Appendix C.
  22. Appendix D.
    1. D.1 Notes on thinnest viable platforms
    2. D.2 Note on phasing out interfaces
  23. index
  24. inside back cover

Product information

  • Title: Data Mesh in Action
  • Author(s): Jacek Majchrzak, Sven Balnojan, Marian Siwiak
  • Release date: February 2023
  • Publisher(s): Manning Publications
  • ISBN: 9781633439979