
Data Mesh in Practice—with Interactivity

Published by O'Reilly Media, Inc.

Content level: Intermediate

How to set the foundations for federated data ownership

This live event utilizes Jupyter Notebook technology

The data lake paradigm is often considered the scalable successor of the more curated data warehouse approach when it comes to the democratization of data. However, many who set out to build a centralized data lake came back with a data swamp of unclear responsibilities, a lack of data ownership, and subpar data availability.

Accessibility and availability can only be guaranteed at scale by moving more responsibility to those who pick up the data and have the respective domain knowledge—the data owners—while keeping only data governance and metadata information central. Such a decentralized, domain-focused approach has recently been coined a data mesh.

Join experts Max Schultze and Arif Wider for a concise, comprehensive overview of the data mesh. You’ll learn how to tackle the challenges of decentralized data ownership and how to provide the right platform tooling that enables data owners to take over responsibility in a scalable and sustainable fashion. You’ll also discover how to provide data in such a way that others can create value from it, and explore the concept of a data product, which goes beyond sharing of files toward guarantees of quality and acknowledgement of data ownership.
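
As a rough illustration of the data product idea, here is a minimal Python sketch: a data product bundles the data itself with ownership, documentation, and quality guarantees. The DataProduct class, its fields, and the orders_daily example are hypothetical and not taken from the course material.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor for a data product: data plus
    ownership and quality guarantees, not just a shared file."""
    name: str
    owner: str                    # the domain team accountable for the data
    description: str              # what the dataset contains and how to use it
    schema: dict[str, str]        # column name -> documented meaning
    freshness_sla_hours: int      # how stale the data may become
    quality_checks: list[str] = field(default_factory=list)


orders = DataProduct(
    name="orders_daily",
    owner="checkout-team",
    description="One row per completed order, partitioned by day.",
    schema={"order_id": "unique order identifier",
            "amount_eur": "gross order value in EUR"},
    freshness_sla_hours=24,
    quality_checks=["no_null_order_ids", "amounts_non_negative"],
)
```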

What you’ll learn and how you can apply it

By the end of this live online course, you’ll understand:

  • The consequences of unclear data ownership
  • What a scalable structure of domain-driven, federated responsibilities looks like
  • How a shared data infrastructure platform can contribute

And you’ll be able to:

  • Facilitate steps toward federated data ownership in your company
  • Provide data in such a way that others can create value from it
  • Support data ownership by providing domain-agnostic infrastructure tooling

This live event is for you because...

  • You’re a software or data engineer.
  • You work with data production, infrastructure, or consumption.
  • You want to become a data product owner.

Prerequisites

  • Familiarity with distributed data processing
  • A basic understanding of Python


Schedule

The time frames are only estimates and may vary according to how the class is progressing.

Introduction to data mesh (25 minutes)

  • Presentation: What’s the data mesh paradigm, and why was it invented?
  • Exercise: Jupyter Notebook setup

The data consumer perspective (45 minutes)

  • Exercise: Calculate a set of business KPIs from a prepared, largely undocumented dataset (a minimal sketch of this kind of analysis follows this section)
  • Presentation: Overview of data mesh—product thinking for data, domain-driven design applied to distributed data, and platform thinking for data infrastructure; issues on the consumer side
  • Q&A
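
To make the consumer-side exercise concrete, here is a minimal, hypothetical KPI calculation in pandas. The column names ("ts", "amt") are invented stand-ins for the course's prepared dataset; the point is that without documentation, the consumer has to guess what they mean.

```python
import pandas as pd

# Invented stand-in for the prepared dataset: without documentation,
# a consumer must guess that "ts" is an order timestamp and "amt"
# a gross order value.
raw = pd.DataFrame({
    "ts": ["2023-01-01", "2023-01-01", "2023-01-02"],
    "amt": [19.99, 5.50, 42.00],
})

raw["ts"] = pd.to_datetime(raw["ts"])

# Example business KPI: daily revenue.
daily_revenue = raw.groupby(raw["ts"].dt.date)["amt"].sum()
print(daily_revenue)
```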

Break (5 minutes)

The data producer perspective (45 minutes)

  • Presentation: What to do on the data producer side; how to create a data product; how to think about domain boundaries
  • Exercise: Rewrite the introduced dataset with proper column descriptions; create a schema and dataset description (a sketch of the idea follows this section)
  • Presentation: Why building a good data product is hard
  • Q&A
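
One way to express "a schema plus column descriptions" is to attach metadata to the fields of an Apache Arrow schema, so the documentation travels with the data. This is only a sketch of the idea under that assumption, not the notebook's actual solution; the field names mirror the hypothetical example above.

```python
import pyarrow as pa

# Hypothetical rewrite of the dataset with documented columns: pyarrow
# lets the producer attach descriptions as field-level metadata.
schema = pa.schema(
    [
        pa.field("order_id", pa.string(),
                 metadata={"description": "unique order identifier"}),
        pa.field("amount_eur", pa.float64(),
                 metadata={"description": "gross order value in EUR"}),
    ],
    metadata={"description": "One row per completed order."},
)

print(schema)
```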

Break (5 minutes)

The data infrastructure platform perspective (45 minutes)

  • Exercise: Answer an access request by calling prepared functions; then handle many such requests repeatedly (a toy sketch follows this section)
  • Presentation: What makes a good data infrastructure platform?—domain-agnostic, self-service, etc.; the trap of taking centralized responsibility for data; platform thinking—multitenancy, how to enable interoperability, and how to stay out of domain responsibility
  • Demo: Build a platform capability/self-service tool
  • Q&A
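
A toy illustration of the shift from answering access requests by hand to offering a self-service capability: the data owner grants access through a call instead of a ticket queue. All names here (grant_access, has_access, ACCESS_GRANTS) are hypothetical and are not the course's prepared functions.

```python
# Hypothetical in-memory policy store mapping dataset -> allowed principals.
ACCESS_GRANTS: dict[str, set[str]] = {}


def grant_access(dataset: str, principal: str) -> None:
    """Record that `principal` may read `dataset` (assumed policy store)."""
    ACCESS_GRANTS.setdefault(dataset, set()).add(principal)


def has_access(dataset: str, principal: str) -> bool:
    """Check the recorded grants instead of asking a central team."""
    return principal in ACCESS_GRANTS.get(dataset, set())


grant_access("orders_daily", "analytics-team")
assert has_access("orders_daily", "analytics-team")
```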

Conclusion and wrap-up (10 minutes)

  • Presentation: The goal state; key learnings; what we did not talk about; follow-up suggestions
  • Q&A

Your Instructors

  • Max Schultze

Max Schultze is an associate director of data engineering on the data platform at HelloFresh, the world's leading meal kit company. His focus is on offering company-wide platform solutions for data infrastructure and governance. Previously, he worked as an engineering manager at Zalando, building data pipelines at petabyte scale, productionizing distributed processing engines like Spark and Trino, and providing services and tooling for data management. As an early adopter of the data mesh paradigm, he frequently advocates its use through conference appearances, online training, and publications. Max graduated from Humboldt University of Berlin, where he took part in the university’s early development of Apache Flink.

  • Arif Wider

Arif Wider is a professor of software engineering at HTW Berlin, Germany, and a lead technology consultant with Thoughtworks, where he worked with Zhamak Dehghani, who coined the term data mesh in 2019. Outside of teaching, Arif enjoys building scalable software that makes an impact, as well as building the teams that create such software. More specifically, he is fascinated by applications of artificial intelligence and by how building such applications effectively requires data scientists and developers (like himself) to work closely together.
