Fixing Data Quality at Scale with Data Observability—with Interactivity

Beginner to intermediate

How to apply observability to your data pipelines

This live event utilizes Jupyter Notebook technology

Do your product dashboards look funky? Are your quarterly reports way off? Are you sick and tired of running a SQL query only to discover that the dataset you’re using is broken or just plain wrong? These errors are costly and affect almost every team, yet they’re typically addressed only ad hoc and reactively.

As companies increasingly rely on data to run operations and drive decision-making, you need to ensure that your data pipelines are consistently healthy and reliable. Just as software engineers tackle application downtime, data professionals face their own availability problem: data downtime, the periods of time when your data is partial, erroneous, missing, or otherwise inaccurate. To identify and eliminate data downtime, teams must apply the five pillars of data observability and embrace automated checks to monitor pipeline performance.

Join experts Barr Moses and Ryan Kearns to learn how to minimize data downtime and increase observability into your data ecosystem. You’ll explore the concept of data downtime and see how to measure it to determine the quality and health of your data using SQL, a sample data table, and a Jupyter notebook. From there, you’ll apply software engineering principles of observability to your data through five key pillars of data health—volume, schema, lineage, freshness, and distribution—as you set service-level objectives for data observability in your data table and implement basic data observability checks. You’ll end by creating your very own anomaly detection algorithm that will help capture data downtime incidents in your data table.

What you’ll learn and how you can apply it

By the end of this live online course, you’ll understand:

What data downtime is and how to measure it
How to determine the quality of your data
The five pillars of data observability
How to set SLOs for data observability
Basic data observability checks
Best practices for eliminating data downtime

And you’ll be able to:

Apply best practices from DevOps to data analytics and data engineering
Write SQL scripts that accomplish basic data observability checks
Identify broken data pipelines
Perform basic data lineage searches
Set alerts for data quality issues
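The snippet below gives a rough feel for the "basic data observability checks" listed above. It is a minimal sketch using Python's built-in sqlite3 module, not the course's own notebook code. The table name `events`, its `created_at` column, the sample rows, and the thresholds are all hypothetical. It runs two plain SQL queries: one for freshness (when did data last arrive?) and one for volume (how many rows landed per day?). It then compares each result against a threshold you choose.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical table and column names, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT)")

# Synthetic rows: three on the first day, one on the second.
rows = [
    ("2024-01-01 09:00:00",),
    ("2024-01-01 12:30:00",),
    ("2024-01-01 18:45:00",),
    ("2024-01-02 08:15:00",),
]
conn.executemany("INSERT INTO events (created_at) VALUES (?)", rows)

# Freshness: timestamp of the newest row.
FRESHNESS_SQL = "SELECT MAX(created_at) FROM events"

# Volume: number of rows that landed on each day.
VOLUME_SQL = """
SELECT DATE(created_at) AS day, COUNT(*) AS row_count
FROM events
GROUP BY day
ORDER BY day
"""

def freshness_check(conn, now, max_lag):
    """Return (is_fresh, lag): how long ago the newest row arrived vs. max_lag."""
    (latest,) = conn.execute(FRESHNESS_SQL).fetchone()
    lag = now - datetime.fromisoformat(latest)
    return lag <= max_lag, lag

def volume_check(conn, min_rows):
    """Return (day, row_count) pairs for days below the min_rows threshold."""
    return [(day, n) for day, n in conn.execute(VOLUME_SQL) if n < min_rows]

# Usage: pretend "now" is 24 hours after the last row arrived.
now = datetime(2024, 1, 3, 8, 15)
is_fresh, lag = freshness_check(conn, now, max_lag=timedelta(hours=12))
# is_fresh is False here because lag is one day, longer than the 12-hour limit.
low_volume_days = volume_check(conn, min_rows=2)
# low_volume_days flags 2024-01-02, which received only one row.
```

In practice, checks like these would run on a schedule against a real warehouse. A failing result would feed whatever alerting channel your team already uses.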

This live event is for you because...

You’re a data professional who depends on reliable, accurate data to generate rich analytics and won’t settle for anything less.
You have a love-hate relationship with SQL and are constantly on the lookout for query hacks.
You believe that data downtime doesn’t receive the diligence it deserves.
You want to learn new ways to fold observability best practices into your data management routine.

Prerequisites

A basic understanding of SQL
Familiarity with common data warehouse technologies and the principles of DevOps observability

Recommended preparation:

Read “The Rise of Data Downtime” (article)
Read “What Is Data Observability?” (article)
Read “Good Pipelines, Bad Data” (article)
Take SQL Fundamentals for Data (live online training course with Thomas Nield)

Recommended follow-up:

Read Cloud Native Data Center Networking (book)
Read The Modern Data Warehouse in Azure (book)

Schedule

The time frames are only estimates and may vary according to how the class is progressing.

Introducing data downtime (40 minutes)

Presentation: Walk-through of a data downtime incident; defining data downtime; what data downtime looks like under the hood; measuring data downtime
Group discussion: Have you encountered data downtime in your pipelines or analytics?; How much time do you spend on data downtime incidents?
Jupyter Notebook exercise: Find the data issues in a dataset; measure data downtime
Q&A
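The first session covers measuring data downtime. One way to put a number on it rests on an assumption: each incident is logged with the time the data went bad, the time it was detected, and the time it was resolved. You then add up time-to-detection (TTD) plus time-to-resolution (TTR) across incidents. For N incidents this equals N × (average TTD + average TTR). The sketch below shows that arithmetic. The `Incident` record and its field names are hypothetical, not a schema used in the course.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable

@dataclass
class Incident:
    """Hypothetical record of one data incident."""
    started: datetime   # when the data first went bad
    detected: datetime  # when a person or monitor noticed
    resolved: datetime  # when the data was fixed

    @property
    def time_to_detection(self) -> timedelta:
        return self.detected - self.started

    @property
    def time_to_resolution(self) -> timedelta:
        return self.resolved - self.detected

def data_downtime(incidents: Iterable[Incident]) -> timedelta:
    """Sum of (TTD + TTR) over all incidents, i.e. N x (mean TTD + mean TTR)."""
    return sum(
        (i.time_to_detection + i.time_to_resolution for i in incidents),
        timedelta(0),
    )

# Usage: two incidents, 6h + 2h and 24h + 4h, for 36 hours of downtime in total.
incidents = [
    Incident(datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 6), datetime(2024, 1, 1, 8)),
    Incident(datetime(2024, 1, 5, 12), datetime(2024, 1, 6, 12), datetime(2024, 1, 6, 16)),
]
total = data_downtime(incidents)
```

Splitting the total into detection and resolution time shows where the hours go. Long TTD points to missing monitoring. Long TTR points to slow triage.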

Break (5 minutes)

Introducing data observability (40 minutes)

Presentation: Traditional methods of data quality monitoring—row counts and ad hoc queries; additional important measurements; applying best practices from software engineering and DevOps observability to data—SLOs, SLAs, and monitoring, alerting, and triaging; the five pillars of data observability—volume, schema, freshness, lineage, and distribution
Jupyter Notebook exercise: Identify the five pillars of data observability in your dataset
Q&A
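Of the five pillars, schema is among the easiest to check mechanically. You snapshot a table's columns and types, then diff later snapshots against it. The sketch below does this with SQLite's `PRAGMA table_info`. It is an illustration only. The `orders` table, its columns, and the simulated upstream change are hypothetical, and a real warehouse would expose schema through its own catalog views instead.

```python
import sqlite3

def get_schema(conn, table):
    """Map column name -> declared type using SQLite's PRAGMA table_info."""
    # Each PRAGMA row is (cid, name, type, notnull, dflt_value, pk).
    # The table name is interpolated directly, so pass only trusted names.
    return {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}

def schema_diff(old, new):
    """Report columns added, removed, or retyped between two snapshots."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    retyped = sorted(c for c in old.keys() & new.keys() if old[c] != new[c])
    return {"added": added, "removed": removed, "retyped": retyped}

# Usage: snapshot a hypothetical table, then simulate an upstream change.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, created_at TEXT)")
baseline = get_schema(conn, "orders")

conn.execute("DROP TABLE orders")
conn.execute("CREATE TABLE orders (id INTEGER, amount TEXT, customer_id INTEGER)")
diff = schema_diff(baseline, get_schema(conn, "orders"))
# diff reports customer_id as added, created_at as removed, and amount as retyped.
```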

Break (5 minutes)

Detecting data anomalies (40 minutes)

Presentation: What is anomaly detection?; What are data anomalies, and how do you find them? (manual approaches, how AI can help); signs you have anomalous data
Jupyter Notebook exercise: Create an anomaly detection algorithm (for data volume or freshness)
Q&A
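As a starting point for the volume exercise, here is a minimal sketch of one simple technique: a trailing-window z-score. It is not necessarily the algorithm built in the session. Each day's row count is compared with the mean and sample standard deviation of the previous `window` days and flagged when it deviates by more than `threshold` standard deviations. The function name, its defaults, and the sample counts are hypothetical.

```python
from statistics import mean, stdev

def volume_anomalies(daily_counts, window=7, threshold=3.0):
    """Return indices of days whose count is anomalous vs. the trailing window.

    Requires window >= 2, because the sample standard deviation needs at
    least two values.
    """
    anomalies = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change at all as anomalous.
            if daily_counts[i] != mu:
                anomalies.append(i)
            continue
        z = (daily_counts[i] - mu) / sigma
        if abs(z) > threshold:
            anomalies.append(i)
    return anomalies

# Usage: a steady ~1,000 rows/day with one sudden drop on day 8.
counts = [1000, 1020, 980, 1010, 990, 1005, 995, 1000, 120, 1010]
flagged = volume_anomalies(counts)
# flagged contains only index 8, the day with 120 rows.
```

One design caveat is that a flagged value stays in the window and inflates the standard deviation for the following days. More robust variants skip flagged points or use the median and median absolute deviation instead.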

Break (5 minutes)

Eliminating data downtime (35 minutes)

Presentation: Data observability principles to help you eliminate data downtime
Jupyter Notebook exercise: Use your anomaly detection algorithm on your dataset; consider a few approaches to ensure long-term data observability

Wrap-up and Q&A (10 minutes)

Your Instructors

Barr Moses
Barr Moses is cofounder and CEO of Monte Carlo, a data reliability company backed by Accel and other top Silicon Valley investors. Previously, she was VP of customer operations at customer success company Gainsight, where she helped scale the company 10x in revenue and, among other functions, built the data and analytics team; a management consultant at Bain & Company; and a research assistant in the Statistics Department at Stanford. She also served in the Israeli Air Force as a commander of an intelligence data analyst unit. Barr holds a BSc in mathematical and computational science from Stanford.

Ryan Kearns
Ryan Kearns is a founding data scientist at Monte Carlo, where he develops machine learning algorithms for the company’s data observability platform. Together with CEO and cofounder Barr Moses, he taught O'Reilly's first course on data observability, the first tutorial on the subject to use out-of-the-box SQL. He received bachelor’s degrees in computer science and philosophy (with honors) from Stanford University.
