Chapter 97. Your Data Tests Failed! Now What?
Sam Bail, PhD
Congratulations, you’ve successfully implemented data testing in your pipeline! Whether you use an off-the-shelf tool or home-cooked validation code, you’ve taken a crucial step toward ensuring high-quality data insights. But do you also have a plan for what happens when your tests actually fail? In this chapter, we’ll talk through some key stages of responding to data test failures and critical questions to ask when developing a data-quality strategy for your team.
System Response
Automated system responses are the first line of defense against a failed data test. The response could be to do nothing, to isolate the “bad” data and continue the pipeline, or to stop the pipeline entirely.
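These three responses can be expressed as a per-test policy. The following is a minimal Python sketch, assuming rows arrive as a list of dicts and each test is a row-level predicate. `FailureAction`, `handle_test_result`, and `PipelineHalted` are hypothetical names, not part of any particular data testing tool.

```python
from enum import Enum


class FailureAction(Enum):
    """Hypothetical policy for what the pipeline does when a data test fails."""
    IGNORE = "ignore"          # do nothing: keep all rows and continue
    QUARANTINE = "quarantine"  # set failing rows aside and continue with the rest
    HALT = "halt"              # stop the pipeline run


class PipelineHalted(Exception):
    """Raised when a failed test should stop the pipeline run."""


def handle_test_result(rows, is_valid, action):
    """Apply a row-level test (is_valid) and respond according to `action`.

    Returns (rows_to_continue_with, quarantined_rows).
    """
    good = [r for r in rows if is_valid(r)]
    bad = [r for r in rows if not is_valid(r)]
    if not bad or action is FailureAction.IGNORE:
        # Test passed, or the policy says failures are tolerated.
        return rows, []
    if action is FailureAction.QUARANTINE:
        # Continue with valid rows; keep the failing ones for later inspection.
        return good, bad
    # HALT: surface the failure instead of passing bad data downstream.
    raise PipelineHalted(f"{len(bad)} of {len(rows)} rows failed validation")
```

Quarantining keeps the pipeline running while preserving the failing rows for inspection; halting is the safer choice when downstream consumers would act on incorrect data.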
Logging and Alerting
Which errors need alerting and which ones can simply be logged for later use? Which medium (email, Slack, PagerDuty, etc.) do you choose for the alerts to make sure they get noticed? When are they sent (instantaneously, at the end of a pipeline run, or at a fixed time)? And finally, are the alerts clear enough for the responders to understand what happened and the severity of the incident?
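One way to encode these decisions is a small router that logs every failure, alerts immediately on critical ones, and batches warnings into a single end-of-run summary. This is a sketch under stated assumptions: the three severity levels, the message format, and the `send_alert` callable (a stand-in for whatever email, Slack, or PagerDuty client you use) are all hypothetical.

```python
import logging
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Callable, List

logger = logging.getLogger("data_tests")


class Severity(IntEnum):
    INFO = 1      # log only; review later
    WARNING = 2   # batch into one summary alert at the end of the run
    CRITICAL = 3  # alert immediately


# Map each (hypothetical) severity to a standard logging level.
_LOG_LEVELS = {
    Severity.INFO: logging.INFO,
    Severity.WARNING: logging.WARNING,
    Severity.CRITICAL: logging.ERROR,
}


@dataclass
class DataTestFailure:
    test_name: str
    table: str
    severity: Severity
    detail: str

    def message(self) -> str:
        # State what failed, where, and how severe it is so responders can triage.
        return f"[{self.severity.name}] {self.test_name} on {self.table}: {self.detail}"


@dataclass
class AlertRouter:
    # Hypothetical stand-in for an email, Slack, or PagerDuty client.
    send_alert: Callable[[str], None]
    pending: List[DataTestFailure] = field(default_factory=list)

    def report(self, failure: DataTestFailure) -> None:
        logger.log(_LOG_LEVELS[failure.severity], failure.message())  # always log
        if failure.severity is Severity.CRITICAL:
            self.send_alert(failure.message())
        elif failure.severity is Severity.WARNING:
            self.pending.append(failure)

    def end_of_run(self) -> None:
        # Send one summary covering all warnings collected during the run.
        if self.pending:
            lines = [f.message() for f in self.pending]
            self.send_alert(f"{len(lines)} data test warning(s):\n" + "\n".join(lines))
            self.pending.clear()
```

Batching warnings into one message reduces alert fatigue, while critical failures still reach a responder right away.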
Alert Response
Who will see and respond to those notifications? Is there an on-call rotation? What is the agreed-upon response time? And do all stakeholders know who owns the response?
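Ownership and response times are easier to honor when they are written down in a form that tooling can read. As a minimal sketch, assuming a simple weekly rotation over an ordered list of engineers and a single agreed acknowledgment window, the following computes who is on call on a given day and when an alert should be acknowledged. The function names and the rotation scheme are hypothetical.

```python
from datetime import date, datetime, timedelta
from typing import List


def on_call_for(day: date, engineers: List[str], rotation_start: date) -> str:
    """Weekly rotation: each engineer covers seven consecutive days, in list order."""
    weeks_elapsed = (day - rotation_start).days // 7
    return engineers[weeks_elapsed % len(engineers)]


def ack_deadline(alert_time: datetime, response_time: timedelta) -> datetime:
    """Time by which the on-call responder is expected to acknowledge the alert."""
    return alert_time + response_time
```

For example, with `team = ["ana", "ben", "chen"]` and a rotation starting on January 1, 2024, `on_call_for(date(2024, 1, 8), team, date(2024, 1, 1))` returns `"ben"`.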
Stakeholder Communication
You’ll want to let data ...