Chapter 6. Fixing Data Quality Issues at Scale
Picture this: it’s Friday at 5 p.m., and you’re about to log off for the day. You start closing your tabs, packing up your bag, and settling into your weekend state of mind. Just as you’re about to turn off your laptop, you get an urgent Slack message from your CFO about a broken dashboard.
“The numbers are wrong in our quarterly results report,” she Slacks you. “I didn’t sign off on this!”
Assuming the issue lies in the data itself and not in your company’s shoddy financials, you have a serious case of data downtime on your hands. You frantically open Looker and find she’s right: the report looks way off, and you have no idea why. You validated the numbers with her just yesterday, and your charts and graphs were absolutely glowing with accuracy.
You pull up the source data (an Excel spreadsheet living on your desktop, “Financial Report V. 212 GOOD_I_PROMISE_YES_GOOD”), but that only confuses you more. Dozens of emails, two phone calls, a few Zoom meetings, and seven hours later, you finally identify the culprit behind the errant dashboard: a schema change in an upstream source table.
Great, you figured out what happened—now what?
For most data teams, pausing the pipeline and identifying the root cause of the issue are just the tip of the iceberg when it comes to restoring data reliability and trust.
Fixing Quality Issues in Software Development
Fortunately, analysts and engineers don’t need to reinvent the wheel when it ...