Chapter 3. Data Quality Specifications

This chapter defines three data conditions: valid (within tolerance), suspect (approaching tolerance bounds), and invalid (out of tolerance). Each condition is assessed at the datum level, relative to the applicable data dimensions. These conditions reflect the data quality specifications (DQS) of the downstream consumer. The chapter details an approach to data validation that ensures alignment with a consumer's DQS.
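The three conditions can be illustrated with a minimal Python sketch. It assumes a single numeric dimension with a lower and upper tolerance bound, and it treats "approaching tolerance bounds" as falling within a fixed margin of either bound. The names `Tolerance`, `suspect_margin`, and `classify` are hypothetical and chosen for illustration; the chapter itself does not prescribe how the suspect band is measured.

```python
from dataclasses import dataclass
from enum import Enum


class Condition(Enum):
    VALID = "valid"      # within tolerance
    SUSPECT = "suspect"  # within tolerance, but close to a bound
    INVALID = "invalid"  # out of tolerance


@dataclass(frozen=True)
class Tolerance:
    """Hypothetical DQS for one numeric dimension of one datum."""
    lower: float
    upper: float
    suspect_margin: float  # assumed width of the suspect band at each bound


def classify(value: float, tol: Tolerance) -> Condition:
    """Classify a single datum against a consumer's tolerance."""
    # Anything outside the bounds is out of tolerance.
    if value < tol.lower or value > tol.upper:
        return Condition.INVALID
    # Inside the bounds, but closer to a bound than the margin allows.
    near_lower = value - tol.lower < tol.suspect_margin
    near_upper = tol.upper - value < tol.suspect_margin
    if near_lower or near_upper:
        return Condition.SUSPECT
    return Condition.VALID


# Illustrative tolerance only; real bounds come from the consumer's DQS.
example_tolerance = Tolerance(lower=0.0, upper=100.0, suspect_margin=5.0)
```

Because the DQS belong to the consumer, two consumers of the same datum could each supply their own `Tolerance`, and the same value might be valid for one and suspect or invalid for the other.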

Manufacturing Controls

Recall from Chapter 1 that manufacturing refers to the production of products using labor, machines, tools, chemical and biological processing, or formulation. Industrial manufacturing is the transformation of raw materials into finished products at scale. The manufacturing processes in the production pipeline are controlled using quality control and assurance plans and specifications.

Just as manufacturing uses control specifications, the financial industry uses precise DQS to engineer data quality validations, control data quality, and identify data anomalies. Data quality is assessed before the data is provisioned to downstream processes, applications, or consumers. DQS differ depending on the consumer's use cases. Pre-use data validations prevent data that does not satisfy the DQS from polluting the downstream data ecosystem. Data quality validations use anomaly and outlier detection techniques that identify items, events, patterns, and observations that do not conform to specifications, tolerances, and expected ...

