Chapter 7. Unit Testing

Hayao Miyazaki’s movie The Wind Rises chronicles the life of Jiro, an aeronautical engineer who designs airplanes during World War II. While he sketches planes at his drafting desk, he imagines these ideas coming to life. As a plane soars into the sky, Jiro probes the design for failure points. He sees that his wing design is inadequate, watching as the wing rips off and the plane crashes to the ground.

From a spark of creativity to a ball of flames, perhaps you can relate to Jiro’s thought process. It’s important to consider the ways your design could fail and to correct bugs before they happen, which is why testing is a cornerstone of software development best practices. Fortunately, software is a lot easier to test than airplanes.

Data pipelines present a particular challenge for unit testing, with a multitude of interfaces, dependencies, and data needs to consider. This complexity often leads to heavy reliance on end-to-end testing, where a pipeline is run from start to finish using many of the cloud services, data sources, and sinks required for production operation. Not only is this approach costly in terms of cloud bills, it also wastes engineering resources by increasing the amount of time it takes to run tests, fix bugs, and develop new features.

In my experience, there are two primary drivers of an “end-to-end testing to rule them all” strategy. In Chapter 6 you saw one of these drivers, code design, and how to structure data pipeline code to ...
