Chapter 6. Automated Testing: ML Model Tests

In the previous chapter, we saw the price we pay for not having automated tests in ML solutions, and the benefits that tests bring to teams in terms of quality, flow, cognitive load, and satisfaction. We outlined the building blocks of a comprehensive test strategy and dived into details for the first category of tests: software tests.

In this chapter, we will explore the next category of tests: ML model tests (or model tests, for short). As large language models (LLMs) have taken the world by storm, we’ll also cover techniques for testing LLMs and LLM applications.

In addition, we’ll explore practices that complement ML model tests, such as visualization and error analysis, closing the data collection loop, and open-closed test design. We’ll also discuss data tests briefly before concluding with concrete next steps that can help you implement these tests in your ML systems.

In this chapter, we will focus on offline testing at scale, and we won’t cover online testing techniques (e.g., A/B testing, bandits, interleaving experiments) as they are well covered in Chip Huyen’s great book Designing Machine Learning Systems (O’Reilly).

Model Tests

ML practitioners are no strangers to manual model evaluation procedures. While the exploratory nature of such evaluations is useful in the early phases of developing a model, this manual work quickly becomes time-consuming and tedious. As we identify measures and heuristics that tell us if a model ...
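
To make the idea concrete, here is a minimal sketch of how one such heuristic, a minimum accuracy on a held-out test set, might be codified as a pytest-style model test. The dataset, model, and 0.9 threshold below are illustrative placeholders, not examples from the book:

    # A hypothetical model test: codify a manual evaluation heuristic
    # (minimum accuracy on held-out data) as an automated check.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    def test_model_meets_accuracy_threshold():
        # Hold out a test split so the check reflects unseen data.
        X, y = load_iris(return_X_y=True)
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=42
        )

        model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        accuracy = accuracy_score(y_test, model.predict(X_test))

        # Fail fast if the model regresses below the agreed threshold.
        assert accuracy >= 0.9, f"Accuracy {accuracy:.3f} is below threshold 0.9"

Run under pytest, a test like this turns a one-off evaluation into a repeatable quality gate that can execute in CI on every change to code, data, or model.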
