Chapter 10. Improving the Modeling Experience: Fairness Evaluation and Hyperparameter Tuning
Getting ML models to work well is an iterative process. It requires many rounds of tuning the model's parameters, architecture, and training duration. While you have to work with the data that's available, of course, ideally you want the training data to be balanced: for classification, roughly equal numbers of examples in each class; for continuous features, a roughly uniform distribution of values across their ranges.
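Before training, it's worth checking how balanced your labels actually are. Here is a minimal sketch, assuming integer class labels in a NumPy array; the y_train values below are made up for illustration:

    import numpy as np

    # Hypothetical labels: 99 negative examples and 1 positive example.
    y_train = np.array([0] * 99 + [1])

    # Count how many examples fall into each class.
    classes, counts = np.unique(y_train, return_counts=True)
    for c, n in zip(classes, counts):
        print(f"class {c}: {n} examples ({n / len(y_train):.1%})")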
Why is this balance important? If the classes or feature values in the data are skewed, the trained model will learn and reproduce that skew. This is known as model bias.
Imagine that you're training a model to recognize dogs. If there are 99 negative cases and 1 positive case in your training images (that is, only one actual dog image), then the model will simply predict a negative result every time, and it will be correct 99% of the time. The model learns to minimize the errors it makes during training, and the easiest way to do so is to guess "not a dog" on every input. This is known as the data imbalance problem, and it is prevalent in the real world. It's also a complicated subject to which I cannot do justice here; addressing it requires many different techniques, including adding synthetic data derived from existing examples, a technique known as data augmentation.
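To make that last idea concrete, here is a minimal data-augmentation sketch using Keras preprocessing layers. It assumes a recent TF 2.x release where these layers live under tf.keras.layers (in earlier 2.x versions they sat under tf.keras.layers.experimental.preprocessing), and the dataset itself is a stand-in built from random pixels:

    import tensorflow as tf

    # Hypothetical, highly imbalanced dataset: 99 "not dog" images and
    # a single dog image (random pixels stand in for real data).
    images = tf.random.uniform((100, 64, 64, 3))
    labels = tf.constant([0] * 99 + [1])
    train_ds = tf.data.Dataset.from_tensor_slices((images, labels)).batch(8)

    # Preprocessing layers that produce randomly modified copies of each
    # image, so the rare positive example looks slightly different on
    # every pass through the data.
    augment = tf.keras.Sequential([
        tf.keras.layers.RandomFlip("horizontal"),
        tf.keras.layers.RandomRotation(0.1),
        tf.keras.layers.RandomZoom(0.1),
    ])

    # Apply the augmentation on the fly during training.
    train_ds = train_ds.map(lambda x, y: (augment(x, training=True), y))

Another common mitigation is class weighting: passing a class_weight dictionary to Model.fit makes errors on the rare class cost more, so the model can no longer do well by always guessing "not a dog."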
In this chapter, I'll introduce you to Fairness Indicators, a tool (new as of this writing) for evaluating model bias. It is part of the TensorFlow Model Analysis library and ...
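To preview where the chapter is headed, here is a minimal configuration sketch following the pattern in the Fairness Indicators documentation. The label_key 'label', the slicing feature 'gender', and the threshold values are placeholder assumptions, and the exact API surface varies across tensorflow_model_analysis versions:

    import tensorflow_model_analysis as tfma

    # Sketch of an evaluation config that adds fairness metrics and
    # computes them both overall and per slice of a sensitive feature.
    eval_config = tfma.EvalConfig(
        model_specs=[tfma.ModelSpec(label_key='label')],
        metrics_specs=[
            tfma.MetricsSpec(metrics=[
                # FairnessIndicators reports metrics such as false
                # positive rate at each listed decision threshold.
                tfma.MetricConfig(
                    class_name='FairnessIndicators',
                    config='{"thresholds": [0.25, 0.5, 0.75]}'),
            ])
        ],
        slicing_specs=[
            tfma.SlicingSpec(),                         # overall metrics
            tfma.SlicingSpec(feature_keys=['gender']),  # metrics per group
        ],
    )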