CHAPTER 2
Background: Modeling and the Black-Box Algorithm
We reviewed the basics of standard supervised predictive algorithms in Chapter 1, “Why Data Science Should Be Ethical.” In this chapter, we build upon that base level of understanding by doing the following:
- Reviewing several important unsupervised algorithms, i.e., algorithms for which no known outcome is available to train a model
- Discussing how to assess the performance of prediction models
- Exploring issues of interpretability and how the inability to interpret black-box models can pose ethical challenges
Assessing Model Performance
Let's review traditional ways of assessing model performance. How we measure a predictive model's performance depends on which of two categories the model falls into:
- Predicting class membership and predicting the probability of belonging to a class (classification)
- Predicting a numerical value (regression)
Predicting Class Membership
The most common model type is one that predicts class membership: an image could be a dog or a cat, a loan could default or be paid off, a web visitor could purchase or not, body tissue in an image might be malignant or benign, and so on. Less commonly, the choice might fall among multiple categories (e.g., an auto accident might involve fatalities, injuries only, or property damage only). The most intuitive measure of such a model is accuracy, the percentage of predictions that are correct. In the typical binary classification case, the predictions versus ...
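As an illustration of accuracy, here is a minimal Python sketch. It assumes that a model's predicted probabilities are converted into class labels with a 0.5 cutoff. That cutoff is an illustrative choice and is not taken from the text. The function names `to_class` and `accuracy` are hypothetical, and so are all of the data values. The same `accuracy` function handles the multiple-category case, because it only checks whether each prediction matches the actual class.

```python
from typing import Sequence


def to_class(probs: Sequence[float], cutoff: float = 0.5) -> list[int]:
    """Convert predicted probabilities of class 1 into 0/1 labels.

    The 0.5 default cutoff is an illustrative assumption, not a fixed rule.
    """
    return [1 if p >= cutoff else 0 for p in probs]


def accuracy(actual: Sequence[object], predicted: Sequence[object]) -> float:
    """Return the fraction of predictions that match the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("need at least one record")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Hypothetical binary example: 1 = loan defaults, 0 = loan is paid off
actual = [1, 0, 0, 1, 0, 0, 1, 0]
probs = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.7, 0.05]
predicted = to_class(probs)
print(predicted)                    # [1, 0, 1, 0, 0, 0, 1, 0]
print(accuracy(actual, predicted))  # 0.75

# Hypothetical multi-category example: auto accident severity
actual_sev = ["injury", "property", "property", "fatal", "property"]
pred_sev = ["injury", "property", "injury", "injury", "property"]
print(accuracy(actual_sev, pred_sev))  # 0.6
```

Six of the eight binary predictions match the actual outcome, which gives an accuracy of 0.75. Changing `cutoff` changes which records are labeled 1, and accuracy can change with it.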