CHAPTER 2 Background: Modeling and the Black-Box Algorithm

We reviewed the basics of standard supervised predictive algorithms in Chapter 1, “Why Data Science Should Be Ethical.” In this chapter, we build upon that base level of understanding by doing the following:

Reviewing several important unsupervised algorithms, i.e., algorithms for which there is no known outcome on which to train a model
Discussing how to assess the performance of prediction models
Exploring issues of interpretability, and how the inability to interpret black-box models can pose ethical challenges

Assessing Model Performance

Let's review traditional ways of assessing model performance. How to measure a predictive model's performance depends on which of two categories the model falls into:

Predicting class membership and predicting the probability of belonging to a class (classification)
Predicting a numerical value (regression)

Predicting Class Membership

The most common model type is one that predicts class membership: an image could be a dog or a cat, a loan could default or pay off, a web visitor could purchase or not, body tissue in an image might be malignant or benign, etc. Less commonly, the choice might fall among multiple categories (e.g., an auto accident might involve fatalities, injuries only, or property damage only). The most intuitive measure of such a model is accuracy: what percentage of the predictions are accurate. In the typical binary classification case, the predictions versus ...
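Accuracy as described above can be illustrated with a minimal Python sketch. The function name `accuracy` and the loan labels are hypothetical, made up for illustration and not taken from the book; the sketch simply counts how many predicted classes match the actual classes.

```python
def accuracy(actual, predicted):
    """Return the fraction of predictions that match the actual classes."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one record is required")
    # Count the records where the predicted class equals the actual class.
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical loan outcomes (illustrative data, not from the book).
actual = ["default", "paid", "paid", "default", "paid"]
predicted = ["default", "paid", "default", "default", "paid"]

acc = accuracy(actual, predicted)
print(f"Accuracy: {acc:.0%}")  # 4 of 5 predictions match
```

Because the function only compares labels for equality, the same sketch applies unchanged to the multiclass case (e.g., fatalities, injuries only, or property damage only).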

Responsible Data Science by Grant Fleming, Peter C. Bruce
