Chapter 6. Logistic Regression and Classification

In this chapter we cover logistic regression, a type of regression that predicts the probability of an outcome given one or more independent variables. This in turn can be used for classification, which means predicting categories rather than real numbers as we did with linear regression.

We are not always interested in representing variables as continuous, where they can take on any of an infinite number of real decimal values. There are situations where we would rather have variables be discrete, that is, representative of whole numbers or booleans (1/0, true/false). Logistic regression is trained on an output variable that is discrete: either a binary 1 or 0, or a categorical number (a whole number that identifies a category). It does output a continuous variable in the form of a probability, but that can be converted into a discrete value with a threshold.
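To make the threshold idea concrete, here is a minimal Python sketch. It assumes the standard single-variable logistic function, and the coefficients `b0` and `b1`, the helper names, and the 0.5 threshold are hypothetical values chosen only for illustration, not fitted to any data.

```python
import math

def logistic(x, b0, b1):
    """Return a probability between 0 and 1 for input x,
    using the standard logistic function 1 / (1 + e^-(b0 + b1*x))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def classify(probability, threshold=0.5):
    """Convert a continuous probability into a discrete 1 or 0."""
    return 1 if probability >= threshold else 0

# Hypothetical coefficients, assumed for illustration only
b0, b1 = -3.0, 1.5

inputs = [0.0, 1.0, 4.0]
probabilities = [logistic(x, b0, b1) for x in inputs]
labels = [classify(p) for p in probabilities]

for x, p, label in zip(inputs, probabilities, labels):
    print(f"x={x}: probability={p:.4f}, class={label}")
```

Raising or lowering the threshold trades one kind of misclassification for the other, so 0.5 is a common default rather than a requirement.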

Logistic regression is easy to implement and fairly resilient against outliers and other data challenges. Many machine learning problems can be solved well with logistic regression, which often offers more practicality and performance than other types of supervised machine learning.

Just as we did in Chapter 5 when we covered linear regression, we will attempt to walk the line between statistics and machine learning, using tools and analysis from both disciplines. Logistic regression will integrate many concepts we have learned in this book, from probability to linear regression.

Understanding Logistic ...
