Chapter 15. Linear Models

At this point in the book, we’ve covered the four stages of the data science lifecycle to different extents. We’ve talked about formulating questions and obtaining and cleaning data, and we’ve used exploratory data analysis to better understand the data. In this chapter, we extend the constant model introduced in Chapter 4 to the linear model. Linear models are a popular tool in the last stage of the lifecycle: understanding the world.

Knowing how to fit linear models opens the door to all kinds of useful data analyses. We can use these models to make predictions. For example, environmental scientists developed a linear model to predict air quality based on air sensor measurements and weather conditions (see Chapter 12). In that case study, understanding how measurements from two instruments varied enabled us to calibrate inexpensive sensors and improve their air quality readings. We can also use these models to make inferences about the form of a relationship between features. For example, in Chapter 18 we’ll see how veterinarians used a linear model to infer the coefficients for length and girth for a donkey’s weight: $\text{Length} + 2 \times \text{Girth} - 175$. In that case study, the model enables vets working in the field to prescribe medication for sick donkeys. Models can also help describe relationships and provide insights. For example, in this chapter we explore relationships between factors correlated with upward mobility, such as commute time, ...
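To make the idea of "coefficients" concrete, the donkey weight formula mentioned above can be read as a linear model with an intercept and one coefficient per feature. The following is a minimal sketch, not the veterinarians' actual code: the function name `predict_weight` and the sample measurements are hypothetical, the constant 175 is assumed to be subtracted, and measurement units are left unspecified because this introduction does not state them.

```python
# Coefficients read off the formula in the text: Length + 2 x Girth - 175.
# Assumption: the constant 175 is subtracted (an intercept of -175).
INTERCEPT = -175.0
COEF_LENGTH = 1.0
COEF_GIRTH = 2.0


def predict_weight(length, girth):
    """Return the linear prediction: intercept plus coefficient-weighted features."""
    return INTERCEPT + COEF_LENGTH * length + COEF_GIRTH * girth


# Hypothetical measurements, for illustration only.
print(predict_weight(100, 120))  # prints 165.0
```

Each coefficient says how much the prediction changes when its feature increases by one unit while the other feature is held fixed. Here, one extra unit of girth adds twice as much to the predicted weight as one extra unit of length.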
