Chapter 11. Comparing Models with Resampling

Once we create two or more models, the next step is to compare them to understand which one is best. In some cases, comparisons might be within-model, where the same model might be evaluated with different features or preprocessing methods. Alternatively, between-model comparisons, such as when we compared linear regression and random forest models in Chapter 10, are the more common scenario.

In either case, the result is a collection of resampled summary statistics (e.g., RMSE or accuracy) for each model. In this chapter, we'll first demonstrate how workflow sets can be used to fit multiple models. Then, we'll discuss important aspects of resampling statistics. Finally, we'll look at how to formally compare models (using either hypothesis testing or a Bayesian approach).
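To make "a collection of resampled summary statistics" concrete, here is a minimal base R sketch. It is not the chapter's code: it assumes the built-in mtcars data rather than the Ames data, and the fold count, the two formulas, and the helper `rmse()` are illustrative choices. Each model is fit on the training portion of every fold and scored on the held-out rows, so each model ends up with one RMSE per resample.

```r
# Illustrative only: mtcars stands in for the Ames data used in the book.
set.seed(1101)

k <- 5
# Randomly assign each row to one of k folds.
folds <- sample(rep(seq_len(k), length.out = nrow(mtcars)))

# Hypothetical helper: root mean squared error.
rmse <- function(truth, estimate) sqrt(mean((truth - estimate)^2))

# Two candidate models that differ only in their terms (a within-model comparison).
formulas <- list(
  basic       = mpg ~ wt + hp,
  interaction = mpg ~ wt * hp
)

# Rows are resamples, columns are models.
resampled_rmse <- sapply(formulas, function(f) {
  sapply(seq_len(k), function(i) {
    fit      <- lm(f, data = mtcars[folds != i, ])
    held_out <- mtcars[folds == i, ]
    rmse(held_out$mpg, predict(fit, newdata = held_out))
  })
})

# Per-model average across resamples.
colMeans(resampled_rmse)
```

Because both models are scored on the same folds, the rows of `resampled_rmse` are matched pairs, which is what makes a formal comparison between the columns meaningful.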

Creating Multiple Models with Workflow Sets

In Chapter 7 we described the idea of a workflow set where different preprocessors and/or models can be combinatorially generated. In Chapter 10, we used a recipe for the Ames data that included an interaction term as well as spline functions for longitude and latitude. To demonstrate more with workflow sets, let's create three different linear models that add these preprocessing steps incrementally; we can test whether these additional terms improve the model results. We'll create three recipes then combine them into a workflow set:

library(tidymodels)
tidymodels_prefer()

basic_rec <-
  recipe(Sale_Price ...
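
The listing above is truncated in this excerpt, so the following does not reproduce the Ames recipes. It is a hedged sketch of the same pattern on the built-in mtcars data: three recipes that add an interaction and then a spline term incrementally, combined into a workflow set and resampled. The recipe names, the predictors, and `car_folds` are assumptions made for illustration.

```r
library(tidymodels)
tidymodels_prefer()

# Illustrative resamples; the book uses folds of the Ames training set instead.
set.seed(1101)
car_folds <- vfold_cv(mtcars, v = 5)

# Three recipes that add preprocessing steps incrementally (hypothetical names).
rec_basic    <- recipe(mpg ~ wt + hp + disp, data = mtcars)
rec_interact <- rec_basic %>% step_interact(~ wt:hp)
rec_splines  <- rec_interact %>% step_ns(disp, deg_free = 3)

preproc <- list(
  basic    = rec_basic,
  interact = rec_interact,
  splines  = rec_splines
)

# cross = FALSE pairs each recipe with the single model rather than crossing lists.
lm_models <- workflow_set(preproc, list(lm = linear_reg()), cross = FALSE)

# Fit every workflow to the same resamples.
lm_models <- lm_models %>%
  workflow_map("fit_resamples", seed = 1101, resamples = car_folds)

# One row of resampled metrics per workflow and metric.
collect_metrics(lm_models)
```

Each workflow receives an identifier built from the recipe and model names (for example `basic_lm`), which is how the results are told apart in the collected metrics.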


Tidy Modeling with R by Max Kuhn, Julia Silge