Chapter 9: Data Modeling – Preprocessing

In this chapter, you will learn two important processes used to prepare data for modeling: splitting and scaling. You will learn how to use the scikit-learn (sklearn) classes StandardScaler and MinMaxScaler for scaling, and the train_test_split function for splitting. You will also be introduced to the reasons behind scaling and exactly what these methods do. As part of exploring splitting and scaling, you will use sklearn's LinearRegression and statsmodels to create simple linear regression models.
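As a preview of how these pieces fit together, here is a minimal sketch using a hypothetical toy DataFrame (the column names `x` and `y` and the generated data are assumptions for illustration only). It splits the data with train_test_split, fits StandardScaler and MinMaxScaler on the training rows only, checks each scaler against the formula it applies, and then fits a LinearRegression model.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical toy data: one independent variable "x" and a noisy dependent variable "y"
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": np.arange(100, dtype=float)})
df["y"] = 3.0 * df["x"] + 5.0 + rng.normal(0, 1, size=len(df))

X = df[["x"]]  # independent variable(s): a 2-D DataFrame
y = df["y"]    # dependent variable: a 1-D Series

# Hold out 20% of the rows for testing; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Learn the scaling parameters from the training data only,
# then apply the same transformation to the test data
std = StandardScaler().fit(X_train)
X_train_std = std.transform(X_train)
X_test_std = std.transform(X_test)

mm = MinMaxScaler().fit(X_train)
X_train_mm = mm.transform(X_train)

# What the scalers compute, written out by hand:
# standardization: (x - mean) / std  (population std, ddof=0)
manual_std = (X_train - X_train.mean()) / X_train.std(ddof=0)
# min-max scaling: (x - min) / (max - min), mapping training values into [0, 1]
manual_mm = (X_train - X_train.min()) / (X_train.max() - X_train.min())

# Fit a simple linear regression on the standardized feature and score it on the test set
model = LinearRegression().fit(X_train_std, y_train)
r2 = model.score(X_test_std, y_test)
print(f"Test R^2: {r2:.4f}")
```

Fitting the scalers on the training split alone keeps information about the test rows out of the preprocessing step. Note that for an unregularized linear regression, rescaling a feature changes the fitted coefficient but not the predictions, so here the scaling step mainly illustrates the workflow.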

By the end of this chapter, you will be comfortable preparing datasets to begin modeling. The main ideas you will learn in this chapter are as follows:

  • Exploring independent and dependent variables
  • Understanding data scaling and ...

Get The Pandas Workshop now with the O’Reilly learning platform.