How it works...

First, we create the StandardScaler(...) object. The two parameters set to Truethe former stands for mean, the latter stands for standard deviation—indicate that we want the model to standardize our features using Z-score: , where  is the ith observation of the f feature, μf is the mean of all the observations in the f feature, and σf is the standard deviation of all the observations in the f feature.

Next, we .fit(...) the data using StandardScaler(...). Note that we do not standardize the first feature as it is actually our ...

Get PySpark Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.