Preface

The work that researchers do to prepare data for analysis – extraction, transformation, cleaning, and exploration – has not changed fundamentally with the increased popularity of machine learning tools. When we prepared data for multivariate analyses 30 years ago, we were every bit as concerned with missing values, outliers, the shapes of our variables' distributions, and how variables correlate as we are when we use machine learning algorithms now. Widespread use of the same machine learning libraries (scikit-learn, TensorFlow, PyTorch, and others) does encourage greater uniformity in approach, but good data cleaning and exploration practices are largely unchanged.

How we talk about machine learning is ...
