Chapter 1. Preparing the Data

In this chapter, we will cover the basic tasks of reading, storing, and cleaning data using Python and OpenRefine. You will learn the following recipes:

  • Reading and writing CSV/TSV files with Python
  • Reading and writing JSON files with Python
  • Reading and writing Excel files with Python
  • Reading and writing XML files with Python
  • Retrieving HTML pages with pandas
  • Storing and retrieving from a relational database
  • Storing and retrieving from MongoDB
  • Opening and transforming data with OpenRefine
  • Exploring the data with OpenRefine
  • Removing duplicates
  • Using regular expressions and GREL to clean up the data
  • Imputing missing observations
  • Normalizing and standardizing features
  • Binning the observations
  • Encoding categorical variables
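
As a rough orientation only, the sketch below strings together a few of the tasks listed above (reading delimited text, removing duplicates, imputing a missing value, min-max normalizing, one-hot encoding, and writing CSV) using nothing but the Python standard library. It is not taken from the recipes themselves: the sample data, the column names, and every function name (read_rows, drop_duplicates, impute_mean, normalize, one_hot, write_csv) are hypothetical and chosen for illustration.

```python
import csv
import io
import statistics

# Hypothetical sample data (an assumption, not from the book): a tiny TSV
# containing one exact duplicate row and one missing price.
RAW_TSV = "id\tcity\tprice\n1\tLeeds\t100\n2\tYork\t\n1\tLeeds\t100\n3\tLeeds\t300\n"


def read_rows(text, delimiter="\t"):
    """Parse delimited text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def drop_duplicates(rows):
    """Keep only the first occurrence of each fully identical row."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_mean(rows, column):
    """Convert a column to float, filling empty cells with the observed mean."""
    observed = [float(r[column]) for r in rows if r[column] != ""]
    mean = statistics.mean(observed)
    return [
        {**r, column: float(r[column]) if r[column] != "" else mean}
        for r in rows
    ]


def normalize(values):
    """Min-max scale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Encode categories as 0/1 indicator lists, one slot per sorted category."""
    categories = sorted(set(values))
    return [[int(v == c) for c in categories] for v in values]


def write_csv(rows):
    """Serialize a list of dicts to comma-separated text with a header."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


rows = impute_mean(drop_duplicates(read_rows(RAW_TSV)), "price")
prices = [r["price"] for r in rows]
scaled = normalize(prices)
encoded = one_hot([r["city"] for r in rows])
csv_text = write_csv(rows)
```

Mean imputation and min-max scaling are only one choice each among several; they are used here because they fit in a few lines.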

Introduction ...
