Chapter 5. Data Analysis with pandas

This chapter introduces you to pandas, the Python Data Analysis Library or, as I like to put it, the Python-based spreadsheet with superpowers. It’s so powerful that some of the companies I have worked with have managed to get rid of Excel completely by replacing it with a combination of Jupyter notebooks and pandas. As a reader of this book, however, I assume you will keep Excel, in which case pandas will serve as an interface for getting data in and out of spreadsheets. pandas makes tasks that are particularly painful in Excel easier, faster, and less error-prone. These tasks include getting big datasets from external sources and working with statistics, time series, and interactive charts. pandas’ most important superpowers are vectorization and data alignment. As we’ve already seen in the previous chapter with NumPy arrays, vectorization allows you to write concise, array-based code, while data alignment matches values by their labels rather than by their position, so there is no data mismatch when you work with multiple datasets.
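To give a first impression of these two superpowers, here is a minimal sketch. The product names and numbers are made up for illustration, and the variable names (`q1`, `q2`, `doubled`, `total`) are my own choices rather than anything used later in the chapter:

```python
import pandas as pd

# Hypothetical sales figures per product for two quarters.
# Note that the labels in q2 are in a different order than in q1.
q1 = pd.Series([100, 200, 300], index=["apples", "bananas", "cherries"])
q2 = pd.Series([30, 10, 20], index=["cherries", "apples", "bananas"])

# Vectorization: a single expression is applied to every element,
# so there is no need to write a loop.
doubled = q1 * 2

# Data alignment: values are matched by their index labels,
# not by their position, before they are added.
total = q1 + q2

print(doubled)
print(total)
```

Even though the two Series list their products in a different order, `total` adds apples to apples and cherries to cherries. In a spreadsheet, you would typically have to sort both ranges or use a lookup formula to get the same result.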

This chapter covers the whole data analysis journey: it starts with cleaning and preparing data and then shows you how to make sense of bigger datasets via aggregation, descriptive statistics, and visualization. At the end of the chapter, we’ll see how to import and export data with pandas. But first things first: let’s get started with an introduction to pandas’ main data structures, DataFrame and Series!

DataFrame and Series

DataFrame
