Chapter 9. Introduction to Data Analysis

So far, this book has focused mostly on the logistics of acquiring, assessing, transforming, and augmenting data. We’ve explored how to write code that can retrieve data from the internet, extract it from unfriendly formats, evaluate its completeness, and account for inconsistencies. We’ve even spent some time thinking about how to make sure that the tools we use to do all this—our Python scripts—are optimized to meet our needs, both now and in the future.

At this point, though, it’s time to revisit the why of all this work. Back in “What Is ‘Data Wrangling’?”, I described the purpose of data wrangling as transforming “raw” data into something that can be used to generate insight and meaning. But unless we follow through with at least some degree of analysis, there’s no way to know if our wrangling efforts were sufficient, or what insights they might produce. In that sense, stopping our data wrangling work at the augmentation/transformation phase would be like setting up your mise en place and then walking out of the kitchen. You don’t spend hours carefully prepping vegetables and measuring ingredients unless you want to cook. And that’s what data analysis is: taking all that beautifully cleaned and prepared data and turning it into new insight and knowledge.

If you fear we’re slipping into abstractions again, don’t worry—the fundamentals of data analysis are simple and concrete enough. Like our data quality assessments, however, they are ...
