Practical Data Cleaning with Python
Katharine Jarmul will show you how to use Python libraries to speed up the data wrangling process and automate data cleaning, how to handle messy data, and how to write data unit tests that monitor data validity.
It’s a commonly cited statistic that data scientists spend roughly 80% of their time processing, wrangling, and munging their data and only 20% actually analyzing it. Reducing the time you spend cleaning your data, even by a small amount, can lead to valuable gains down the line.
Join expert Katharine Jarmul for a hands-on, in-depth exploration of practical data cleaning with Python, as she highlights tools that can speed up the data wrangling process and automate (or at least script) some of its repetitive steps. You’ll get an overview of the best libraries and tools for handling messy data and learn how to apply software development practices to data wrangling problems by writing data unit tests, which let you catch problems before they create inaccurate data for your entire company. Along the way, you’ll explore a few case studies that apply these techniques to real-world data problems.
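To give a feel for what a data unit test can look like, here is a minimal sketch using only the Python standard library. It is not taken from the course material: the records, the column names (order_id, amount, order_date), and the helper find_problems are hypothetical, chosen only to illustrate the idea of asserting on data validity the same way you would assert on code behavior.

```python
from datetime import datetime

# Hypothetical records; the column names are illustrative assumptions.
ROWS = [
    {"order_id": "1001", "amount": "19.99", "order_date": "2017-03-01"},
    {"order_id": "1002", "amount": "-5.00", "order_date": "2017-03-02"},
    {"order_id": "", "amount": "12.50", "order_date": "03/04/2017"},
]


def parses_as_ymd(value):
    """Return True if value parses as a year-month-day date."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def find_problems(rows):
    """Return (row_index, message) pairs for every failed validity check."""
    problems = []
    for index, row in enumerate(rows):
        if not row["order_id"].strip():
            problems.append((index, "missing order_id"))
        if float(row["amount"]) < 0:
            problems.append((index, "negative amount"))
        if not parses_as_ymd(row["order_date"]):
            problems.append((index, "order_date is not YYYY-MM-DD"))
    return problems


def test_clean_rows_pass():
    # A known-good row should produce no problems.
    assert find_problems(ROWS[:1]) == []


def test_messy_rows_are_flagged():
    # Rows 1 and 2 each violate at least one rule.
    flagged = {index for index, _ in find_problems(ROWS)}
    assert flagged == {1, 2}


if __name__ == "__main__":
    test_clean_rows_pass()
    test_messy_rows_are_flagged()
    for index, message in find_problems(ROWS):
        print(f"row {index}: {message}")
```

Functions named test_* like these can also be collected by a test runner such as pytest, so the same checks can run automatically each time new data arrives, before the data is passed downstream.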