Chapter 8. Structuring and Refactoring Your Code

Before we move on to the analyzing and visualizing aspects of data wrangling, we’re going to take a brief “detour” to discuss some strategies for making the most of everything we’ve done so far. In the last few chapters, we’ve explored how to access and parse data from a variety of data formats and sources, how to evaluate its quality in practical terms, and how to clean and augment it for eventual analysis. In the process, our relatively simple programs have evolved and changed, inevitably becoming more convoluted and complex. Our for loops now have one (or more) nested if statements, and some of those now have apparently “magic” numbers embedded in them (like the 4 in our the_date.weekday() <= 4 check in Example 7-5). Is this just the price of more functional code?

Remember that commenting our code can do a lot to help keep the logic of our scripts understandable, both to potential collaborators and to our future selves. But it turns out that detailed documentation (much as I love it) isn’t the only way that we can improve the clarity of our Python code. Just as with other types of written documents, Python supports a range of useful mechanisms for structuring and organizing our code. By making judicious use of these, we can make it simpler to both use and reuse our programming work down the line.
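As a small taste of what this kind of structuring can look like, here is a minimal sketch of how a check like the_date.weekday() <= 4 might be made more self-explanatory. It assumes that the_date is a standard datetime.date object; the names FRIDAY and is_weekday are hypothetical ones chosen for this illustration, not taken from Example 7-5.

```python
from datetime import date

# Hypothetical constant name: date.weekday() returns 0 for Monday
# through 6 for Sunday, so 4 corresponds to Friday.
FRIDAY = 4


def is_weekday(the_date):
    """Return True if the_date falls on Monday through Friday."""
    return the_date.weekday() <= FRIDAY


# March 5, 2021 was a Friday; March 6, 2021 was a Saturday.
print(is_weekday(date(2021, 3, 5)))  # True
print(is_weekday(date(2021, 3, 6)))  # False
```

Giving the number a name and wrapping the comparison in a small function means the intent ("is this a weekday?") can be read directly from the code, and the same check can be reused wherever it is needed.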

So in this chapter, we’re going to go over the tools and concepts that will allow us to refine our code so that it is both readable and reusable. ...
