Chapter 8. Design and Refactoring

In this chapter, I want to move away from the finer details of each line of code you write and toward the bigger picture: how to design your projects, how to arrange your code, and how to refactor your code when that design changes. I’ll include some ideas for organizing and standardizing the high-level structure of your projects, and I’ll suggest how to break your code into modular, reusable functions.

Good design, whether at the level of a whole project or at the level of individual functions, has a number of benefits for your code. A somewhat standardized project design removes some of the mental load of switching from one project to another, and it’s easier for someone to work on your project if they have seen something similar before. Well-designed code is also easier to reuse in other projects and easier to extend with new features.

In my experience as a data scientist, I’ve seen many projects in which all the code is in one giant Jupyter notebook. I’ve created projects like this myself. A Jupyter notebook is a fantastic way to get started on a project, draft your ideas, and try things out. But notebooks can be limiting when your project scales up or becomes more complex. You can see a framework for turning your notebooks into Python scripts in “From Notebooks to Scalable Scripts”.
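As a rough illustration of what breaking notebook-style code into modular, reusable functions might look like, here is a minimal sketch. The function names (`clean_values`, `summarize`, `run_analysis`) and the sample data are hypothetical and chosen only for this example; they are not taken from the book's own framework.

```python
from statistics import mean


def clean_values(raw_values):
    """Drop missing entries (None) and convert the rest to floats."""
    return [float(v) for v in raw_values if v is not None]


def summarize(values):
    """Return the count, mean, and maximum of a list of numbers."""
    return {"count": len(values), "mean": mean(values), "max": max(values)}


def run_analysis(raw_values):
    """Chain the steps that might otherwise sit in separate notebook cells."""
    return summarize(clean_values(raw_values))


if __name__ == "__main__":
    # Hypothetical sample input, standing in for data loaded in a notebook.
    print(run_analysis([1, None, 2.5, "4"]))
```

Because each step is its own function, `clean_values` or `summarize` could be imported into another project or tested on its own, which is harder to do with cells in a single notebook.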

It’s sometimes difficult in data science to know exactly when to design the structure of your project. You may ...

