Jupyter Insights: Andreas Mueller, a lecturer at the Data Science Institute at Columbia University
Approaches to data analysis, iterative workflows, and writing a book with Jupyter.
Andreas Mueller is a lecturer at the Data Science Institute at Columbia University and co-author of “Introduction to Machine Learning with Python,” which describes a practical approach to machine learning with Python and scikit-learn. He is also a core developer of the scikit-learn library.
Below, Mueller shares his thoughts on the current and future state of Jupyter. He will also be speaking at JupyterCon, August 22-25, 2017, in New York City.
1. How has Jupyter changed the way you work?
Jupyter has profoundly changed the way I do data analysis, prototyping, and education. For prototyping and data analysis, the main benefit is that Jupyter Notebooks support an iterative workflow that starts with a simple solution and lets the user gradually increase complexity. In a less interactive setting, you would have to start over again each time, while in a terminal application, working with any real amount of code becomes cumbersome.
2. How do you expect Jupyter to be extended in the coming year?
I think with JupyterLab we will have more of an IDE-like environment, which is good for some applications. I also expect to see more packages integrate with the possibilities that the Jupyter interface provides. I’m actively working on how scikit-learn can make better use of user interaction in a browser. I’m hoping to see progress in terms of reproducibility and collaboration, and nbdime has been a great step forward. Still, version control, and therefore collaboration, remains a big issue.
3. What will you be talking about at JupyterCon?
Doing data analysis with Jupyter and writing a book with Jupyter. I used the notebook as the primary format to write my book, “Introduction to Machine Learning with Python.”
Writing using notebooks was great, and it makes for a very natural experience that I can directly share with the readers.
However, because of the use of markdown and the specific markdown engine, some things are not possible in a way that nbconvert understands—for example, references between notebooks or within notebooks, and captions for figures and images (in a way that would make sense for a book). Even some simple things like column spans in tables are not easily possible. I will talk about the problems I faced and my workarounds, and how support for writing in notebooks can be improved in the future.
4. What sessions are you looking forward to seeing at JupyterCon?
I’m particularly interested in the sessions on reproducibility and collaboration, but also in those on setting up multi-user JupyterHub systems and on educational uses of Jupyter.