Chapter 2. Getting Started with Dask

We’re glad you’ve decided to try Dask out and see whether it is the right system for you. This chapter focuses on getting started with Dask in its local mode, which we’ll use to explore a few of the more straightforward parallel computing tasks (including everyone’s favorite, word count).

Installing Dask Locally

Installing Dask locally is reasonably straightforward. If you plan to run on multiple machines later, it is often easier to start with a conda environment (or virtualenv). An isolated environment lets you run pip freeze to see exactly which packages you depend on, so you can make sure they’re on all of the workers when it’s time to scale.
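
If you would rather inspect the environment from inside Python, the following is a minimal sketch that approximates the output of pip freeze using only the standard library. The helper name frozen_requirements is ours, not part of pip or Dask, and the result will not match pip freeze exactly (for example, editable or URL-based installs are reported as plain pins).

```python
from importlib import metadata


def frozen_requirements():
    """Return installed distributions as sorted 'name==version' strings.

    Hypothetical helper for illustration; it only approximates `pip freeze`.
    """
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions whose metadata is missing a name
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Print one pin per line, similar in shape to a requirements file.
    for pin in frozen_requirements():
        print(pin)
```

Run inside the activated environment, this lists only what that environment contains, which is the set of packages the workers would need.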

While you can just run pip install -U dask, we prefer using a conda environment since it’s easier to match the version of Python to that on a cluster, which allows us to connect a local machine to the cluster directly. If you don’t already have conda on your machine, Miniforge is a quick way to install it on multiple platforms. Example 2-1 shows how to install Dask into a new conda environment.

Example 2-1. Installing Dask into a new conda environment
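# Create an environment with a pinned Python, plus mamba for faster installs
conda create -n dask python=3.8.6 mamba -y
conda activate dask
# Pin Dask and pandas so the local versions can match those on a cluster
mamba install --yes python==3.8.6 cytoolz dask==2021.7.0 numpy \
      pandas==1.3.0 beautifulsoup4 requests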

Here we install a specific version of Dask rather than just the latest version. If you’re planning to connect to a cluster later on, it will be useful ...
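
One way to confirm that the environment really picked up the pinned versions is to compare what is installed against the pins. The sketch below assumes the versions from Example 2-1; the names EXPECTED, installed_version, and find_mismatches are hypothetical helpers for illustration, not part of Dask.

```python
import platform
from importlib import metadata

# Pins copied from Example 2-1; adjust them to whatever your cluster runs.
EXPECTED = {"dask": "2021.7.0", "pandas": "1.3.0"}
EXPECTED_PYTHON = "3.8.6"


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose installed version differs from its pin
    to a (wanted, found) tuple."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    if platform.python_version() != EXPECTED_PYTHON:
        print(f"Python is {platform.python_version()}, expected {EXPECTED_PYTHON}")
    for package, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{package}: expected {wanted}, found {found}")
```

The lookup parameter exists only so the comparison can be exercised without the packages installed; in normal use the default is enough, and no output means everything matches.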
