Chapter 2. Getting Started with Dask
We are so happy that you’ve decided to explore whether Dask is the system for you by trying it out. In this chapter, we will focus on getting started with Dask in its local mode. Using local mode, we’ll work through a few of the simpler parallel computing tasks (including everyone’s favorite, word count).
Installing Dask Locally
Installing Dask locally is reasonably straightforward. If you want to begin running on multiple machines, doing so is often easier when you start with a conda environment (or virtualenv). This lets you figure out what packages you depend on by running pip freeze to make sure they’re on all of the workers when it’s time to scale.
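As a rough illustration of how pip freeze output can be used for that check, the sketch below compares two freeze listings and reports packages that are missing or pinned differently on a worker. The helper names (parse_freeze, diff_environments) and the sample listings are hypothetical, made up for this example; in practice the text would come from running pip freeze on each machine.

```python
def parse_freeze(text):
    """Parse `pip freeze`-style text into a {package: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and entries that are not simple name==version pins
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def diff_environments(local_text, worker_text):
    """Return packages missing on the worker and packages pinned differently."""
    local = parse_freeze(local_text)
    worker = parse_freeze(worker_text)
    missing = sorted(set(local) - set(worker))
    mismatched = {
        name: (local[name], worker[name])
        for name in sorted(set(local) & set(worker))
        if local[name] != worker[name]
    }
    return missing, mismatched


# Illustrative sample data only; real input would be captured from `pip freeze`.
local_freeze = "dask==2021.7.0\npandas==1.3.0\nrequests==2.26.0\n"
worker_freeze = "dask==2021.7.0\npandas==1.2.5\n"

missing, mismatched = diff_environments(local_freeze, worker_freeze)
print(missing)      # ['requests']
print(mismatched)   # {'pandas': ('1.3.0', '1.2.5')}
```

This only understands plain name==version lines; editable installs and URL-based requirements are skipped rather than compared.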
While you can just run pip install -U dask, we prefer using a conda environment since it’s easier to match the version of Python to that on a cluster, which allows us to connect a local machine to the cluster directly. If you don’t already have conda on your machine, Miniforge is a good and quick way to get conda installed across multiple platforms. The installation of Dask into a new conda environment is shown in Example 2-1.
Example 2-1. Installing Dask into a new conda environment
conda create -n dask python=3.8.6 mamba -y
conda activate dask
mamba install --yes python==3.8.6 cytoolz dask==2021.7.0 numpy \
    pandas==1.3.0 beautifulsoup4 requests
Here we install a specific version of Dask rather than just the latest version. If you’re planning to connect to a cluster later on, it will be useful ...
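One way to confirm that an environment really matches pins like those in Example 2-1 is to ask Python which versions it can see. The following is a minimal sketch using the standard library's importlib.metadata (available from Python 3.8); the EXPECTED dictionary and the check_pins helper are names invented for this illustration, not part of Dask.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Pins taken from Example 2-1; adjust to whatever your cluster uses.
EXPECTED = {"dask": "2021.7.0", "pandas": "1.3.0"}


def check_pins(expected):
    """Return {package: installed_version_or_None} for every pin that does not match."""
    problems = {}
    for name, wanted in expected.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None  # package is not installed in this environment
        if found != wanted:
            problems[name] = found
    return problems


# Report the interpreter version alongside any package mismatches.
print("Python:", ".".join(str(part) for part in sys.version_info[:3]))
print("Mismatched or missing:", check_pins(EXPECTED))
```

An empty dictionary means every listed package is installed at the expected version. Running the same script on a worker gives a quick comparison with the local machine.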