Dask Cloud Provider

Native Cloud integration for Dask.

This library creates Dask clusters on a given cloud provider with no set up other than having credentials. Currently, it only supports AWS.

Installation

Pip

$ pip install dask-cloudprovider

Conda

$ conda install -c conda-forge dask-cloudprovider

Below are the different modules for creating clusters on various cloud providers.

AWS

In order to create clusters on AWS you need to set your access key, secret key and region. The simplest way is to use the aws command line tool.

$ pip install awscli
$ aws configure

Fargate/ECS

The FargateCluster will create a new Fargate ECS cluster by default along with all the IAM roles, security groups, and so on that it needs to function.

from dask_cloudprovider import FargateCluster
cluster = FargateCluster()
⚠ All AWS resources created by FargateCluster should be removed on garbage collection. If the process is killed harshly this will not happen.

You can also create Dask clusters using EC2 based ECS clusters using ECSCluster.

Creating the ECS cluster is out of scope for this library but you can pass in the ARN of an existing one like this:

from dask_cloudprovider import ECSCluster
cluster = ECSCluster(cluster_arn="arn:aws:ecs:<region>:<acctid>:cluster/<clustername>")

All the other required resources such as roles, task definitions, tasks, etc will be created automatically like in FargateCluster.

GPU Support

There is also support in ECSCLuster for GPU aware Dask clusters. To do this you need to create an ECS cluster with GPU capable instances (from the p3 or p3dn families) and specify the number of GPUs each worker task should have.

from dask_cloudprovider import ECSCluster
cluster = ECSCluster(
    cluster_arn="arn:aws:ecs:<region>:<acctid>:cluster/<gpuclustername>",
    worker_gpu=1)