Processing a 250 TB dataset with Coiled, Dask, and Xarray

We processed 250 TB of geospatial data in twenty minutes on the cloud with Xarray, Dask, and Coiled. We did this to demonstrate scale and to think about costs.

County-level heat map of the continental US showing mean depth to soil saturation (in meters) in 2020.

How well does Dask run on Graviton?

ARM-based instances are known for matching the performance of x86-based instance types at a lower cost, since ARM processors consume far less energy for the same performance. It’s not surprising, then, that some companies, like Honeycomb, are switching their entire infrastructure to ARM.

Bar chart of AWS cost vs. processor type.

Just in time Python environments

Docker is a great tool for creating portable software environments, but we found it’s too slow for interactive exploration. In our experience, clusters that depend on Docker images often take 5+ minutes to launch. Ouch.


