All Posts
Process Hundreds of GB of Data in the Cloud with Polars
- 17 November 2023
Local machines can struggle to process large datasets due to memory and network limitations. Coiled Functions provide a cloud-based solution that allows for efficient and cost-effective handling of such extensive datasets, overcoming the constraints of local hardware for complex data processing tasks. Incorporating libraries like Polars can further enhance this approach, leveraging optimized computation capabilities to process data more quickly and efficiently.

Processing Terabyte-Scale NASA Cloud Datasets with Coiled
- 01 November 2023
We show how to run existing NASA data workflows on the cloud, in parallel, with minimal code changes using Coiled. We also discuss cost optimization.
How to Run Your Jupyter Notebook on a GPU in the Cloud
- 10 October 2023
You can often significantly accelerate the time it takes to train your neural network by using advanced hardware, like GPUs. In this example, we’ll go through how to train a PyTorch neural network on a GPU in the cloud using Coiled notebooks.

TPC-H Benchmarks for Query Optimization with Dask Expressions
- 05 October 2023
Dask-expr is an ongoing effort to add a logical query optimization layer to Dask DataFrames. We now have the first benchmark results to share that were run against the current DataFrame implementation.
Coiled observability wins: Chunksize
- 19 September 2023
Distributed computing is hard; distributed debugging is even harder. Dask tries to simplify this process as much as possible. Coiled adds observability features for your Dask clusters and processes them to help users understand their workflows better.

Parallel Serverless Functions at Scale
- 07 September 2023
The cloud offers amazing scale, but it can be difficult for Python data developers to use. This post walks through how to use Coiled Functions to run your existing code in parallel on the cloud with minimal code changes.
Processing a 250 TB dataset with Coiled, Dask, and Xarray
- 05 September 2023
We processed 250 TB of geospatial cloud data in twenty minutes on the cloud with Xarray, Dask, and Coiled. We do this to demonstrate scale and to think about costs.

Reduce training time for CPU intensive models with scikit-learn and Coiled Functions
- 01 September 2023
You can use Coiled Run and Coiled Functions for easily running scripts and functions on a VM in the cloud.

Fine Performance Metrics and Spans
- 23 August 2023
While it’s trivial to measure the end-to-end runtime of a Dask workload, the next logical step - breaking down this time to understand if it could be faster - has historically been a much more arduous task that required a lot of intuition and legwork, for novice and expert users alike. We wanted to change that.

Data-proximate Computing with Coiled Functions
- 10 August 2023
Coiled Functions make it easy to improve performance and reduce costs by moving your computations next to your cloud data.

Dask, Dagster, and Coiled for Production Analysis at OnlineApp
- 09 August 2023
We show a simple integration between Dagster and Dask+Coiled. We discuss how this made a common problem, processing a large set of files every month, really easy.
Process Hundreds of GB of Data with DuckDB in the Cloud
- 07 August 2023
DuckDB is a great tool for running efficient queries on large datasets. When you want cloud data proximity or need more RAM, Coiled makes it easy to run your Python function in the cloud. In this post we’ll use Coiled Functions to process the 150 GB Uber-Lyft dataset on a single machine with DuckDB.

High Level Query Optimization in Dask
- 04 August 2023
Dask DataFrame doesn’t currently optimize your code for you (like Spark or a SQL database would). This means that users waste a lot of computation. Let’s look at a common example which looks ok at first glance, but is actually pretty inefficient.
Easy Heavyweight Serverless Functions
- 01 August 2023
What is the easiest way to run Python code in the cloud, especially for compute jobs?
How to Train a Neural Network on a GPU in the Cloud with coiled functions
- 24 July 2023
We recently pushed out two new and experimental features, coiled run and coiled functions, the latter being a variation of coiled run. We are excited about both of them because they:
Dask performance benchmarking put to the test: Fixing a pandas bottleneck
- 23 June 2023
Getting notified of a significant performance regression the day before release sucks, but quickly identifying and resolving it feels great!
Coiled notebooks
- 14 June 2023
We recently pushed out a new, experimental notebooks feature for easily launching Jupyter servers in the cloud from your local machine. We’re excited about Coiled notebooks because they:
Utilizing PyArrow to improve pandas and Dask workflows
- 05 June 2023
Get the most out of PyArrow support in pandas and Dask right now
Distributed printing
- 18 May 2023
Dask makes it easy to print whether you’re running code locally on your laptop, or remotely on a cluster in the cloud.

Observability for Distributed Computing with Dask
- 16 May 2023
Debugging is hard. Distributed debugging is hell.
When dealing with unexpected issues in a distributed system, you need to understand what and why it happened, how interactions between individual pieces contributed to the problems, and how to avoid them in the future. In other words, you need observability. This article explains what observability is, how Dask implements it, what pain points remain, and how Coiled helps you overcome these.

Performance testing at Coiled
- 05 May 2023
At Coiled we develop Dask and automatically deploy it to large clusters of cloud workers (sometimes 1000+ EC2 instances at once!). In order to avoid surprises when we publish a new release, Dask needs to be covered by a comprehensive battery of tests — both for functionality and performance.

How well does Dask run on Graviton?
- 05 May 2023
ARM-based processors are known for matching the performance of x86-based instance types at a lower cost, since they consume far less energy for the same performance. It’s not surprising then that some companies, like Honeycomb, are switching their entire infrastructure to ARM.

Upstream testing in Dask
- 18 April 2023
Dask has deep integrations with other libraries in the PyData ecosystem like NumPy, pandas, Zarr, PyArrow, and more. Part of providing a good experience for Dask users is making sure that Dask continues to work well with this community of libraries as they push out new releases. This post walks through how Dask maintainers proactively ensure Dask continuously works with its surrounding ecosystem.
Burstable vs non-burstable AWS instance types for data engineering workloads
- 04 April 2023
There are many instance types to choose from on AWS. In this post, we’ll look at one choice you can make—burstable vs non-burstable instances—and show how the “cheaper” burstable option can end up being more expensive for data engineering workloads.
Shuffling large data at constant memory in Dask
- 15 March 2023
With release 2023.2.1, dask.dataframe introduces a new shuffling method called P2P, making sorts, merges, and joins faster and using constant memory.
Benchmarks show impressive improvements:

Just in time Python environments
- 23 February 2023
Docker is a great tool for creating portable software environments, but we found it’s too slow for interactive exploration. We find that clusters depending on Docker images often take 5+ minutes to launch. Ouch.
How many PEPs does it take to install a package?
- 17 January 2023
A few months ago we released package sync, a feature that takes your Python environment and replicates it in the cloud with zero effort.
Scaling Hyperparameter Optimization With XGBoost, Optuna, and Dask
- 06 January 2023
XGBoost is one of the most well-known libraries among data scientists, having become one of the top choices among Kaggle competitors. It is performant in a wide array of supervised machine learning problems, implements scalable training through the rabit library, and integrates with many big data processing tools, including Dask.

Handling Unexpected AWS IAM Changes
- 06 January 2023
The cloud is tricky! You might think the rules that determine which IAM permissions are required for which actions will continue to apply in the same way. You might think they’d apply the same way to different AWS accounts. Or that if these things aren’t true, at least AWS will let you know. (I did.) You’d be wrong!
AWS Cost Explorer Tips and Tricks
- 06 January 2023
Spending time in AWS Cost Explorer is one of the best ways to understand what’s going on in your AWS account. It’s one of the few places in the AWS Console where you can get a global view of your account or even of your entire organization.

Automated Data Pipelines On Dask With Coiled & Prefect
- 19 December 2022
Dask is widely used among data scientists and engineers proficient in Python for interacting with big data, doing statistical analysis, and developing machine learning models. Operationalizing this work has traditionally required lengthy code rewrites, which makes moving from development to production hard. This gap slows business progress and increases risk for data science and data engineering projects in an enterprise setting. The need to remove this bottleneck has prompted the emergence of production deployment solutions that allow code written by data scientists and engineers to be directly deployed to production, unlocking the power of continuous deployment for pure Python data science and engineering.
