Posts tagged performance

TPC-H Benchmarks for Query Optimization with Dask Expressions

Dask-expr is an ongoing effort to add a logical query optimization layer to Dask DataFrames. We now have the first benchmark results to share that were run against the current DataFrame implementation.

Read more ...


Fine Performance Metrics and Spans

While it’s trivial to measure the end-to-end runtime of a Dask workload, the next logical step - breaking down this time to understand if it could be faster - has historically been a much more arduous task that required a lot of intuition and legwork, for novice and expert users alike. We wanted to change that.

Populated Fine Performance Metrics dashboard

Read more ...


High Level Query Optimization in Dask

Dask DataFrame doesn’t currently optimize your code for you (like Spark or a SQL database would). This means that users waste a lot of computation. Let’s look at a common example which looks ok at first glance, but is actually pretty inefficient.

Read more ...


Dask performance benchmarking put to the test: Fixing a pandas bottleneck

Getting notified of a significant performance regression the day before release sucks, but quickly identifying and resolving it feels great!

Read more ...


Performance testing at Coiled

At Coiled we develop Dask and automatically deploy it to large clusters of cloud workers (sometimes 1000+ EC2 instances at once!). In order to avoid surprises when we publish a new release, Dask needs to be covered by a comprehensive battery of tests — both for functionality and performance.

Nightly tests report

Read more ...