Posts by Sarah Johnson
One Trillion Row Challenge
- 05 February 2024
Last month Gunnar Morling launched the One Billion Row Challenge, which asks for a Java program that reads temperature measurements from a text file and calculates the min, mean, and max temperature per weather station. It took off far more than anyone expected, gathering dozens of submissions built with many different tools.
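The core computation is a single-pass group-by: keep a running min, max, sum, and count per station, then derive the mean at the end. Below is a minimal single-threaded Python sketch of that logic, not any actual submission. It assumes each line looks like `Hamburg;12.0` (station name, semicolon, temperature). The function name `aggregate` is hypothetical, and sorting the output by station name is also an assumption.

```python
from typing import Iterable


def aggregate(lines: Iterable[str]) -> dict[str, tuple[float, float, float]]:
    """Return {station: (min, mean, max)} from 'station;temperature' lines.

    Assumed input format: one measurement per line, e.g. 'Hamburg;12.0'.
    """
    # Running [min, max, sum, count] per station, so the data is read in one pass
    # and memory grows with the number of stations, not the number of rows.
    stats: dict[str, list] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        station, _, value = line.rpartition(";")
        temp = float(value)
        s = stats.get(station)
        if s is None:
            stats[station] = [temp, temp, temp, 1]
        else:
            s[0] = min(s[0], temp)
            s[1] = max(s[1], temp)
            s[2] += temp
            s[3] += 1
    # Derive the mean only at the end; output is sorted by station name (assumption).
    return {
        station: (mn, total / count, mx)
        for station, (mn, mx, total, count) in sorted(stats.items())
    }


if __name__ == "__main__":
    sample = ["Hamburg;12.0", "Istanbul;6.2", "Hamburg;34.2", "Istanbul;23.0"]
    for station, (mn, mean, mx) in aggregate(sample).items():
        print(f"{station}={mn:.1f}/{mean:.1f}/{mx:.1f}")
```

For a real file you would pass an open file handle, e.g. `aggregate(open("measurements.txt"))` (filename hypothetical). At a billion rows, the fast submissions go much further than this (parallel chunked reads, custom parsing), but they all compute the same per-station aggregates.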
1BRC in Python with Dask
- 16 January 2024
Last week Gunnar Morling launched the One Billion Row Challenge and it’s been fun to follow along. Though the official challenge is limited to Java implementations, we were inspired by an unofficial Python submission and have our own unofficial submission for Dask.
How to Run Your Jupyter Notebook on a GPU in the Cloud
- 10 October 2023
You can often significantly reduce the time it takes to train your neural network by using advanced hardware like GPUs. In this example, we’ll walk through how to train a PyTorch neural network on a GPU in the cloud using Coiled notebooks.
Processing a 250 TB dataset with Coiled, Dask, and Xarray
- 05 September 2023
We processed 250 TB of geospatial data stored in the cloud in twenty minutes using Xarray, Dask, and Coiled. We did this to demonstrate scale and to think about costs.
How well does Dask run on Graviton?
- 05 May 2023
ARM-based processors are known for matching the performance of x86-based instance types at a lower cost, since they consume far less energy for the same performance. It’s not surprising, then, that some companies, like Honeycomb, are switching their entire infrastructure to ARM.
Just in time Python environments
- 23 February 2023
Docker is a great tool for creating portable software environments, but we found it too slow for interactive exploration. Clusters that depend on Docker images often take 5+ minutes to launch. Ouch.