have labels which encode information about how the array values map to locations in space, time, etc. Xarray doesn’t just keep track of labels on arrays – it uses them to provide a powerful and concise interface
trees in parallel.
1. Prepare and clean our possibly large data, probably with a lot of Pandas wrangling
2. Set up XGBoost master and workers
3. Hand our cleaned data from a bunch of distributed Pandas dataframes to XGBoost workers across our cluster
conventions giving you a powerful, format-agnostic interface for working with your data. It excels when working with multi-dimensional Earth Science data, where tabular representations become unwieldy and inefficient.
like Harvard Medical School, Chan Zuckerberg, and Novartis
2. Finance like Barclays and Capital One
3. Geophysical Sciences like NASA, LANL, and the UK Met Office
4. Beamline facilities like Brookhaven
5. Retail like Walmart, JDA, and Grubhub (which, as you know, is everywhere)
https://youtu.be/t_GRK4L-bnw
images. Built with Python and Dask, but aimed at non-Python users.
“This makes it easy to quickly evaluate large datasets, which in turn helps us make analysis decisions faster”
Napari
move complex workflows, code, and data between the cloud and their local workstation
https://coiled.io/

Prefect
Prefect is a new workflow management system, designed for modern infrastructure and powered by open-source software.
https://www.prefect.io/

Saturn Cloud
Saturn Cloud enables data scientists to work at scale using the tools they know best: Python, Jupyter, and Dask
https://www.saturncloud.io/
production-grade communication framework for data-centric and high-performance applications.
UCX (Unified Communication X)
https://www.openucx.org/
the standard big data analytics benchmark, known as TPCx-BB. Dask scaled the workload onto 16 DGX A100 machines with a total of 128 NVIDIA A100 GPUs. https://github.com/rapidsai/tpcx-bb
one per user)
• A Proxy for proxying both the connection between the user’s client and their respective scheduler, and the Dask Web UI for each cluster
• A central Gateway that manages authentication and cluster startup/shutdown
be hiring someone to focus on the growth of Dask in the biological sciences. If that is of interest, keep an eye on our Twitter account for more updates. @dask_dev
built on Dask
• Dask used to beat big data benchmarks
• More info on CZI-funded maintainer position coming soon...

Learn More
Jacob Tomlinson @_jacobtomlinson
Dask Website: dask.org
Dask Twitter: @dask_dev
Take the Dask 2020 survey at dask.org/survey