The Problem: Rechunking
• collected contiguously in 1+ dimensions (e.g. satellite image) and chunked in another (e.g. time)
• But then we want to analyze it along the time axis
• This requires rechunking
• dask.array.rechunk often fails
• Huge graphs
• Workers running out of memory
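One rough way to see where the huge graphs come from is to count how many (source chunk, target chunk) pairs overlap, since each overlapping pair implies a piece of data that has to move. The sketch below is pure Python and only a proxy: it does not reproduce how dask actually builds its task graph, and the helper names `chunk_bounds` and `count_overlaps` are hypothetical, chosen for this illustration.

```python
from itertools import product


def chunk_bounds(n, c):
    """Return (start, stop) index pairs for an axis of length n cut into chunks of size c."""
    return [(start, min(start + c, n)) for start in range(0, n, c)]


def count_overlaps(shape, src_chunks, dst_chunks):
    """Count (source chunk, target chunk) pairs that share at least one element.

    Two N-d chunks overlap only if they overlap along every axis, so the
    total is the product of the per-axis overlapping-pair counts.
    """
    total = 1
    for n, cs, cd in zip(shape, src_chunks, dst_chunks):
        src = chunk_bounds(n, cs)
        dst = chunk_bounds(n, cd)
        # Half-open intervals overlap if each one starts before the other ends.
        axis_pairs = sum(
            1
            for (a0, a1), (b0, b1) in product(src, dst)
            if a0 < b1 and b0 < a1
        )
        total *= axis_pairs
    return total


# Assumed toy array: 1000 time steps by 1000 spatial points.
shape = (1000, 1000)
contiguous_in_space = (10, 1000)  # chunked along time
contiguous_in_time = (1000, 10)   # chunked along space

pieces = count_overlaps(shape, contiguous_in_space, contiguous_in_time)
print(pieces)
```

In this toy case there are 100 source chunks and 100 target chunks, and every source chunk overlaps every target chunk, so the number of pieces grows as the product of the two counts rather than their sum.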
Array Shape Conserved: $N_t = \text{const}$, $N_x = \text{const}$
Number of Chunks: $N_t = n_t C_t$, $N_x = n_x C_x$
Chunk Size: $S_c = C_t C_x$
Chunk-Size-Preserving Operations: $C^0_t C^0_x = C^1_t C^1_x$
Full-Shuffle Rechunk: $C^0_x = N_x$, $C^1_t = N_t$, $C^1_x = C^0_t N_x / N_t$
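The full-shuffle relations can be checked with a few lines of arithmetic. This is a minimal sketch under the slide's definitions: the function name `full_shuffle_target` and the example sizes are assumptions made for illustration, and the function insists on an integer result even though real chunk sizes may need rounding.

```python
def full_shuffle_target(n_t, n_x, c0_t):
    """Return ((C0_t, C0_x), (C1_t, C1_x)) for a chunk-size-preserving full shuffle.

    Source chunks span all of x (C0_x = N_x); target chunks span all of t
    (C1_t = N_t). Preserving the chunk size C0_t * C0_x = C1_t * C1_x then
    gives C1_x = C0_t * N_x / N_t.
    """
    c0_x = n_x
    c1_t = n_t
    if (c0_t * n_x) % n_t != 0:
        # Assumption of this sketch: only exact integer chunk sizes are accepted.
        raise ValueError("C0_t * N_x is not divisible by N_t")
    c1_x = c0_t * n_x // n_t
    return (c0_t, c0_x), (c1_t, c1_x)


# Assumed example: 1000 time steps, 1,000,000 spatial points, one time step per source chunk.
src, dst = full_shuffle_target(n_t=1000, n_x=1_000_000, c0_t=1)
print(src, dst)
```

Both chunk shapes here hold the same number of elements, which is the chunk-size-preserving condition stated above.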