...nce do we want to do with Satellite Data?
Lahoz, W. A., & Schneider, P. (2014). Data assimilation: making sense of Earth Observation. Frontiers in Environmental Science, 2, 16. https://doi.org/10.3389/fenvs.2014.00016
The Trouble with “Platforms”
• ...if you want to access data that isn’t included?
• Who pays? Are platform priorities aligned with those of the scientific community?
...itecture
• Jupyter for interactive access to remote systems.
• Cloud / HPC.
• Xarray provides data structures and an intuitive interface for interacting with datasets.
• A parallel computing system allows users to deploy clusters of compute nodes for data processing. Dask tells the nodes what to do.
• Distributed storage: “Analysis Ready Data” stored on globally available distributed storage.
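As a rough illustration of how the Xarray and Dask layers above fit together, here is a minimal sketch. It is not taken from the slides. It assumes a small synthetic in-memory array (the variable name `sst` and the grid are hypothetical) in place of analysis-ready data on distributed storage, and it runs on Dask's default local scheduler rather than a deployed cluster. In a real deployment the data would more likely be opened lazily from cloud storage, for example with `xr.open_zarr`.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for an analysis-ready dataset (hypothetical values and grid).
times = np.arange(120)
lats = np.linspace(-88.75, 88.75, 72)
lons = np.linspace(1.25, 358.75, 144)
rng = np.random.default_rng(0)
sst = xr.DataArray(
    rng.normal(size=(times.size, lats.size, lons.size)),
    dims=("time", "lat", "lon"),
    coords={"time": times, "lat": lats, "lon": lons},
    name="sst",
)

# Chunking turns the array into a Dask-backed one: later operations only
# build a task graph instead of computing immediately.
lazy = sst.chunk({"time": 12})

# Xarray expresses the analysis by dimension name; nothing is computed yet.
time_mean = lazy.mean("time")

# compute() hands the task graph to Dask, which schedules the chunks
# (here on local threads; on a cluster, across compute nodes).
result = time_mean.compute()
print(result.shape)
```

The same analysis code would, in principle, run unchanged whether the chunks are processed on a laptop or on a cluster; only the scheduler and the storage backend differ.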
Future Challenges
• ...tools? (No agency or lab “owns” these, but they are critical infrastructure.)
• Who provides cloud-style computing to the science community?
• How do we avoid data silos? (I want both NASA + NOAA data in the same place.)
• How do we train (and retrain) scientists to feel comfortable with new tools and cloud-native workflows?
Pangeo has the potential to transform Earth-System Science. But it’s not clear how to scale it.