Julius Busecke
September 26, 2024

CMIP6 in the Cloud: A Prototype to Break Barriers and Accelerate Collaboration

Slides for "CMIP6 in the Cloud: A Prototype to Break Barriers and Accelerate Collaboration", presented by Julius Busecke at the "Navigating Weather and Climate Data" event (Earthmover Climate Week 2024) on September 26, 2024.

Transcript

  1. JULIUS BUSECKE* SEP 24 CMIP6 IN THE CLOUD: A PROTOTYPE TO BREAK BARRIERS AND ACCELERATE COLLABORATION *THANKS TO MY COLLABORATORS, ESPECIALLY CHARLES STERN!
  2. WHO AM I? M²LInES jbusecke juliusbusecke.com @JuliusBusecke @[email protected] @codeandcurrents.bsky.social 🌊 Climate Scientist: ocean transport of heat, carbon, and oxygen; impact of small-scale processes on global climate variability. 🤓 Developer/Data Nerd: Pangeo CMIP6 Cloud Data, xMIP/xGCM. 🤝 Open Science Advocate: Manager for Data and Computation, NSF-LEAP; Lead of Open Research, M²LInES.
  3. "THE INTERGOVERNMENTAL PANEL ON CLIMATE CHANGE (IPCC) IS THE UNITED NATIONS BODY FOR ASSESSING THE SCIENCE RELATED TO CLIMATE CHANGE." WWW.IPCC.CH/
  8. "THE INTERGOVERNMENTAL PANEL ON CLIMATE CHANGE (IPCC) IS THE UNITED NATIONS BODY FOR ASSESSING THE SCIENCE RELATED TO CLIMATE CHANGE." WWW.IPCC.CH/ - Comparing simulations to observations - Future predictions - Emission scenarios - Model spread (uncertainty)
  9. COUPLED MODEL INTERCOMPARISON PROJECT The objective of CMIP is to better understand past, present and future climate changes arising from natural, unforced variability or in response to changes in radiative forcing in a multi-model context. [...] - 1000s of individual simulations, 20 PB of data - More use cases and opportunities to explore
  10. CMIP DATA (https://wcrp-cmip.org/cmip-data-access/#access-routes) - Many 100,000s of individual datasets - Each dataset is identified by a unique id, consisting of 'facets' - Facets are part of the CMIP controlled vocabulary (https://wcrp-cmip.github.io/CMIP6_CVs/) - Facet labels: CMIP Cycle, MIP activity, Modelling Center, Model Code, Experiment/forcing scenarios, Ensemble member, MIP table used, Output Variable, Model Grid - Example: CMIP6.ScenarioMIP.NOAA-GFDL.GFDL-CM4.ssp585.r1i1p1f1.Omon.thetao.gn - Template: mip_era.activity_id.institution_id.source_id.experiment_id.member_id.table_id.variable_id.grid_label.version - Annotations: different simulations; components of a single simulation
  11. CMIP DATA (https://wcrp-cmip.org/cmip-data-access/#access-routes) - Many 100,000s of individual datasets - Each dataset is identified by a unique id, consisting of 'facets' - Facets are part of the CMIP controlled vocabulary (https://wcrp-cmip.github.io/CMIP6_CVs/) - Variable names are standardized using the CMOR (https://cmor.llnl.gov/) library - Facet labels: CMIP Cycle, MIP activity, Modelling Center, Model Code, Experiment/forcing scenarios, Ensemble member, MIP table used, Output Variable, Model Grid - Example: CMIP6.ScenarioMIP.NOAA-GFDL.GFDL-CM4.ssp585.r1i1p1f1.Omon.thetao.gn - Template: mip_era.activity_id.institution_id.source_id.experiment_id.member_id.table_id.variable_id.grid_label.version - Annotations: different simulations; components of a single simulation - Map of modelling centres and ESGF nodes: https://wcrp-cmip.org/map/#map_of_modelling_centres_and_esgf_nodes
  14. CMIP DATA (https://wcrp-cmip.org/cmip-data-access/#access-routes) - Facet labels: CMIP Cycle, MIP activity, Modelling Center, Model Code, Experiment/forcing scenarios, Ensemble member, MIP table used, Output Variable, Model Grid - Example: CMIP6.ScenarioMIP.NOAA-GFDL.GFDL-CM4.ssp585.r1i1p1f1.Omon.thetao.gn - Template: mip_era.activity_id.institution_id.source_id.experiment_id.member_id.table_id.variable_id.grid_label.version - Annotations: different simulations; components of a single simulation - Figure credit: Hingray and Saïd 2014
  15. CMIP DATA (https://expearth.uib.no/?page_id=28) - Facet labels: CMIP Cycle, MIP activity, Modelling Center, Model Code, Experiment/forcing scenarios, Ensemble member, MIP table used, Output Variable, Model Grid - Example: CMIP6.ScenarioMIP.NOAA-GFDL.GFDL-CM4.ssp585.r1i1p1f1.Omon.thetao.gn - Template: mip_era.activity_id.institution_id.source_id.experiment_id.member_id.table_id.variable_id.grid_label.version - Annotations: different simulations; components of a single simulation - Variable names are standardized using the CMOR (https://cmor.llnl.gov/) library
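
The facet structure described on the CMIP DATA slides can be made concrete with a small sketch. The Python below is only an illustration and not part of the talk: it splits the example id from the slide into the facet names given in the template. The function name `parse_dataset_id` is hypothetical, and treating the trailing `version` facet as optional is an assumption based on the slide's example id, which has nine parts while the template lists ten.

```python
# Facet names in the order given by the template on the slide.
FACET_NAMES = (
    "mip_era",
    "activity_id",
    "institution_id",
    "source_id",
    "experiment_id",
    "member_id",
    "table_id",
    "variable_id",
    "grid_label",
    "version",
)


def parse_dataset_id(dataset_id: str) -> dict[str, str]:
    """Split a dot-separated CMIP6 dataset id into a facet dictionary.

    Hypothetical helper for illustration. ASSUMPTION: the final
    `version` facet may be omitted, as in the example id on the slide.
    """
    parts = dataset_id.split(".")
    if len(parts) not in (len(FACET_NAMES) - 1, len(FACET_NAMES)):
        raise ValueError(
            f"expected 9 or 10 dot-separated facets, got {len(parts)}"
        )
    # zip() stops at the shorter sequence, so a missing version is skipped.
    return dict(zip(FACET_NAMES, parts))


# Example id copied from the slide.
example_id = "CMIP6.ScenarioMIP.NOAA-GFDL.GFDL-CM4.ssp585.r1i1p1f1.Omon.thetao.gn"
facets = parse_dataset_id(example_id)
for name, value in facets.items():
    print(f"{name:15s} {value}")
```

Working with a dictionary of facets rather than the raw string makes it straightforward to filter or group datasets by any single facet, for example all ensemble members of one model and experiment.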
  16. ESGF - Data is distributed via the Earth System Grid Federation (ESGF) - Federation of public-sector data centers hosting CMIP data (and more) on their servers - Licensed, public, free - Data nodes serve netCDF files, primarily for download
  17. CMIP DATA: CHALLENGES - Reality: Large institutions create mirrors of parts of the archive, restricted to employees (data fortresses) - Large overhead; requires expertise, time, and funds - Individual access/data cleaning approaches might be incompatible, hindering reusability/reproducibility - Effectively limits conducting climate science to large legacy orgs [Diagram labels: ESGF; Custom Code (×3); University; Lab; Industry; ❌ ✋🚫]
  18. ANALYSIS-READY CLOUD-OPTIMIZED (ARCO) DATA Analysis-Ready: • Think in “Datasets/Datacubes”, not “files” and “folders” • Rich Metadata Cloud Optimized: [Figure labels: Chunked appropriately for analysis; Rich metadata; Everything in one dataset object]
  19. ANALYSIS-READY CLOUD-OPTIMIZED (ARCO) DATA Analysis-Ready: • Think in “Datasets/Datacubes”, not “files” and “folders” • Rich Metadata Cloud Optimized: • Efficient cloud-native access • Integration with data science and ML ecosystem
  20. CMIP6 CLOUD DATA • A single data repository in the cloud serves all use cases • Storage provided by Google as Public Dataset [Diagram labels: ESGF; Ingestion Pipeline; Everybody rolls their own; Custom Code (×3); University; Lab; Industry; ❌ ✋🚫]
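
As a rough sketch of what "a single data repository in the cloud" can look like from the user side, the Python below searches a public CMIP6 catalog by facet and opens the matching stores lazily. None of this code is from the talk. It assumes the `intake-esm`, `xarray`, `zarr`, and `gcsfs` packages are installed and that network access is available, and it assumes the catalog location in `CATALOG_URL` (the Pangeo CMIP6 catalog on Google Cloud Storage), which may change over time. The helper names `build_query` and `open_datasets` are hypothetical.

```python
# ASSUMPTION: catalog location is not stated in the slides and may change.
CATALOG_URL = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"

# Facet names taken from the dataset-id template on the CMIP DATA slides.
SEARCHABLE_FACETS = frozenset({
    "activity_id",
    "institution_id",
    "source_id",
    "experiment_id",
    "member_id",
    "table_id",
    "variable_id",
    "grid_label",
})


def build_query(**facets: str) -> dict[str, str]:
    """Validate facet names before searching (hypothetical helper)."""
    unknown = sorted(set(facets) - SEARCHABLE_FACETS)
    if unknown:
        raise ValueError(f"unknown facet(s): {', '.join(unknown)}")
    return dict(facets)


def open_datasets(query: dict[str, str], catalog_url: str = CATALOG_URL):
    """Search the catalog and lazily open every matching store.

    Needs network access and the intake-esm package; imported here so
    that the rest of the sketch runs without those dependencies.
    """
    import intake  # intake-esm registers `open_esm_datastore`

    catalog = intake.open_esm_datastore(catalog_url)
    subset = catalog.search(**query)
    # Returns a dict mapping a group key to an xarray.Dataset.
    return subset.to_dataset_dict()


# Facet values copied from the example id on the CMIP DATA slides.
query = build_query(
    source_id="GFDL-CM4",
    experiment_id="ssp585",
    table_id="Omon",
    variable_id="thetao",
)
print(query)
# datasets = open_datasets(query)  # uncomment to run against the live catalog
```

The design point is that the same few lines work from a laptop, a university cluster, or a cloud notebook, because no local mirror of the archive is involved.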
  21. CMIP6 CLOUD DATA A single data repository in the cloud serves all use cases • Collaborative and agile Research • Inclusive Education on real climate data • Fast Iteration - Lower Barrier of Entry • Portable Methods and Results not just for Academia
  26. MORE INFO? I ❤ QUESTIONS jbusecke juliusbusecke.com @JuliusBusecke @[email protected] @codeandcurrents.bsky.social https://github.com/leap-stc/cmip6-leap-feedstock https://pangeo-data.github.io/pangeo-cmip6-cloud/