are weather and climate data used for?
 Where do weather and climate data come from?
• Types of data: observations, models, data assimilation
• Popular data providers and data products
• CMIP6 spotlight
 Technology for weather and climate data systems
• Past / present / future
who am I?
Xarray, Zarr, Xgcm, Rechunker, Pangeo Forge, etc.
Ex Professor
Physical Oceanographer & Climate Scientist
NASA SWOT Science Team
LEAP & M2LiNES
a public benefit corporation
MISSION STATEMENT
To empower people to use scientific data to solve humanity’s greatest challenges
Ryan Abernathey, CO-FOUNDER & CEO
Joe Hamman, CO-FOUNDER & CTO
OUR TEAM
 Experienced scientists.
 Open source community leaders.
 Seasoned engineers who have built production-grade scientific data infrastructure at top companies.
OUR PRIOR EXPERIENCE
OUR OPEN-SOURCE LEADERSHIP
time scales of Earth System phenomena
[Figure: spatial scale (1 KM, 10 KM, 100 KM, 1000 KM, 10,000 KM, GLOBAL; SMALL to BIG) against time scale (SECONDS, MINUTES, HOURS, DAYS, WEEKS, MONTHS, YEARS, DECADES, CENTURIES; FAST to SLOW). Phenomena shown: tornadoes, severe storms, hurricanes, extratropical cyclones, seasons, ENSO, climate change, marine heatwaves, wildfire, drought, flood.]
Adapted from Tavakolifar et al. 2017 https://doi.org/10.2166/wcc.2017.107
time scales
[Figure: the same space-time diagram, with financial-sector labels added: high-frequency traders, day traders, long-term asset managers; stocks, bonds, commodities, investment banks, private equity, VC.]
financial sector
acted by weather and climate phenomena?
 Agriculture
 Forestry
 Energy
• Fossil Fuels (demand)
• Renewables (supply + demand)
 Transportation
 Health
 Retail
 Hospitality
 Construction
 Real Estate
 Financials
 Insurance
 Carbon markets
value chain
🛰 data providers: Generate raw datasets using direct observations or simulation capabilities.
🏭 data refiners: Process raw data to produce value-added datasets using modeling, statistics, AI / ML.
⛏ data end users: Companies / institutions with actual assets at risk from environmental factors.
tellite data
→ “simple” satellite imagery
→ advanced remote sensing of physical variables
SWOT Satellite. Image Credit: NASA, CNES
Arctic Wildfires, LANDSAT Satellite. Image Credit: NASA Earth Observatory
amarthi, Rao, et al. “Global Climate Models.” Downscaling Techniques for High-Resolution Climate Projections: From Global Change to Local Impacts. Cambridge: Cambridge University Press, 2021. 19–39. Print.
 Equations of physics
 Numerical discretization of the Earth (i.e. a “grid”)
 Millions of lines of FORTRAN code
resolution and computational cost
[Figure: IPCC AR4 WG1 Figure 1.4, with panels for 1990, 1996, 2001, 2007]
[Image: Frontier Exascale Supercomputer. Credit: OLCF]
Increasing Computational Cost
Increasing Data Volume
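A rough way to see why finer resolution drives up cost and data volume is to count horizontal grid cells. The sketch below assumes a regular latitude-longitude grid, which is a simplification (real models use a variety of grids, add vertical levels, and usually need shorter time steps at finer spacing); the function name and the chosen spacings are illustrative only.

```python
def regular_grid_cells(spacing_deg: float) -> int:
    """Number of cells on a global regular latitude-longitude grid."""
    n_lat = round(180 / spacing_deg)  # rows from pole to pole
    n_lon = round(360 / spacing_deg)  # columns around the globe
    return n_lat * n_lon


# Assumed grid spacings in degrees, from coarse to fine.
for spacing in (2.0, 1.0, 0.5, 0.25):
    cells = regular_grid_cells(spacing)
    print(f"{spacing:>5} deg grid: {cells:>9,} horizontal cells")
```

Each halving of the grid spacing quadruples the number of horizontal cells, before any vertical or time-step effects are counted.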
st data products
 European Centre for Medium-Range Weather Forecasts (ECMWF)
• HRES - High-resolution 10-day forecast
• ENS - Ensemble 15-day forecast
• SEAS - Seasonal Forecast
 US National Oceanic and Atmospheric Administration (NOAA)
• GFS - Global Forecast System
• GEFS - Global Ensemble Forecast System
• HRRR - High Resolution Rapid Refresh
→ consistent view of long-term historical record
 Careful quality control of observations
 Physically consistent, gap-free, jump-free record
 Harmonized spatiotemporal grid
 Suitable for long-term climate studies
Credit: ECMWF
nalysis data products
 ECMWF ERA5
 NASA MERRA2
 NCEP North American Regional Reanalysis
 JRA-55: Japanese 55-year Reanalysis, Near Real-Time Data
 ECCO: Estimating the Circulation and Climate of the Ocean
 … and many many more!
Active research topic
rison Project
 International effort by WCRP to coordinate climate modeling efforts
 Pre-defined emissions scenarios and experiment protocols
 Each modeling center runs its own models
 Data standardization
 Began in 1995
 CMIP6 out; CMIP7 beginning
 Many 100,000s of individual datasets
 Each dataset is identified by a unique id, consisting of 'facets'
 Facets are part of CMIP controlled vocabulary (https://wcrp-cmip.github.io/CMIP6_CVs/)
https://wcrp-cmip.org/cmip-data-access/#access-routes
Template: mip_era.activity_id.institution_id.source_id.experiment_id.member_id.table_id.variable_id.grid_label.version
Example: CMIP6.ScenarioMIP.NOAA-GFDL.GFDL-CM4.ssp585.r1i1p1f1.Omon.thetao.gn
• mip_era (CMIP Cycle): CMIP6
• activity_id (MIP activity): ScenarioMIP
• institution_id (Modelling Center): NOAA-GFDL
• source_id (Model Code): GFDL-CM4
• experiment_id (Experiment/forcing scenarios): ssp585
• member_id (Ensemble member): r1i1p1f1
• table_id (MIP table used): Omon
• variable_id (Output Variable): thetao
• grid_label (Model Grid): gn
Facet groupings marked on the slide: different simulations; components of a single simulation
→ institution
https://wcrp-cmip.org/map/#map_of_modelling_centres_and_esgf_nodes
→ source
→ experiment
→ member
Hingray and Saïd 2014
→ dataset
 Variable names are standardized using the CMOR (https://cmor.llnl.gov/) library.
https://expearth.uib.no/?page_id=28
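The facet scheme walked through above can be sketched in a few lines of Python. This is a minimal illustration, not an official CMIP tool: `parse_dataset_id` is a hypothetical helper, and the only inputs are the template and example id shown on the slides. The example id has nine components while the template lists ten, so the sketch treats a missing trailing `version` as acceptable.

```python
# Facet names in the order given by the template on the slide.
FACET_NAMES = [
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
]


def parse_dataset_id(dataset_id: str) -> dict[str, str]:
    """Split a dot-separated dataset id into a facet dictionary.

    Trailing facets that are absent from the id (such as `version` in the
    slide's example) are simply left out of the result.
    """
    parts = dataset_id.split(".")
    if len(parts) > len(FACET_NAMES):
        raise ValueError(f"too many components in {dataset_id!r}")
    return dict(zip(FACET_NAMES, parts))


# The example id from the slide.
facets = parse_dataset_id(
    "CMIP6.ScenarioMIP.NOAA-GFDL.GFDL-CM4.ssp585.r1i1p1f1.Omon.thetao.gn"
)
for name, value in facets.items():
    print(f"{name:>15}: {value}")
```

Keeping the facets in a dictionary makes it straightforward to filter a catalog, for example by selecting every id whose `variable_id` is `thetao`.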
Federation → CMIP Data Distribution System
 Federation of national labs and public-sector agencies
 Peer-to-peer data sharing system based on NetCDF files
 Data catalog and search capabilities
: Challenges
 Data archive is 20 PB!
 Data must be downloaded to local storage and organized in order to be used for analysis.
 This creates a large overhead for working with CMIP data.
 Individual access/data cleaning approaches might be incompatible, hindering reusability/reproducibility
[Diagram: ESGF; separate Custom Code for University, Lab, and Industry ❌ ✋🚫]
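To put the download overhead in perspective, here is a back-of-the-envelope estimate of how long it would take to copy the whole archive. The 20 PB figure comes from the slide; the link speeds, the assumption of a fully utilized link, and the use of decimal petabytes are illustrative assumptions.

```python
ARCHIVE_BYTES = 20e15  # 20 PB as quoted above (decimal petabytes assumed)


def download_days(link_gbps: float, archive_bytes: float = ARCHIVE_BYTES) -> float:
    """Days needed to move the archive over a fully utilized network link."""
    bits = archive_bytes * 8
    seconds = bits / (link_gbps * 1e9)
    return seconds / 86_400  # seconds per day


# Assumed sustained link speeds in Gbit/s.
for gbps in (1, 10, 100):
    print(f"{gbps:>3} Gbit/s: {download_days(gbps):,.0f} days")
```

Even under these idealized assumptions, a 1 Gbit/s link needs roughly five years, which is why each group typically downloads only a small subset.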
re painful
→ the status quo
1 - DATA INGESTION
Download a bunch of files from data providers in formats like Grib, NetCDF, HDF. ☹
DATA PROVIDER
2 - DATA CLEANING
Wrangle downloaded files into some sort of organized structure. Harmonize inconsistencies across files. 😫
3 - ANALYSIS / MODELING
The fun part. 🎉
4 - SHARE REPRODUCIBLE RESULTS
Yeah right! 😂
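As a small illustration of what the data cleaning step involves, the sketch below harmonizes coordinate names across two stand-in "files". Everything here is made up for illustration: the alias table, the `harmonize` helper, and the two records are hypothetical, and plain dictionaries stand in for real NetCDF or Grib contents.

```python
# Hypothetical alias table: spellings of the same coordinate seen across files.
COORD_ALIASES = {
    "lat": "lat", "latitude": "lat", "nav_lat": "lat",
    "lon": "lon", "longitude": "lon", "nav_lon": "lon",
    "time": "time", "t": "time",
}


def harmonize(record: dict) -> dict:
    """Return a copy of a record with coordinate names mapped to canonical ones."""
    renamed = {}
    for name, values in record.items():
        # Unknown names (e.g. data variables) are kept unchanged.
        canonical = COORD_ALIASES.get(name.lower(), name)
        if canonical in renamed:
            raise ValueError(f"duplicate coordinate {canonical!r}")
        renamed[canonical] = values
    return renamed


# Two made-up records standing in for downloaded files.
file_a = {"latitude": [-45.0, 45.0], "longitude": [0.0, 180.0], "tas": [280.1, 290.3]}
file_b = {"LAT": [-45.0, 45.0], "LON": [0.0, 180.0], "tas": [281.0, 289.7]}

cleaned = [harmonize(f) for f in (file_a, file_b)]
for record in cleaned:
    print(sorted(record))
```

When every group writes its own version of this kind of code, the results drift apart, which is the reproducibility problem the status quo slide points at.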
system data are multidimensional, not tabular
challenge
Data systems designed for typical business data (tables / dataframes) don’t work for multidimensional data.
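One way to make the mismatch concrete is to count how many numbers must be stored when a gridded variable is kept as an array versus flattened into a long table with one row per grid point. This is a simplified sketch: the grid shape is an illustrative assumption, `stored_numbers` is a hypothetical helper, and real tabular systems can recover some of the overhead through compression.

```python
from math import prod


def stored_numbers(dims: dict[str, int]) -> tuple[int, int]:
    """Count numbers stored for one variable as an array versus as a long table.

    Array form: one value per grid point plus one coordinate vector per dimension.
    Table form: one row per grid point, with a column for every coordinate
    and one column for the value.
    """
    n_points = prod(dims.values())
    as_array = n_points + sum(dims.values())
    as_table = n_points * (len(dims) + 1)
    return as_array, as_table


# Illustrative shape: one year of daily data on a 1-degree global grid.
dims = {"time": 365, "lat": 180, "lon": 360}
as_array, as_table = stored_numbers(dims)
print(f"array: {as_array:,} numbers; table: {as_table:,} numbers "
      f"({as_table / as_array:.1f}x)")
```

Beyond the storage overhead, the table form also discards the neighbor relationships along each dimension, which operations such as spatial averaging or differencing along time rely on.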
The Cloud Platform for Weather and Climate Data
ingestion engine: connectors for common weather and climate data sources and formats
storage engine: efficient, performant cloud native storage and catalog
query engine: integrations with popular data science and GIS tools
public cloud infrastructure
ke for?
 Companies large and small: System of record for your business-critical weather, climate and geospatial data.
 Public sector: Use Arraylake to disseminate analysis-ready, cloud-optimized data to your target users.
 Academia: Build a data lake to support your department’s research and education.
a federation in the cloud
vision
Seamless exchange of data across academia, government, and industry will empower reproducible open science and accelerate climate change solutions.
University Lab, AI Startup, Big Tech Co., Gov’t Agency, NGO