Plenary talk presented at the February 3-4, 2021 launch meeting, in response to the U.S. National Committee for the Ocean Decade's call for disruptive advances in ocean science.
OceanCloud
Transforming oceanography with a new approach to data and computing
Ryan Abernathey
/ LDEO
Co-founder of Pangeo • Open Source Developer • Open Science Advocate
https://ocean-transport.github.io/
This Talk
…scientific inquiry and restricts participation. 😔
Solution: OceanCloud: a new approach to infrastructure based on cloud computing, open data, and open-source software. 😎
Institutions Create “Data Fortresses”*
❌ Results not reproducible outside fortress
❌ Barrier to collaboration
❌ Inefficient / duplicative
❌ Can’t scale to future data needs
❌ Limits inclusion and knowledge transfer
*Coined by Chelle Gentemann
Image credit: Moahim, CC BY-SA 4.0, via Wikimedia Commons
A Community Platform for Big-Data Geoscience
http://pangeo.io
…for solving real problems
• Foundational support from NSF EarthCube
• International partners, industry connections
Pangeo Builds with Open Development
…cloud infrastructure
• Define science questions
• Use software / infrastructure
• Identify bugs / bottlenecks
• Provide feedback to developers
• Contribute widely to the open source scientific Python ecosystem
• Maintain / extend existing libraries, start new ones reluctantly
• Solve integration challenges
• Deploy interactive analysis environments
• Curate analysis-ready datasets
• Platform agnostic
Agile development 👩💻
…rs of Cloud Data Environments
“Analysis Ready Data”: Cleaned, curated open-access datasets available via a high-performance, globally available storage system.
“Elastic Scaling”: Automatically provision many computers on demand to accelerate big data processing.
“Data Proximate Computing”: Bring analysis to the data. Web-based access provides “on-click to compute” access.
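To make the “Elastic Scaling” idea concrete, here is a minimal sketch of the pattern behind it: split a dataset into chunks, reduce each chunk independently on a pool of workers, then combine the partial results. It is an illustration under stated assumptions, not Pangeo code or any library's API. The function names (`make_chunks`, `chunk_summary`, `parallel_mean`) are hypothetical, the data are synthetic, and a local thread pool stands in for cloud machines provisioned on demand.

```python
from concurrent.futures import ThreadPoolExecutor


def make_chunks(values, chunk_size):
    """Split a sequence into consecutive chunks (hypothetical helper)."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def chunk_summary(chunk):
    """Reduce one chunk to (sum, count) so partial results can be combined."""
    return sum(chunk), len(chunk)


def parallel_mean(values, chunk_size=1000, n_workers=4):
    """Compute the mean of `values` by reducing chunks on a pool of workers.

    Assumption: threads on one machine stand in for elastically
    provisioned cloud workers; only the map/combine pattern is the point.
    """
    chunks = make_chunks(values, chunk_size)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves chunk order, so the result does not depend
        # on how many workers are used.
        partials = list(pool.map(chunk_summary, chunks))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


# Synthetic stand-in for a large dataset (made-up values, not observations).
synthetic_temperatures = [10.0 + (i % 50) * 0.1 for i in range(10_000)]

mean_one_worker = parallel_mean(synthetic_temperatures, n_workers=1)
mean_four_workers = parallel_mean(synthetic_temperatures, n_workers=4)
```

Changing `n_workers` changes only how many chunks are processed at once, not the answer, which is the property that lets a platform add or remove machines without the analysis code changing.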
Ocean-Cloud Will Be a “Data Watering Hole”*
Forecasting, Research, Education, Industry
✅ Faster science, more discoveries
✅ Inherently reproducible
✅ Allows seamless global collaboration
✅ Unleashes creativity
✅ Cost effective
✅ Accessible to all
✅ Connects with industry
*Coined by Fernando Perez
How Can We Achieve This?
distribution to cloud…
• …but the missing link is accessible cloud computing environments for all. Pangeo and its partners can help with this.
• We must avoid building new fortresses in the cloud and ensure interoperability from the start!
• A National Oceanographic Partnership Program (NOPP) could provide funding, help facilitate inter-agency collaboration, and support user adoption.