Ocean Cloud (OceanShot NAS Plenary Talk)

Plenary talk presented at the February 3-4, 2021 launch meeting, in response to the U.S. National Committee for the Ocean Decade's call for disruptive advances in ocean science.

https://www.nationalacademies.org/our-work/us-national-committee-on-ocean-science-for-sustainable-development-2021-2030/ocean-shot-directory

Ryan Abernathey

June 21, 2021

Transcript

  1. Ocean Cloud

    Transforming oceanography with a new approach to data and computing. Ryan Abernathey
  2. Ryan Abernathey

    Physical Oceanographer • Ph.D. from MIT, 2012 • Associate Prof. at Columbia / LDEO • Co-founder of Pangeo • Open Source Developer • Open Science Advocate
    https://ocean-transport.github.io/
  3. This Talk

    Problem: Ocean data are huge and complex! 🤯 This limits scientific inquiry and restricts participation. 😔
    Solution: OceanCloud, a new approach to infrastructure based on cloud computing, open data, and open-source software. 😎
  4. Privileged Institutions Create “Data Fortresses”*

    *Coined by Chelle Gentemann. Image credit: Moahim, CC BY-SA 4.0, via Wikimedia Commons.
  6. Privileged Institutions Create “Data Fortresses”*

    Image label: Data
    *Coined by Chelle Gentemann. Image credit: Moahim, CC BY-SA 4.0, via Wikimedia Commons.
  8. Privileged Institutions Create “Data Fortresses”*

    Image label: Data
    ❌ Results not reproducible outside fortress
    ❌ Barrier to collaboration
    ❌ Inefficient / duplicative
    ❌ Can’t scale to future data needs
    ❌ Limits inclusion and knowledge transfer
    *Coined by Chelle Gentemann. Image credit: Moahim, CC BY-SA 4.0, via Wikimedia Commons.
  9. A Community Platform for Big-Data Geoscience

    • Grass-roots collaboration between scientists and software developers around open-source tools for solving real problems
    • Foundational support from NSF EarthCube
    • International partners, industry connections
    http://pangeo.io
  11. Pangeo Builds with Open Development

    Scientific users / use cases
    • Define science questions
    • Use software / infrastructure
    • Identify bugs / bottlenecks
    • Provide feedback to developers
    Open-source software libraries
    • Contribute widely to the open source scientific Python ecosystem
    • Maintain / extend existing libraries, start new ones reluctantly
    • Solve integration challenges
    HPC and cloud infrastructure
    • Deploy interactive analysis environments
    • Curate analysis-ready datasets
    • Platform agnostic
    Agile development 👩💻
  12. Three Pillars of Cloud Data Environments

    “Analysis Ready Data”: Cleaned, curated open-access datasets available via a high-performance, globally available storage system.
    “Elastic Scaling”: Automatically provision many computers on demand to accelerate big data processing.
    “Data Proximate Computing”: Bring analysis to the data. Web-based access provides “one-click to compute” access.
  14. Ocean-Cloud Will Be a “Data Watering Hole”*

    *Coined by Fernando Perez
  16. Ocean-Cloud Will Be a “Data Watering Hole”*

    👩💻👨💻👩💻 Group A: Air-Sea Interaction
    👩💻👨💻👩💻 Group B: Seasonal Forecasting
    Research, Education, Industry
    *Coined by Fernando Perez
  17. Ocean-Cloud Will Be a “Data Watering Hole”*

    👩💻👨💻👩💻 Group A: Air-Sea Interaction
    👩💻👨💻👩💻 Group B: Seasonal Forecasting
    Research, Education, Industry
    ✅ Faster science, more discoveries
    ✅ Inherently reproducible
    ✅ Allows seamless global collaboration
    ✅ Unleashes creativity
    ✅ Cost effective
    ✅ Accessible to all
    ✅ Connects with industry
    *Coined by Fernando Perez
  18. How Can We Achieve This?

    • Many agencies (e.g. NASA, NOAA) are already moving data distribution to the cloud…
    • …but the missing link is accessible cloud computing environments for all. Pangeo and its partners can help with this.
    • We must avoid building new fortresses in the cloud and ensure interoperability from the start!
    • A National Oceanographic Partnership Program (NOPP) could provide funding, help facilitate inter-agency collaboration, and support user adoption.
  19. Learn More

    http://pangeo.io
    https://github.com/pangeo-data/
    https://medium.com/pangeo
    @pangeo_data