Open source, academic science and the public mission of research: reflections from the field

A presentation delivered as part of the CISE Distinguished Lecture Series, at the National Science Foundation headquarters.

Co-author: Lindsey Heagy (https://lindseyjh.ca).

A discussion of Project Jupyter and the role of open source tools in science, along with a reflection on how to tackle the funding and structural challenges of giving these tools a sustainable future.

Video is available here (you must register for the livestream to access the video player, even though the event is in the past):
http://www.tvworldwide.com/events/nsf/190815

Fernando Perez

August 15, 2019

Transcript

  1. Open source, academic science and the public mission of research: reflections from the field
    Fernando Pérez, Lindsey Heagy
  2. JupyterLab: a grand unified theory of Jupyter
    Huge team effort! C. Colbert, S. Corlay, A. Darian, B. Granger, J. Grout, P. Ivanov, I. Rose, S. Silvester, C. Willing, J. Zosa-Forde …
  3. Core ideas of the web: HTTP & HTML
    HTML (HyperText Markup Language): a format to represent content.
    HTTP (HyperText Transfer Protocol): a protocol to connect clients and servers.
    Image credit: eviltester.com
  4-7. Jupyter Protocol: a web-age capture of the process of interactive computing
    Any MIME-type output:
    ❖ text
    ❖ svg, png, jpeg
    ❖ latex, pdf
    ❖ html, javascript
    ❖ interactive widgets
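To make "any MIME-type output" concrete: in the Jupyter messaging protocol, a rich result travels as a MIME bundle, a JSON dictionary that maps MIME types to representations of the same object, and each frontend renders the richest one it supports. (In IPython, objects can supply such bundles through methods like `_repr_html_` or `_repr_mimebundle_`.) The sketch below is a minimal illustration under stated assumptions: the helper names `make_display_content` and `pick_mimetype` and the `PREFERENCE` ranking are hypothetical, not JupyterLab's actual selection logic.

```python
import json

# Hypothetical preference order for a rich frontend: most expressive first.
# (An illustrative assumption, not JupyterLab's actual ranking.)
PREFERENCE = [
    "application/vnd.jupyter.widget-view+json",
    "text/html",
    "image/svg+xml",
    "image/png",
    "text/latex",
    "text/plain",
]


def make_display_content(text, html, svg):
    """Content of a display_data message: one result, several MIME types."""
    return {
        "data": {
            "text/plain": text,  # always-available fallback
            "text/html": html,
            "image/svg+xml": svg,
        },
        "metadata": {},
    }


def pick_mimetype(bundle, supported):
    """Return the first preferred MIME type present in the bundle that this
    (hypothetical) frontend can render, or None if there is no match."""
    for mime in PREFERENCE:
        if mime in bundle and mime in supported:
            return mime
    return None


content = make_display_content(
    "Circle(r=5)",
    "<b>Circle</b> with r=5",
    '<svg xmlns="http://www.w3.org/2000/svg"><circle cx="10" cy="10" r="5"/></svg>',
)

# The bundle is plain JSON on the wire, so any client can decode it.
wire = json.dumps(content)
received = json.loads(wire)

browser = {"text/html", "image/svg+xml", "image/png", "text/plain"}
terminal = {"text/plain"}

print(pick_mimetype(received["data"], browser))   # text/html
print(pick_mimetype(received["data"], terminal))  # text/plain
```

The design point is that the kernel does not need to know which frontend is listening: it sends every representation it can produce, and a notebook in the browser and a terminal console each choose what they can display.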
  8. A language-agnostic protocol
    ~100 different kernels: https://github.com/jupyter/jupyter/wiki/Jupyter-kernels
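Language independence works because the protocol carries code as an opaque string inside a JSON message. The frontend never parses it, and only the kernel on the other end interprets it. The sketch below builds `execute_request` messages following the documented header/content layout. It is an illustration only: the helper `execute_request`, the placeholder username and the `"5.3"` version string are assumptions, and real clients such as `jupyter_client` also sign each message and send it over ZeroMQ.

```python
import json
import uuid
from datetime import datetime, timezone


def execute_request(code, session):
    """Build an execute_request message as a plain dict (hypothetical helper).

    Sketch of the documented message layout only: real clients also sign
    the message and send it over ZeroMQ.
    """
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "session": session,
            "username": "user",        # placeholder value
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": "5.3",          # assumed protocol version
        },
        "parent_header": {},
        "metadata": {},
        "content": {
            "code": code,              # opaque string; the protocol never parses it
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
            "stop_on_error": True,
        },
    }


session = uuid.uuid4().hex
msgs = {
    "python": execute_request("print(1 + 1)", session),
    "r": execute_request("cat(1 + 1)", session),
    "julia": execute_request("println(1 + 1)", session),
}

# Every request has the same JSON shape; only the code string differs.
shapes = {tuple(sorted(m["content"])) for m in msgs.values()}
print(len(shapes))                              # 1
print(json.dumps(msgs["r"]["content"]["code"]))  # "cat(1 + 1)"
```

Because the message shape is identical for Python, R and Julia, adding a new language means writing a kernel that answers these messages. No frontend has to change.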
  9. With these tools, we provide:
    ❖ Broad disciplinary reach and impact of statistical thinking.
    ❖ Drastically lowered barriers to student access, both intellectual and economic.
    ❖ Lowered barriers for faculty* to engage with statistical and computational ideas.
    (*) Typically from non-computational/statistical domains.
    Organizational and intellectual leadership: Cathryn Carson, Ani Adhikari, John DeNero, … (many more)
  10. Reproducible Research
    “An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.”
    Buckheit and Donoho, WaveLab and Reproducible Research, 1995
  11. LSST: one of the largest line items in the NSF budget
    https://docushare.lsst.org/docushare/dsweb/Get/LSE-319
  12-13. HEP & reproducibility
    An ML-based likelihood-free inference tool for particle physics: a new Python-based implementation of a statistical tool used in the Higgs discovery.
    Kyle Cranmer, NYU
  14. Geosciences: research & education
    Lindsey Heagy, Berkeley: 2019 GWH Career Achievement Award for outstanding junior scientist
    SimPEG: https://simpeg.xyz
    http://geosci.xyz
  15-17. Pangeo: open geosciences (and more!)
    Harnessing the power of cloud computing to study the whole earth interactively.
    https://pangeo.io
    Ryan Abernathey
  18. Scientific Open Source: Despite (direct) federal $$ support
    ❖ Note: “indirectly”, lots of $ have supported scientific OSS projects/tools.
    ❖ Under the cover of domain-focused work.
    ❖ Recently recommended for funding: “Jupyter meets the Earth” (Jupyter + Pangeo team), NSF grant (EarthCube/Shree Mishra)
    ❖ FP, Laurel Larsen, Lindsey Heagy (Berkeley), Joe Hamman (NCAR)
    ❖ Thank you!!!
  19. Traditional software infrastructure funding
    “Yes, it’s true, the budget is gone again… But you can’t deny that now, we get here in an instant!”
    Quino (Argentinian cartoonist)
  20. Contrasts in culture and incentives

                                Open source                            Academia
    Credit                      Distributed                            PI & hierarchy
    Output/artifacts            Continuous & project-specific          Discrete papers
    Collaborators               Fluid: professionals, volunteers, …    Structured, funding-dependent
    Governance/decision making  Open, community based                  Top-down, PI
    Authorship                  Fluid, roles can evolve, no clear      Need to say more?
                                “first/senior” author
    Peer review                 Continuous, open, pervasive, friendly  The opposite
    Value metric                Utility, need, impact                  “Novel and transformative”
  21. “The Stack”: a complete ecosystem
    Domain-agnostic backbone/trunk:
    • Not “real CS”
    • Not “real research”
    • Nobody’s problem
    • Yet critical to everybody else
  22. Skills in education: The Carpentries (Tracy Teal, Executive Director)
    Career paths: “The Society of Research Software Engineering was founded on the belief that a world which relies on software must recognise the people who develop it.” https://society-rse.org
  23. JOSS (Journal of Open Source Software): the experience is positive, and yet…
    Yet! Not indexed by Google Scholar: https://github.com/openjournals/joss/issues/130
  24. Bang for the buck?
    ❖ Federal 2018 R&D budget: $176.8B (AAAS analysis)
    ❖ What fraction of R&D today depends critically on computing? 10%? 30%? 50%?
    ❖ $200M is ~0.1% of the total budget.
    ❖ $200M annually (well spent) would have a major impact.
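The “~0.1%” figure can be checked directly ($200M / $176.8B ≈ 0.11%). The same arithmetic also shows how $200M compares with only the computing-dependent share of R&D, for each fraction the slide asks about. A minimal Python sketch: the dollar amounts come from the slide, and the names `FEDERAL_RD_2018`, `PROPOSED` and `share` are illustrative.

```python
# Figures from the slide; the computing fractions are the slide's open question.
FEDERAL_RD_2018 = 176.8e9  # USD, federal R&D budget (AAAS analysis)
PROPOSED = 200e6           # USD per year


def share(amount, base):
    """Fraction of `base` that `amount` represents."""
    return amount / base


print(f"of total federal R&D: {share(PROPOSED, FEDERAL_RD_2018):.2%}")  # 0.11%
for computing_fraction in (0.10, 0.30, 0.50):
    base = FEDERAL_RD_2018 * computing_fraction
    print(f"of the {computing_fraction:.0%} that depends on computing: "
          f"{share(PROPOSED, base):.2%}")
```

Even in the most conservative case (10% of R&D depending on computing), $200M is only about 1.13% of that portion. At 30% it falls to about 0.38%, and at 50% to about 0.23%.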
  25. “Well spent”: that should be easy…
    ❖ Some features of successful, resilient projects:
        ❖ Broad community engagement
        ❖ Actively managed pipeline for new contributions
        ❖ Capacity for short- and long-term planning
        ❖ Writing code is only a small part of the job
    ❖ Treat OSS projects like organizations
  26. Strategic vision: requires professionalization
    ❖ Full-time work
    ❖ R&D, operations, community, fundraising
    ❖ Professionalization is inclusive:
        ❖ reliance on volunteers excludes those who can’t afford to volunteer.