The Berkeley Institute for Data Science: a place for people like us

Video of presentation: https://www.youtube.com/watch?v=q5yAy4WWTyU

BIDS was created as a novel environment for Data Science, in collaboration with the University of Washington in Seattle and NYU, with funding from the Gordon and Betty Moore Foundation and the Alfred P. Sloan Foundation.

Fernando Perez

July 09, 2014

Transcript

  1. “People like us”? (a careers problem in academia)

    • Folks at the intersection of domain science, methods and software.
    • We actually care about our code
      – Not just about our results/publication count.
    • We often belong to the Department of Connective Tissue
      – Good luck getting tenure in that one!
    • The Big Data Brain Drain: Why Science is in Trouble.
      – Great blog post by Jake VanderPlas.
  2. Raise your hand if…

    Your dept. chair/dean/provost
    • Loves that you write lots of code and therefore fewer papers.
    • Loves that your papers have many authors from different departments and you’re lost in the middle.
    • Loves that the journals you’ve published in seem like a random collection of unrelated topics.
    • Encourages you to spend more time on mailing lists/GitHub/HipChat/IRC helping random strangers.
    • …
  3. Catalyst: Moore/Sloan Initiative

    • 3 chosen out of 15 LOIs invited
    • 5 Yrs, $38MM
    • Announced at White House OSTP event Nov 2013
  4. Moore/Sloan Initiative: Core goals

    • Support meaningful collaborations between
      – Methodology fields: Comp Sci, Stats, Applied Math
      – Science domains
    • Establish sustainable Data Science career paths
      – A new generation of multi-disciplinary scientists
      – A new generation of data scientists focused on tool development
    • Build an ecosystem of tools and research practices
      – Sustainable, reusable, extensible
      – Effective as scientific research tools
  5. Initial Data Science Faculty Group

    • Joshua Bloom, Professor, Astronomy; Director, Center for Time Domain Informatics
    • Henry Brady, Dean, Goldman School of Public Policy
    • Cathryn Carson, Associate Dean, Social Sciences; Acting Director of Social Sciences Data Laboratory “D-Lab”
    • David Culler, Chair, EECS
    • Michael Franklin, Professor, EECS; Co-Director, AMP Lab
    • Erik Mitchell, Associate University Librarian
    • Faculty Lead/PI: Saul Perlmutter, Physics, Berkeley Center for Cosmological Physics
    • Fernando Perez, Researcher, Henry H. Wheeler Jr. Brain Imaging Center
    • Jasjeet Sekhon, Professor, Political Science and Statistics; Center for Causal Inference and Program Evaluation
    • Jamie Sethian, Professor, Mathematics
    • Kimmen Sjölander, Professor, Bioengineering, Plant and Microbial Biology
    • Philip Stark, Chair, Statistics
    • Ion Stoica, Professor, EECS; Co-Director, AMP Lab
  6. Berkeley Institute for Data Science (BIDS)

    Relevance across the campus suggests need for central location that will serve as home for data science efforts
    Doe Library
    Enhancing strengths of
    • Simons Institute for the Theory of Computing
    • D-Lab (Barrows)
    • AMP Lab (EECS)
    • CITRIS
    • SDAV Institute (LBL)
    • Urban Analytics Lab
    • etc.
  7. Doe Memorial Library

    Doe Memorial Library at the heart of the UC Berkeley campus will be the new home of the Berkeley Institute for Data Science (BIDS). The campus has set aside 5,000 sq ft on the ground floor, directly accessible from the building’s north entrance and opposite the historical Morrison Reading Room.
  8. Exec. Director and Data Science Fellows

    Exec. Director: Kevin Koy
    • Berkeley Geospatial Innovation Facility (GIF)
    • NumPy, pandas, Django tools and REST APIs for GIS in research.
    Fellows affiliated with this community (15 total):
    • Katy Huff (SciPy)
    • Dav Clark (presented Tuesday)
    • Kyle Barbary (AstroPy)
    • Karthik Ram (Software Carpentry, rOpenSci)
    • Justin Kitzes (Software Carpentry)
  9. Working Groups as Bridges

    The collaboration model across NYU-Berkeley-UW
    • Software Tools and Environments
    • Reproducible Research and Open Science
    • Education and Training
    • Ethnography and Evaluation
    • Career Paths
    • Space and Culture
  10. Software Tools and Environments

    • Tools for open, reproducible data science:
      – Python, Julia, R, Scala, etc.
    • Our three universities:
      – Deep expertise we don’t always engage.
      – Space and resources.
      – Complement GitHub/OSS models.
  11. Reproducible Research and Open Science

    • Incentive models
      – Funding, publication
    • Build tools and practices.
    • Ask the right epistemological questions (cf. Lorena)
  12. Education

    • What are the conceptual foundations of data science?
    • How do we teach them?
    • Are there new programs (undergrad/grad) or just courses?
  13. Career Paths

    • Stable, rewarding, competitive.
    • New criteria for faculty tenure/promotion & scholarship?
    • New kinds of faculty FTEs?
    • New kinds of Research Software Engineer positions?
  14. Ethnography and Evaluation

    • Study the process itself of doing data science.
    • This project seeks institutional change.
    Talk to Dav Clark and Seb Benthall.
  15. Industry Outreach

    • Problems that cross the industry/academia lines
    • Data/resources are often in industry
      – Though see Facebook’s recent experience…
    • Not enough academic jobs:
      – Many of our students will go to industry
      – Good collaboration and recruitment opportunities.
  16. Questions for discussion

    • Is “Data Science” actually “a thing” in science?
    • Is something like BIDS the right intellectual & institutional space for many folks from the SciPy community?
    • How can we (BIDS) better engage and support the SciPy community?
  17. For more context, see this blog post

    An ambitious experiment in Data Science takes off: a biased, Open Source view from Berkeley