Promoting Open Science in the University (OSBD 2016)
This talk gives a broad overview of the initiatives to promote open science at the University of Washington through the interdisciplinary eScience Institute.
- Incentives: Little incentive for academics to devote time to openness.
- Career paths: Relevant skills are more highly valued outside academia.
- Education: Undergraduate and graduate curricula lag in data science.
- Interdisciplinarity: Siloization of disciplines leads to missed opportunities.
Biological Sciences, Environmental Sciences, Social Sciences, Physical Sciences
- Cecilia Aragon, Human Centered Design & Engr.
- Magda Balazinska, CSE
- Emily Fox, Statistics
- Carlos Guestrin, CSE
- Bill Howe, CSE
- Jeff Heer, CSE
- Ed Lazowska, CSE
- David Beck, Chem. Engr.
- Tom Daniel, Biology
- Bill Noble, Genome Sciences
- Josh Blumenstock, iSchool
- Mark Ellis, Geography
- Tyler McCormick, Sociology, CSSS
- Ginger Armbrust, Oceanography
- Randy LeVeque, Applied Math
- Thom Richardson, Statistics, CSSS
- Werner Stuetzle, Statistics
- Andy Connolly, Astronomy
- John Vidale, Earth & Space Sciences
Foundation & Alfred P. Sloan Foundation
- $38 million over 5 years, split between UW, NYU, and UC Berkeley
Washington Research Foundation
- $9.3 million over 5 years for faculty & postdocs
- $7.1 million to the closely-aligned Institute for Neuroengineering
University of Washington
- $550,000/year for staff support
- $600,000/year for faculty support
National Science Foundation
- $2.8 million over 5 years for graduate program development and Ph.D. student funding (IGERT)
Problems
This level of support has given eScience the opportunity to explore many aspects of these challenges.
Six interrelated “Working Groups”:
- Career Paths and Alternative Metrics
- Education and Training
- Software Tools, Environments, and Support
- Reproducibility and Open Science
- Data Science Ethnography
- Working Spaces and Culture
Research Practices
Rewarding open/reproducible research with “Open Science Badges”
One of the ideas being explored by our Reproducibility working group: https://osf.io/tvyxz/wiki/home/
Research Practices
http://joss.theoj.org/
Code as a first-class research product (on par with traditional publications)
- Short papers, review focused on code
- Submitted & reviewed in the open on GitHub
- Makes code citable and indexable by traditional tracking services
Career Paths
- Jake VanderPlas, Director of Research, Physical Sciences (PhD Astronomy)
Data Scientists (full support)
- Ariel Rokem, Data Scientist (PhD Neuroscience)
- Valentina Staneva, Data Scientist (PhD Applied Math)
- Bernease Herman, Data Scientist (BS Stats; formerly Amazon & Morgan Stanley)
Research Scientists (partial support)
- Bryna Hazelton, Research Scientist (PhD Astrophysics)
- Andrew Gartland, Research Scientist (PhD Biostatistics)
- Vaughn Iverson, Research Scientist (PhD Oceanography)
- Anthony Arendt, Research Scientist (PhD Geophysics)
- Joe Hellerstein, Sr. Data Science Fellow (PhD Computer Science; formerly Microsoft Research, Google, IBM Watson)
- Dave Beck, Director of Research, Life Sciences (PhD Medicinal Chemistry)
- Rob Fatland, Director Cloud & Data Solutions (PhD Geophysics; formerly NASA & Microsoft Research)
- Britta Fiore-Gartland, Director of Ethnography (PhD Communication)
Research Faculty Research IT Ethnography of Data Science
Career Paths: Post-doctoral Fellowships
- 2-year/3-year postdoctoral fellowships across a range of departments.
- Joint mentorship: domain + methodology.
- Focus on high-impact researchers who will push boundaries in both areas.
Career Paths: Interdisciplinary Faculty
Example: Mario Juric, Astronomy
- Data Management Lead for LSST
- Professor of Astronomy, UW
- Sr. Fellow at UW eScience
Working on scalable software infrastructure for the LSST project, especially regarding the formation, structure, and evolution of the Milky Way.
Faculty position half-funded through eScience, half through Astronomy (one of six such appointments across campus).
Programs
- Undergraduate Level: “Transcriptable option” for students across departments.
- Master's Level: Stand-alone evening master's program aimed at working professionals.
- PhD Level: “Data Science specialty” for graduate students across disciplines.
Fellows
eScience Graduate Fellows (first cohort)
- Cecilia Noecker: Genome Sc. & ML
- Matt Murbach: ChemE & ML
- Ryan Maas: CS & Astro
- Alex Tank: Stats & Allen Inst. for Brain Science
- Grace Telford: Astro & Stats
- Will Gagne-Maynard: Oceanography & MSR
- Short trainings, workshops, and bootcamps
- Annual “Hack Weeks” (e.g. AstroHackWeek, GeoHackWeek, NeuroHackWeek)
- Informal seminar series (e.g. Python in Geosciences)
- International coding sprints (e.g. Python in Astronomy)
Quarter-long, in-Studio projects, engagement two days per week
- Each team: 1 project lead + 1 eScience Data Scientist
- 4-6 concurrent teams: Network effects among cohort beyond 1:1 interactions
Developing a Workflow for Managing Large Hydrologic Spatial Datasets to Assist Water Resources Management and Research
- Project Lead: Nicoleta Cristea, Civil and Environmental Engineering
- eScience Liaisons: Anthony Arendt, Rob Fatland
Methods for Characterizing Human Centromeres
- Project Lead: Siva Kasinathan, UW School of Medicine
- eScience Liaisons: Andrew Fiore-Gartland, Bryna Hazelton
Target Detection for Advanced Environmental Monitoring of Marine Renewable Energy
- Project Lead: Emma Cotter, Mechanical Engineering
- eScience Liaison: Bernease Herman
Improved Stimulation Protocols for Sight Restoration Technologies
- Project Leads: Ione Fine, Geoffrey M. Boynton, UW Psychology
- eScience Liaison: Ariel Rokem
AralDIF: A Cloud-based Dynamic Information Framework for the Aral Sea Basin
- Project Lead: Amanda Tan, Department of Oceanography
- eScience Liaisons: Rob Fatland, Anthony Arendt
for Social Good
Four teams supported each summer
Teams include:
- Project Leads (1-2 from each org.)
- DSSG Student Fellows (4 per team)
- Data Science Leads (1-2 per team)
- Stakeholders
for Social Good
2015
- Open Sidewalk Graph for Accessible Trip Planning
- Assessing Community Well-being through Open Data and Social Media
- Predictors of Permanent Housing for Homeless Families
- Rerouting Solutions and Expensive Ride Analysis for King County Paratransit
2016
- Mining Online Data for Early Identification of Unsafe Food Products
- Use of ORCA data for improved transit system planning and operation
- Global Open Sidewalks: Creating a shared open data layer and an OpenStreetMap data standard for sidewalks
- CrowdSensing Census: A heterogeneous-based tool for estimating poverty