Making data science accessible in the Johns Hopkins Data Science Lab

Stephanie Hicks

February 27, 2019

Transcript

  1. Making data science accessible in the Johns Hopkins Data Science Lab

    Stephanie Hicks
    Assistant Professor, Biostatistics, Johns Hopkins Bloomberg School of Public Health
    Faculty Member, Johns Hopkins Data Science Lab
    @stephaniehicks
  2. About me

    Teaching: Data Science
    Research: Genomics (analyzing single-cell gene expression data)
    • R/Bioconductor user and developer (since 2009/2010)
    Other fun things about me:
    • Co-founded Baltimore
    • Creating a children’s book featuring women statisticians and data scientists
  3. Massive Open Online Courses in Data Science

    • > 4 million enrolled
    • > 500K completed courses
    • > 200K completed specialization
  4. The E-book revolution

    • Variable pricing (including $0)
    • Readers get all edition updates
    • Author friendly royalty split
    • Bound books through 3rd party
  5. The E-book revolution

    • Variable pricing (including $0)
    • Readers get all edition updates
    • Author friendly royalty split
    • Bound books through 3rd party
  6. The Data Science Lab Puppets

    • Creating children’s videos to teach young students about statistics and data science
    • Puppets have their own DSL YouTube channel and Twitter accounts: @LeekPuppet, @puppetpeng
  7. Why data science?

    Data science is the number one rated job by Glassdoor and there are more than 350,000 new data science jobs expected by 2020.
  8. Here, I focus on the term data science as it refers generally to Type A data scientists who process and interpret data as it pertains to answering real-world questions.
  9. Data Science in Academia?

    • Statistics was born directly from developing solutions to practical data analysis problems
    • Galton, Ronald Fisher
    • Wild and Pfannkuch (1999) describe applied statistics as: “part of the information gathering and learning process which, in an ideal world, is undertaken to inform decisions and actions. With industry, medicine and many other sectors of society increasingly relying on data for decision making, statistics should be an integral part of the emerging information era.”
    • A department that embraces applied statistics as defined above is a natural home for data science in academia
  10. What is missing in the current statistics curriculum?

    Wild and Pfannkuch (1999) complained that: “Large parts of the investigative process, such as problem analysis and measurement, have been largely abandoned by statisticians and statistics educators to the realm of the particular, perhaps to be developed separately within other disciplines.” They add that “[t]he arid, context-free landscape on which so many examples used in statistics teaching are built ensures that large numbers of students never even see, let alone engage in, statistical thinking.”
  11. What is missing in the current statistics curriculum? Computing, Connecting

    • Need more computing in the curriculum
    • Need to teach how to connect the subject matter question to appropriate dataset and analysis tools
  12. What is missing in the current statistics curriculum? Computing, Connecting, Creating

    • Need more computing in the curriculum
    • Need to teach how to connect the subject matter question to appropriate dataset and analysis tools
    • Instead of being passive, teach students to be active and how to create and formulate questions to investigate hypotheses with data
  13. Bridging the gap in the classroom to teach introductory data science courses

    • Educators need to be experienced themselves in creating, connecting and computing
    • Encourage applied statisticians experienced in creating, connecting, and computing to become involved in the development of courses
    • Encourage statistics departments to reach out to practicing data analysts, perhaps in other departments or from other disciplines, to collaborate in developing these courses
  14. Principles of Teaching Data Science

    • Organize the course around a set of diverse case studies
    • Integrate computing into every aspect of the course
    • Teach abstraction, but minimize reliance on mathematical notation
    • Structure course activities to realistically mimic a data scientist’s experience
    • Demonstrate the importance of critical thinking / skepticism through examples
  15. [Figure: survey results in five panels (A to E), each a bar chart of counts: “What is your age?” (18-24 vs. 25-44, split by Female/Male), “What is your primary concentration?”, “What is your primary programming language?”, “Overall, how comfortable are you with programming?” (1 = less comfortable to 5 = more comfortable), and “How long have you been programming?” (<6mos to >3yrs)]
  16. Thank you!

    Feel free to send comments/questions:
    Twitter: @stephaniehicks
    Email: [email protected]
    #rladies
    https://opencasestudies.github.io
    https://jhu-advdatasci.github.io/2018/