Heidi Seibold - Are (data) scientists bad at science?

Let's face the facts: most research is not reproducible. That means that if you rerun the same analysis on the same data from a (data) science project, you will likely not get the same results. Oops! In this talk I want to explore the stumbling blocks that (data) scientists run into and what to do about them. Spoiler: it has a lot to do with getting organized.

MunichDataGeeks

May 31, 2023

Transcript

  1. Are (data) scientists bad at science?
    Munich Datageeks, May Edition 2023
    Heidi Seibold, heidiseibold.com
    Slides: https://bit.ly/3IvAzT0 (CC BY 4.0)
  2. Project management = good organisation
    Let's not pretend: we're not geniuses ;P
    http://www.quickmeme.com/meme/3r98zx
  3. Naming
    NO:
    • Myabstract.docx
    • Joe’s Filenames Use Spaces and Punctuation.xlsx
    • figure 1.png
    • fig 2.png
    • JW7d^(2sl@deletethisandyourcareerisoverWx2*.txt
    YES:
    • 2014-06-08_abstract-for-sla.docx
    • Joes-filenames-are-getting-better.xlsx
    • Fig01_scatterplot-talk-length-vs-interest.png
    • Fig02_histogram-talk-attendance.png
    • 1986-01-28_raw-data-from-challenger-o-rings.txt
    See the slides by Jenny Bryan.
  4. Naming (the YES examples from slide 3, shown again)
    File names should be:
    ➔ Machine readable
    ➔ Human readable
    ➔ Optional: Consistent
    ➔ Optional: Play well with default ordering
  5. Organise your files and folders well
    .
    ├── analysis          <- all things data analysis
    │   └── src           <- functions and other source files
    ├── comm
    │   ├── internal-comm <- internal communication such as meeting notes
    │   └── journal-comm  <- communication with the journal, e.g. peer review
    ├── data
    │   ├── data_clean    <- clean version of the data
    │   └── data_raw      <- raw data (don't touch)
    ├── dissemination
    │   ├── manuscripts
    │   ├── posters
    │   └── presentations
    ├── documentation     <- documentation, e.g. data management plan
    └── misc              <- miscellaneous files that don't fit elsewhere
    Template: https://github.com/HeidiSeibold/research-project-template
  6. What can data scientists do?
    • Work according to good practices
    • Be a role model
    • Collaborate
    • Teach
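Part of the file-naming advice on slides 3 and 4 can be checked by a script. The sketch below is a minimal illustration in Python, not something shown in the talk. It rests on two assumptions of our own: "machine readable" is taken to mean only ASCII letters, digits, `-` and `_` plus one lowercase extension, and "plays well with default ordering" is taken to mean the name starts with an ISO 8601 date (`YYYY-MM-DD_`) or a zero-padded figure number (`Fig01_`). The function names `is_machine_readable` and `sorts_well` are hypothetical.

```python
import re
from datetime import date

# Assumption: "machine readable" = only ASCII letters, digits, "-" and "_" in the
# name, followed by a single lowercase extension (no spaces or other punctuation).
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+\.[a-z0-9]+")

# Assumption: a name "plays well with default ordering" if it starts with an
# ISO 8601 date (YYYY-MM-DD_) or a zero-padded figure number (Fig01_).
DATE_PREFIX = re.compile(r"(\d{4}-\d{2}-\d{2})_")
FIG_PREFIX = re.compile(r"Fig\d{2}_")


def is_machine_readable(name: str) -> bool:
    """True if the whole name consists of safe characters only."""
    return SAFE_NAME.fullmatch(name) is not None


def sorts_well(name: str) -> bool:
    """True if the name starts with a real ISO date or a zero-padded figure number."""
    m = DATE_PREFIX.match(name)  # re.match only looks at the start of the name
    if m:
        try:
            date.fromisoformat(m.group(1))  # rejects impossible dates such as 2014-13-40
        except ValueError:
            return False
        return True
    return FIG_PREFIX.match(name) is not None


if __name__ == "__main__":
    examples = [
        "Myabstract.docx",
        "Joe’s Filenames Use Spaces and Punctuation.xlsx",
        "fig 2.png",
        "JW7d^(2sl@deletethisandyourcareerisoverWx2*.txt",
        "2014-06-08_abstract-for-sla.docx",
        "Fig01_scatterplot-talk-length-vs-interest.png",
    ]
    for name in examples:
        print(f"{is_machine_readable(name)!s:5} {sorts_well(name)!s:5} {name}")

    # Default (lexicographic) ordering: unpadded numbers sort wrongly,
    # zero-padded numbers and ISO dates sort as intended.
    print(sorted(["fig10.png", "fig2.png", "fig1.png"]))  # -> ['fig1.png', 'fig10.png', 'fig2.png']
    print(sorted(["Fig10.png", "Fig02.png", "Fig01.png"]))  # -> ['Fig01.png', 'Fig02.png', 'Fig10.png']
    print(sorted(["2014-06-08", "1986-01-28"]))  # -> ['1986-01-28', '2014-06-08']
```

Note that `Myabstract.docx` passes the machine-readability check even though it is on the NO list: whether a name is informative to a human reader, and whether names are consistent across a project, still needs a human to judge.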
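The folder layout on slide 5 can be scaffolded with a few lines of Python. The author's actual template is the linked GitHub repository; the sketch below is not that repository's code and only recreates the directory tree transcribed from the slide. The function name `create_project` and the project name `my-project` are hypothetical.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Leaf folders transcribed from the slide; parents such as "analysis" and "data"
# are created implicitly by mkdir(parents=True).
LAYOUT = (
    "analysis/src",                  # functions and other source files
    "comm/internal-comm",            # meeting notes and other internal communication
    "comm/journal-comm",             # communication with the journal, e.g. peer review
    "data/data_clean",               # clean version of the data
    "data/data_raw",                 # raw data (don't touch)
    "dissemination/manuscripts",
    "dissemination/posters",
    "dissemination/presentations",
    "documentation",                 # e.g. data management plan
    "misc",                          # files that don't fit elsewhere
)


def create_project(root: Path) -> list[Path]:
    """Create the folder skeleton under root and return the leaf folders.

    Safe to re-run: existing folders and their contents are left untouched.
    """
    leaves = []
    for rel in LAYOUT:
        folder = root / rel
        folder.mkdir(parents=True, exist_ok=True)
        leaves.append(folder)
    return leaves


if __name__ == "__main__":
    # Build the skeleton in a throwaway directory and list every folder created.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "my-project"  # hypothetical project name
        create_project(root)
        for path in sorted(p for p in root.rglob("*") if p.is_dir()):
            print(path.relative_to(root).as_posix())
```

Because git does not track empty folders, you may want to place a small file (for example a README describing the folder's purpose) in each one before committing the skeleton.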