Teaching data science with puzzles

isteves
January 17, 2019

Of the many coding puzzles on the web, few focus on the programming skills needed for handling untidy data. During my summer internship at RStudio, I worked with Jenny Bryan to develop a series of data science puzzles known as the "Tidies of March." These puzzles isolate data wrangling tasks into bite-sized pieces to nurture core data science skills such as importing, reshaping, and summarizing data. We also provide access to puzzles and puzzle data directly in R through an accompanying Tidies of March package. I will show how this package models best practices for both data wrangling and project management.

Transcript

  1. Tidies of March: bite-sized puzzles that focus on core data science skills, as championed by the tidyverse set of packages
  2. SOOTHSAYER. Beware the ides of March. CAESAR. What man is that? BRUTUS. A soothsayer bids you beware the ides of March. CAESAR. Set him before me; let me see his face. CASSIUS. Fellow, come from the throng; look upon Caesar. CAESAR. What say'st thou to me now? Speak once again. SOOTHSAYER. Beware the ides of March. CAESAR. He is a dreamer; let us leave him. Painting: The Death of Julius Caesar, Vincenzo Camuccini (1771-1844)
  3. Pre-populated file path; here::here() for defining the path; knittable .R file; tidyverse messages omitted from HTML output
  4. The neighborhood sandwich store makes the best sandwiches! They’ve got everything from classics like BLTs to more unusual options like Fluffernutters. Since many of their specialty ingredients keep going bad, they've decided to cut their selection and only focus on their best-selling sandwich. Photo: flickr skywhisperer
  5. To help with the decision, the storeowners have collected data on their customers’ favorite sandwiches. Most people listed several varieties (in no particular order). Here’s a sample of the data: In this sample, the Dagwood sandwich is the most popular. In the full dataset, what is the most popular sandwich among the customers?
  6. Beyond the: test cases; parseable and predictable file & folder names; projects & git; reproducible code*
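
The template features listed on slide 3 (a pre-populated path built with here::here(), a knittable .R file, and suppressed tidyverse messages) could look roughly like the sketch below. This is an illustration only, not the actual template shipped with the package: the title, the chunk labels, and the file name `puzzle.csv` are hypothetical. It assumes the tidyverse and here packages are installed. The `#'` and `#+` lines are knitr's conventions for rendering a plain .R script as a report.

```r
#' ---
#' title: "Puzzle solution (hypothetical template)"
#' output: html_document
#' ---

#+ setup, message = FALSE
# The chunk option above keeps the tidyverse attach messages out of the HTML.
library(tidyverse)

#+ import
# Hypothetical file name; the real template pre-populates its own path.
# here::here() builds the path relative to the project root, not the
# current working directory.
puzzle_path <- here::here("data", "puzzle.csv")

# Only read the file if it is actually present, so the script runs as-is.
if (file.exists(puzzle_path)) {
  puzzle_data <- read_csv(puzzle_path)
}
```

Because the special lines are ordinary R comments, the same file can be run interactively or rendered to HTML with rmarkdown::render().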
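
One possible approach to the sandwich puzzle on slide 5 is to reshape the data so that each row holds a single customer-sandwich pair, then count. The sketch below is not the official solution, and the sample is invented for illustration: the real puzzle data, its column names (`customer`, `sandwiches`), and its separator are assumptions here. It assumes dplyr and tidyr are installed.

```r
library(dplyr)
library(tidyr)

# Invented sample data; the real dataset and its layout may differ.
favorites <- tibble(
  customer   = c("A", "B", "C", "D"),
  sandwiches = c("BLT, Dagwood",
                 "Dagwood",
                 "Fluffernutter, BLT, Dagwood",
                 "Fluffernutter")
)

popularity <- favorites %>%
  # Split the comma-separated favorites into one row per sandwich.
  separate_rows(sandwiches, sep = ",\\s*") %>%
  # Tally each sandwich and put the most frequent first.
  count(sandwiches, sort = TRUE)

most_popular <- popularity$sandwiches[1]
print(most_popular)
```

The reshape-then-summarize pattern is the point of the sketch: once the favorites are in a long format, the question reduces to a single count.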