purrr slides

Jennifer (Jenny) Bryan
August 12, 2016

Transcript

  1. DRAFT. These are not slides from a talk! I refer to them before and during live coding while teaching STAT 545 and DSCI 523. Don’t expect them to stand on their own. More material is developing here: https://jennybc.github.io/purrr-tutorial/index.html
  2. What is purrr? Functional programming, blah blah blah. OK, I admit it: FP is not actually front of mind when I use purrr.
  3. What does purrr help me do?
      - iterate in a data-structure-informed way
      - tolerate list-columns in data frames
      - with a consistent UI across a large family of functions, and return values that are ready for further computation
  4. For every X, do Y, and return combined results like Z. X and Z will make reference to actual R data structures. Y will be a function, possibly anonymous. It is like “for i in 1 to n …” but much higher level.
  5. Iterate in a data-structure-informed way: for every GitHub username, do GET https://api.github.com/users/username and give me the HTTP responses in a list. https://jennybc.github.io/purrr-tutorial/ex03_github-api-json.html
  6. Iterate in a data-structure-informed way: for every HTTP response, extract the “name” element and give me a character vector. https://jennybc.github.io/purrr-tutorial/ex03_github-api-json.html
  7. Iterate in a data-structure-informed way: for every HTTP response, extract the elements "login", "name", "id", "location" and give me a data frame. https://jennybc.github.io/purrr-tutorial/ex03_github-api-json.html
  8. Iterate in a data-structure-informed way: for every row in a data frame, create a MIME object and give me a list. https://jennybc.github.io/purrr-tutorial/ex20_bulk-gmail.html
  9. Iterate in a data-structure-informed way: for every MIME object, send an email and return the send status as a list. https://jennybc.github.io/purrr-tutorial/ex20_bulk-gmail.html
  10. Iterate in a data-structure-informed way: for every tuple (string, positions where substrings start, positions where substrings end), extract the substrings and give me a list of character vectors. https://jennybc.github.io/purrr-tutorial/ex10_trump-tweets.html
  11. map(.x, .f, ...): .x is a vector. “For every X” = for every element of .x. Remember, lists are vectors. Remember, data frames are lists.
  12. map(.x, .f, ...): .f is a function, possibly specified with shortcuts, all shown in the worked examples. “Do Y” = .f(.x[[i]], ...)
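
The ways of specifying .f mentioned in slide 12 can be sketched as follows. This is a minimal illustration, not taken from the worked examples: the list `x` and the small data frame are made-up toy inputs.

```r
library(purrr)

# Hypothetical toy input: a named list of integer vectors.
x <- list(a = 1:3, b = 4:6)

# .f given as an existing function: calls sum(x[[i]]) for every element.
by_name <- map(x, sum)

# .f given as an anonymous function.
by_anon <- map(x, function(v) sum(v))

# .f given with the formula shortcut, where .x stands for the current element.
by_formula <- map(x, ~ sum(.x))

# A data frame is a list of columns, so map() visits one column at a time.
toy_df <- data.frame(n = 1:2, s = c("u", "v"), stringsAsFactors = FALSE)
col_classes <- map(toy_df, class)
```

All three spellings of .f give the same list here; which one reads best is a matter of taste and of how complicated Y is.
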
  13. “Give me a Z”: map_lgl(.x, .f, ...), map_chr(.x, .f, ...), map_int(.x, .f, ...), map_dbl(.x, .f, ...) return an atomic vector of the requested type.
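
A short sketch of the typed variants from slide 13, using the extract-by-name shortcut. The `users` list is a hypothetical stand-in for parsed API responses, not data from the tutorial.

```r
library(purrr)

# Hypothetical records, loosely shaped like parsed JSON for two users.
users <- list(
  list(login = "alice", id = 1L, score = 9.5,  admin = TRUE),
  list(login = "bob",   id = 2L, score = 7.25, admin = FALSE)
)

# A string as .f means "extract the element with this name".
logins <- map_chr(users, "login")  # character vector
ids    <- map_int(users, "id")     # integer vector
scores <- map_dbl(users, "score")  # double vector
admins <- map_lgl(users, "admin")  # logical vector

# Plain map() always returns a list, whatever .f returns.
sizes <- map(users, length)
```

If an element cannot be returned as the requested type, the typed variants raise an error instead of silently returning something else.
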
  14. “Give me a Z”: map_df(.x, .f, ..., .id = NULL) is basically map() then dplyr::bind_rows().
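
A minimal sketch of the equivalence stated in slide 14, assuming the dplyr package is installed. The `users` list and the helper `as_row()` are hypothetical names introduced only for this illustration.

```r
library(purrr)

# Hypothetical records for two users.
users <- list(
  list(login = "alice", id = 1L),
  list(login = "bob",   id = 2L)
)

# Hypothetical helper: turn one record into a one-row data frame.
as_row <- function(u) {
  data.frame(login = u$login, id = u$id, stringsAsFactors = FALSE)
}

# One step: map, then row-bind the pieces into a single data frame.
users_df <- map_df(users, as_row)

# The same thing spelled out in two steps.
users_df_2 <- dplyr::bind_rows(map(users, as_row))
```
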
  15. “For every X”: with map2(.x, .y, .f, ...), X = (element i of .x, element i of .y). With pmap(.l, .f, ...), X = tuple of the i-th elements of the lists in .l. Remember, a data frame is a list!
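
The parallel iteration in slide 15 might look like the sketch below. The vectors and the data frame `jobs` are made-up inputs for illustration.

```r
library(purrr)

# map2(): walk two vectors in lockstep; .x and .y are the paired elements.
x <- c(1, 2, 3)
y <- c(10, 20, 30)
sums <- map2_dbl(x, y, ~ .x + .y)

# pmap(): walk any number of inputs in lockstep. A data frame is a list of
# columns, so each "tuple" is one row. Arguments are matched by column name.
jobs <- data.frame(
  word  = c("ab", "c"),
  times = c(2L, 3L),
  stringsAsFactors = FALSE
)
repeated <- pmap_chr(jobs, function(word, times) strrep(word, times))
```

Because pmap() matches list names to argument names, the column order in `jobs` does not matter here.
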
  16. How might you do such things today?
      - maybe you don’t, because you don’t know how
      - for loops
      - apply(), [slvmt]apply(), split(), by()
      - the plyr package: [adl][adl_]ply()
      - with dplyr: df %>% group_by() %>% do()
  17. This is not my first R rodeo. I have gone through intense, evangelical phases of iterating with base “apply” functions and plyr. I highly recommend you give purrr a try.
  18. Relationship to base R approaches: there’s nothing you can do with purrr that you cannot do with base. Specifically: map() is basically lapply(). Main reasons to use purrr:
      - shortcuts facilitate anonymous functions for .f
      - greater encouragement for type-safety
      - consistent API across a large family of functions
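
The comparison in slide 18 can be sketched with a toy list. The inputs are made up; the empty-list case is one common illustration of the type-safety point, not an example from the slides.

```r
library(purrr)

# Hypothetical toy input.
x <- list(a = 1:3, b = 4:6)

# map() and lapply() give the same list for the same function.
same <- identical(lapply(x, length), map(x, length))

# Shortcut for an anonymous function versus the base spelling.
doubled_base  <- lapply(x, function(v) v * 2L)
doubled_purrr <- map(x, ~ .x * 2L)

# Type-safety: on empty input, sapply() falls back to a list,
# while map_int() still returns an integer vector.
empty_base  <- sapply(list(), length)
empty_purrr <- map_int(list(), length)
```
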
  19. Tolerate list-columns in data frames. Tidyverse lifestyle ~ work in a data frame when possible. What about stuff that can’t be stored as an atomic vector?
      - stick it in a list-column
    But list-columns are awful!
      - get better at inspecting lists
      - get better at computing on lists
    Use purrr::map() and friends
      - probably inside dplyr::mutate()
  20. Tolerate list-columns in data frames. Tidyverse lifestyle ~ work in a data frame when possible. OK, there’s a whole section I want to write here, with more worked examples on the site, etc., but that’s not happening this round. What follows are a few hints of what I will say.
  21. Every time someone asks: how can I iterate over a list, but also access the index i or the list names at the same time? They should probably be working inside a data frame, with a list-column and a variable for i or the names. Use tibble::enframe() on your vexing_list and have at it with mutate(new_var = map_*(vexing_list, f)) or map2() or pmap().
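
One way the enframe() advice in slide 21 might play out. The contents of `vexing_list` and the new column names are invented for this sketch; enframe() puts the list itself in a list-column called `value` and the names in `name`.

```r
library(purrr)
library(tibble)
library(dplyr)

# Hypothetical named list with elements of mixed type and length.
vexing_list <- list(a = 1:3, b = letters[1:2], c = 10)

vexing_df <- enframe(vexing_list) %>%  # columns: name (chr), value (list)
  mutate(
    i     = row_number(),                        # the index, now a variable
    n     = map_int(value, length),              # compute on the list-column
    label = map2_chr(name, n, ~ paste0(.x, ":", .y))  # use names alongside
  )
```

With the index and names stored as ordinary columns, there is no need for a special "map with index" function.
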
  22. A great example is Gapminder. Draw on http://r4ds.had.co.nz/many-models.html and STAT 545 Gapminder materials (translate from plyr and dplyr). It is natural to nest at the country level and put the data in a list-column. Fit models, etc. by mutating the data list-column. Extract model summaries by mutating the fits with broom functions.
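
A rough sketch of the nest, fit, extract workflow from slide 22. To stay self-contained it swaps in the built-in `mtcars` data nested by `cyl` instead of Gapminder nested by country, and it pulls the slope out with coef() rather than a broom function. Both substitutions are assumptions of this sketch, as is the model formula.

```r
library(purrr)
library(dplyr)
library(tidyr)

models <- mtcars %>%
  nest(data = -cyl) %>%  # one row per cyl; the rest goes in a list-column
  mutate(
    # Fit one model per row by mapping over the data list-column.
    fit   = map(data, ~ lm(mpg ~ wt, data = .x)),
    # Pull one number out of each fit by mapping over the fits.
    slope = map_dbl(fit, ~ coef(.x)[["wt"]])
  )
```

With Gapminder the shape is the same: nest everything except the country variable, fit a model per country, then mutate again to extract summaries.
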
  23. A more far-out example is https://jennybc.github.io/purrr-tutorial/ex24_xml-wrangling.html, where I put XML nodesets in a data frame. Each row is one row of a Google Sheet. I proceed to wrangle it on the way to getting the cell contents.
  24. Also, just to be clear: no one in their right mind enjoys having list-columns in a data frame. But the benefits often outweigh the costs, especially if you have the right tools and a productive mindset. It’s always a temporary state. The goal is always to get back to something simpler.
  25. https://jennybc.github.io/purrr-tutorial/ex10_trump-tweets.html

    pmap(list(text = tweets, first = match_first, last = match_last), substring), i.e. substring(text, first, last) for each tuple.

    tweets:
      [[1]] My economic policy speech will be carried live at 12:15 P.M. Enjoy!
      [[2]] Join me in Fayetteville, North Carolina tomorrow evening at 6pm. Tickets now available at: https://t.co/Z80d4MYIg8
      [[3]] The media is going crazy. They totally distort so many things on purpose. Crimea, nuclear, "the baby" and so much more. Very dishonest!
      [[4]] I see where Mayor Stephanie Rawlings-Blake of Baltimore is pushing Crooked hard. Look at the job she has done in Baltimore. She is a joke!
      [[5]] Bernie Sanders started off strong, but with the selection of Kaine for V.P., is ending really weak. So much for a movement! TOTAL DISRESPECT
      [[6]] Crooked Hillary Clinton is unfit to serve as President of the U.S. Her temperament is weak and her opponents are strong. BAD JUDGEMENT!
      [[7]] The Cruz-Kasich pact is under great strain. This joke of a deal is falling apart, not being honored and almost dead. Very dumb!

    match_first:
      [[1]] -1   [[2]] -1   [[3]] 20   [[4]] 134   [[5]] 28 95   [[6]] 87 114   [[7]] 50 112 123

    match_last:
      [[1]] -3   [[2]] -3   [[3]] 24   [[4]] 137   [[5]] 33 98   [[6]] 90 119   [[7]] 53 115 126
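
The pmap() call from slide 25 can be tried end to end on three of the tweets shown there. The slide does not say how match_first and match_last were built; the sketch below assumes gregexpr() with a small made-up word pattern, so the regex and the intermediate names `matches` and `match_length` are assumptions.

```r
library(purrr)

# Three of the tweets from the slide.
tweets <- c(
  "My economic policy speech will be carried live at 12:15 P.M. Enjoy!",
  "The media is going crazy. They totally distort so many things on purpose. Crimea, nuclear, \"the baby\" and so much more. Very dishonest!",
  "The Cruz-Kasich pact is under great strain. This joke of a deal is falling apart, not being honored and almost dead. Very dumb!"
)

# Assumed pattern for this sketch: a handful of words to look for.
pattern <- "crazy|weak|strong|dumb|joke|dead"

# gregexpr() returns, per tweet, the start positions of all matches
# (or -1 if there are none), with match lengths stored as an attribute.
matches      <- gregexpr(pattern, tweets)
match_first  <- map(matches, as.vector)
match_length <- map(matches, attr, which = "match.length")
match_last   <- map2(match_first, match_length, ~ .x + .y - 1)

# For every tuple (text, first, last), extract the substrings.
words <- pmap(
  list(text = tweets, first = match_first, last = match_last),
  substring
)
```

A tweet with no match yields first = -1 and last = -3, as on the slide, and substring() returns an empty string for it.
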