
Data Rectangling


Talk about data rectangling and list-columns at RStudio Conf 2018 in San Diego
https://www.rstudio.com/conference/
Gist of code I showed:
https://gist.github.com/jennybc/3afafce0a06fde314b5c9844912d6bd7

Jennifer (Jenny) Bryan

February 02, 2018



Transcript

  1. Data Wrangling
    @JennyBryan
    @jennybc



  2. Data Rectangling
    @JennyBryan
    @jennybc


  3. atomic vectors
    logical factor
    integer, double


  4. vectors of same length? DATA FRAME!


  5. vectors don’t have to be atomic
    works for lists too! list column
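
A minimal sketch of this idea, assuming the tibble package is installed. The column names `name` and `stuff` mirror the next slide; the values are made up for illustration.

```r
library(tibble)

# Hypothetical data: `name` is an atomic vector, `stuff` is a list.
# Both have length 3, so they can sit side by side in one data frame.
df <- tibble(
  name  = c("a", "b", "c"),
  stuff = list(1:3, "text", list(x = TRUE))
)

df                # prints `stuff` as a list-column
nrow(df)          # one row per element of each vector
class(df$stuff)   # the column itself is still a plain list
```

The only requirement is that every column has the same length; the elements of a list-column can differ in type and size.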


  6. name, stuff
    this is a data frame!
    a tibble, specifically


  7. a homogeneous list


  8. Why work with lists?
    You have no choice.
    • String processing, e.g., splitting
    • JSON or XML, e.g., web APIs
    • Models, plots, & collections thereof
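
As a small illustration of the first bullet, here is a sketch in base R showing that string splitting returns a list, because each input can produce a different number of pieces. The `titles` vector is a made-up example, not data from the talk.

```r
# Hypothetical input: one string per character, titles separated by "; "
titles <- c("Khaleesi; Queen of Meereen", "Hand of the Queen", "")

# strsplit() returns a list with one character vector per input string
pieces <- strsplit(titles, "; ", fixed = TRUE)

str(pieces)
lengths(pieces)   # the pieces have different lengths, hence a list
```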


  9. An API Of Ice And Fire
    https://anapioficeandfire.com
    https://cran.r-project.org/package=repurrrsive


  10. "Combines the excitement of iris and mtcars,
    with the complexity of recursive lists.
    W00t!"
    install.packages("repurrrsive")


  11. https://blog.rstudio.com/2017/08/22/rstudio-v1-1-preview-object-explorer/
    View(YOUR_HAIRY_LIST)


  12. got_chars[[9]][["name"]]
    got_chars[[9]][["titles"]]


  13. x
    x[i]
    x[[i]]
    from http://r4ds.had.co.nz/vectors.html#lists-of-condiments
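
The difference between the two indexing forms can be sketched in base R. The list `x` below is a hypothetical example, not an object from the talk.

```r
# Hypothetical list with three named elements
x <- list(a = 1:3, b = "pepper", c = list(TRUE))

sub  <- x["a"]     # single bracket: a shorter list that still wraps the element
elem <- x[["a"]]   # double bracket: the element itself

class(sub)    # "list"
length(sub)   # 1
class(elem)   # "integer"
```

Single brackets always return a list, so the result can hold any number of elements; double brackets pull out exactly one element and drop the list wrapper.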


  14. http://blog.codinghorror.com/falling-into-the-pit-o
    pit of success


  15. https://shibumo.wordpress.com
    gentle hill of striving


  16. purrr::map(.x, .f, ...)


  17. map(.x, .f, ...)
    for every element of .x
    apply .f


  18. map(.x, .f, ...)
    .f has some special shortcuts
    to make common tasks easy
    map(.x, "TEXT")
    map(.x, i)


  19. map(minis, "pants")
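
A sketch of the name and position shortcuts, assuming the purrr package is installed. The `minis` object from the talk is not included in this transcript, so the list below is a hypothetical stand-in with made-up values.

```r
library(purrr)

# Hypothetical stand-in for `minis`: each element describes one mini-figure
minis <- list(
  list(head = "cap",  torso = "vest",  pants = "blue"),
  list(head = "hair", torso = "shirt", pants = "red")
)

# A string extracts the element with that name from each component
by_name <- map(minis, "pants")

# A positive integer extracts the element at that position
by_position <- map(minis, 2)

str(by_name)
str(by_position)
```

Both calls return a list with the same length as `minis`.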


  20. map_lgl(.x, .f, ...)
    map_int(.x, .f, ...)
    map_dbl(.x, .f, ...)
    map_chr(.x, .f, ...)
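
A sketch of the type-specific variants, assuming purrr is installed. The `minis` list is a hypothetical stand-in defined here so the example runs on its own.

```r
library(purrr)

# Hypothetical stand-in for `minis`
minis <- list(
  list(head = "cap",  torso = "vest",  pants = "blue"),
  list(head = "hair", torso = "shirt", pants = "red")
)

pants   <- map_chr(minis, "pants")                 # character vector
n_parts <- map_int(minis, length)                  # integer vector
is_red  <- map_lgl(minis, ~ .x$pants == "red")     # logical vector

pants
n_parts
is_red
```

Each variant returns an atomic vector of the stated type and raises an error if a result cannot be represented as that type, which is safer than simplifying a list after the fact.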


  21. map_dfr(minis, `[`,
    c("pants", "torso", "head"))
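
A sketch of what this call produces, assuming purrr and dplyr are installed (`map_dfr()` relies on dplyr to bind rows). The `minis` list is a hypothetical stand-in with made-up values.

```r
library(purrr)

# Hypothetical stand-in for `minis`
minis <- list(
  list(head = "cap",  torso = "vest",  pants = "blue"),
  list(head = "hair", torso = "shirt", pants = "red")
)

# `[` keeps the requested elements of each mini-figure as a named list;
# map_dfr() then row-binds those lists into a single data frame.
out <- map_dfr(minis, `[`, c("pants", "torso", "head"))

out
```

The result has one row per element of `minis` and one column per requested name, in the order the names were given.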


  22. If everything is equally easy,
    everything is equally hard.
    paraphrasing David Heinemeier Hansson re: Ruby on Rails


  23. map(.x, .f, ...)
    .f can take many forms
    • existing function
    • anonymous function
    • formula
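
The three forms can be compared side by side. This is a sketch assuming purrr is installed; `minis` and the helper `describe()` are hypothetical and exist only for this example.

```r
library(purrr)

# Hypothetical stand-in for `minis`
minis <- list(
  list(head = "cap",  torso = "vest",  pants = "blue"),
  list(head = "hair", torso = "shirt", pants = "red")
)

# Hypothetical helper that summarises one mini-figure
describe <- function(mini) paste(mini$head, "on top,", mini$pants, "pants")

via_function  <- map_chr(minis, describe)                  # existing function
via_anonymous <- map_chr(minis, function(m) describe(m))   # anonymous function
via_formula   <- map_chr(minis, ~ describe(.x))            # formula

via_formula
```

All three calls return the same character vector; the formula form is shorthand for an anonymous function whose argument is `.x`.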


  24. map(minis, antennate)


  25. library(glue)

    glue_data(
      list(name = "Jenny", born = "in Atlanta"),
      "{name} was born {born}."
    )
    #> Jenny was born in Atlanta.

    glue_data(got_chars[[2]], "{name} was born {born}.")
    #> Tyrion Lannister was born In 273 AC, at Casterly Rock.

    glue_data(got_chars[[9]], "{name} was born {born}.")
    #> Daenerys Targaryen was born In 284 AC, at Dragonstone.


  26. glue_data(got_chars[[9]], "{name} was born {born}.")
    ~ glue_data(.x, "{name} was born {born}.")
    replace your example with .x
    prefix with ~ to say "it's a formula!"


  27. map_chr(got_chars, ~ glue_data(.x, "{name} was born {born}."))

    #> [1] "Theon Greyjoy was born In 278 AC or 279 AC, at Pyke."
    #> [2] "Tyrion Lannister was born In 273 AC, at Casterly Rock."
    #> [3] "Victarion Greyjoy was born In 268 AC or before, at Pyke."
    #> [4] "Will was born ."
    #> [5] "Areo Hotah was born In 257 AC or before, at Norvos."
    #> [6] "Chett was born At Hag's Mire."
    #> [7] "Cressen was born In 219 AC or 220 AC."
    #> [8] "Arianne Martell was born In 276 AC, at Sunspear."
    #> [9] "Daenerys Targaryen was born In 284 AC, at Dragonstone."
    drop-in to any member
    of the map_*() family


  28. name, stuff
    this is a data frame!
    a tibble, specifically


  29. Why put a list into a data frame?
    safety & convenience
    • Manage multiple vectors holistically
    • Use existing toolkit for filter, select, etc.


  30. What happens in the data frame
    stays in the data frame


  31. last R example:
    list in a data frame = list-column
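
A sketch of building and using a list-column, assuming the tibble and purrr packages are installed. The `chars` list is a small hypothetical stand-in shaped like `got_chars`; its names and titles are invented.

```r
library(tibble)
library(purrr)

# Hypothetical nested list, shaped like got_chars but with made-up values
chars <- list(
  list(name = "Character A", titles = c("Title 1", "Title 2")),
  list(name = "Character B", titles = "Title 3")
)

df <- tibble(
  name   = map_chr(chars, "name"),   # atomic column
  titles = map(chars, "titles")      # list-column: lengths differ per row
)

# Compute on the list-column and keep the result alongside it
df$n_titles <- map_int(df$titles, length)

df
```

Keeping `titles` next to `name` means that filtering or reordering rows keeps every character's titles attached to the right name.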


  32. lists are part of life
    RStudio Object Explorer helps
    tibbles are list-friendly
    map() functions help you compute on & simplify lists
