

coRps R Building Blocks Series - 10 - Functions/Iteration

Aaron

May 16, 2022


Transcript

  1. Learning Objectives
    • Understand the components of a function
    • Know the benefits of creating a function
    • Understand how for loops and functional programming (purrr) differ
  2. What are some tidyverse functions you know?
    mutate, pivot_wider, read_csv, unite, str_detect, ungroup, tribble, summarize, mdy, slice_head, na_if, last_col, split, filter, distinct, count, bind_rows, map_dfr, crossing, drop_na, across, fill, nest, drive_ls, read_delim, excel_sheets, drive_auth, fct_reorder
  3. Anatomy
      shake <- function() {
        print("show treat \n close fist around treat")
        print("hand out, palm up")
        Sys.sleep(2)                # wait two seconds
        # dog_shake is not an argument; it must already exist where shake() is called
        if(dog_shake == TRUE){
          print("give treat")
        } else {
          print("oops")
        }
      }
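The components of a function can also be inspected directly in base R. A minimal sketch, using a hypothetical `add_tax()` function that is not part of the deck:

```r
# Hypothetical example function (not from the deck)
add_tax <- function(price, rate = 0.1) {
  price * (1 + rate)
}

arg_names <- names(formals(add_tax))  # the arguments: "price" and "rate"
fn_body   <- body(add_tax)            # the code that runs on each call
fn_env    <- environment(add_tax)     # where names not defined inside are looked up

total <- add_tax(100)                 # rate falls back to its default of 0.1
```

Here `add_tax` is the name, `price` and `rate` are the arguments, and everything between the braces is the body.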
  4. When should I write a function?
    "You should consider writing a function whenever you've copied and pasted a block of code more than twice (i.e. you now have three copies of the same code)."
  5. Process
    • Write the script that solves your problem for a real scenario
    • Wrap the code in function(){script} to save it
    • Replace the names of the real object(s) with argument(s): function(df, x){script}
    • Name the function: resolve_issue <- function(df, x){ script }
    • Use the function: df_fix <- resolve_issue(df_treat, "Jupiter")
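The steps above can be sketched end to end. This is only an illustration: the data frame contents, the `planet` and `value` columns, and the filtering task are all made up here; only the names `resolve_issue`, `df_treat`, and `df_fix` come from the slide.

```r
# Toy data standing in for a real dataset (values invented for illustration)
df_treat <- data.frame(
  planet = c("Jupiter", "Saturn", "Jupiter"),
  value  = c(10, NA, 30),
  stringsAsFactors = FALSE
)

# Step 1: a script that solves the problem for one real case
df_jupiter <- df_treat[df_treat$planet == "Jupiter", ]
df_jupiter <- df_jupiter[!is.na(df_jupiter$value), ]

# Steps 2-4: wrap the script in function(), swap the real objects
# (df_treat, "Jupiter") for arguments (df, x), and name the function
resolve_issue <- function(df, x) {
  df_sub <- df[df$planet == x, ]
  df_sub[!is.na(df_sub$value), ]
}

# Step 5: use the function
df_fix <- resolve_issue(df_treat, "Jupiter")
```

The function returns the same rows as the one-off script, and it can now be called with "Saturn" or any other data frame that has the same columns.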
  6. Function naming guidelines
    • "Ideally, the name of your function will be short, but clearly evoke what the function does"
    • "Generally, function names should be verbs, and arguments should be nouns"
    • "If your function name is composed of multiple words, I recommend using 'snake_case'"
    • "If you have a family of functions that do similar things, make sure they have consistent names and arguments."
  7. Functions in Action
    Goal: across different sectors, I want to know the overall target achievement.
    Target achievement = cumulative results / targets
  8. Functions in Action
      # assumes dplyr (mutate, arrange, %>%) and scales (percent) are loaded
      achv <- function(df){
        df %>%
          mutate(achievement = cumulative / targets,
                 achv_disp = percent(achievement, 1)) %>%
          arrange(desc(targets))
      }
      achv(df_sect1)
      achv(df_sect2)
  9. Functions in Action
      achv <- function(df){
        # stop early if either required column is missing
        if(!"targets" %in% names(df) || !"cumulative" %in% names(df))
          stop("Missing targets or cumulative column")
        df %>%
          mutate(achievement = cumulative / targets,
                 achv_disp = percent(achievement, 1)) %>%
          arrange(desc(targets))
      }
  10. Functions in Action
      agg_achv <- function(df, low_lvl = .75){
        sect_achv <- df %>%
          summarise(cumulative = sum(cumulative),
                    achievement = sum(cumulative) / sum(targets))
        if(sect_achv$achievement < low_lvl){
          usethis::ui_warn("Sector only achieved {comma(sect_achv$cumulative)} results, only {percent(sect_achv$achievement,1)} of target achievement")
        } else {
          usethis::ui_info("{percent(sect_achv$achievement,1)} of sector's targets achieved")
        }
      }
  11. Iterate to reduce duplication
    "One tool for reducing duplication is functions, which reduce duplication by identifying repeated patterns of code and extract them out into independent pieces that can be easily reused and updated. Another tool for reducing duplication is iteration, which helps you when you need to do the same thing to multiple inputs: repeating the same operation on different columns, or on different datasets."
  12. Copy and paste method
    What is the mean value in Sector 3 across the three indicators, HTS_TST, HTS_TST_POS, and TX_NEW?
      mean(df_sect3$hts_tst)
      mean(df_sect3$hts_tst_pos)
      mean(df_sect3$tx_new)
  13. Let's iterate with a for loop
      # remove the facility (character) column
      df <- df_sect3[-1]
      output <- vector("double", ncol(df))   # 1. output
      for (i in seq_along(df)) {             # 2. sequence
        output[[i]] <- mean(df[[i]])         # 3. body
      }
      > output                               # 4. print values
      [1] 463.8 33.5 29.7
    • 1. output: critical to allocate sufficient space for the output
    • 2. sequence: determine what to loop over
    • 3. body: actual code being applied
    • 4. print values: output is stored/saved to a vector we can print
  14. Loop options
    1. Loop over numeric indices: for (i in seq_along(xs))
    2. Loop over the elements: for (x in xs)
    3. Loop over the names: for (nm in names(xs))
    4. Loop over unknown sequence length: while (condition)
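The four loop forms listed above can be compared side by side. A minimal sketch in base R; the vector `xs` and the stopping condition are made up for illustration:

```r
xs <- c(a = 1, b = 4, c = 9)  # made-up named vector

# 1. Loop over numeric indices (handy when filling a pre-allocated output)
roots <- vector("double", length(xs))
for (i in seq_along(xs)) {
  roots[[i]] <- sqrt(xs[[i]])
}

# 2. Loop over the elements (fine when only the values matter)
total <- 0
for (x in xs) {
  total <- total + x
}

# 3. Loop over the names (gives access to both the name and the value)
labels <- setNames(vector("character", length(xs)), names(xs))
for (nm in names(xs)) {
  labels[[nm]] <- paste0(nm, "=", xs[[nm]])
}

# 4. Unknown sequence length: repeat until a condition stops holding
n <- 1
steps <- 0
while (n < 100) {
  n <- n * 2
  steps <- steps + 1
}
```

Looping over indices is the most general form, because the position `i` gives both the name (`names(xs)[[i]]`) and the value (`xs[[i]]`).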
  15. Final thoughts on loops
    "For loops are not as important in R as they are in other languages because R is a functional programming language. This means that it's possible to wrap up for loops in a function, and call that function instead of using the for loop directly."
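Wrapping a for loop in a function might look like the sketch below. The helper name `col_summary` and the data values are illustrative assumptions, not taken from the deck; the point is that the loop is written once and the operation is passed in as an argument.

```r
# The for-loop pattern, written once; the operation to apply is the argument `fun`
col_summary <- function(df, fun) {
  out <- vector("double", length(df))
  for (i in seq_along(df)) {
    out[[i]] <- fun(df[[i]])
  }
  out
}

# Made-up values for two indicator columns
df <- data.frame(hts_tst = c(100, 200, 300), tx_new = c(10, 20, 60))

col_means   <- col_summary(df, mean)
col_medians <- col_summary(df, median)

# Base R's vapply() packages the same idea without a visible loop
col_means_vapply <- vapply(df, mean, numeric(1))
```

Calling `col_summary(df, mean)` replaces the hand-written loop, and switching to a different summary only means passing a different function.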
  16. Apply a function to each element (one list), each pair (two lists), or each group (many lists) of elements of a list or vector. map(), map2(), and pmap() return a list; the suffixed variants return the type shown.
      Returns                     One list      Two lists          Many lists
      list                        map(.x, .f)   map2(.x, .y, .f)   pmap(.l, .f)
      numeric vector              map_dbl()     map2_dbl()         pmap_dbl()
      integer vector              map_int()     map2_int()         pmap_int()
      character vector            map_chr()     map2_chr()         pmap_chr()
      logical vector              map_lgl()     map2_lgl()         pmap_lgl()
      data frame (by col)         map_dfc()     map2_dfc()         pmap_dfc()
      data frame (by row)         map_dfr()     map2_dfr()         pmap_dfr()
      render output, not value    walk()        walk2()            pwalk()
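A few of the variants in this table, applied to the same input, show how the suffix controls the return type. A minimal sketch assuming purrr is installed; the `scores` list is made up for illustration:

```r
library(purrr)

scores <- list(a = c(1, 2, 3), b = c(10, 20), c = 5)  # made-up data

as_list <- map(scores, mean)                              # list
as_dbl  <- map_dbl(scores, mean)                          # named double vector
as_int  <- map_int(scores, length)                        # named integer vector
as_chr  <- map_chr(scores, ~ paste(.x, collapse = "-"))   # named character vector
as_lgl  <- map_lgl(scores, ~ length(.x) > 1)              # named logical vector

# Two inputs walked in parallel, then three inputs in parallel
sums2 <- map2_dbl(c(1, 2, 3), c(10, 20, 30), ~ .x + .y)
sums3 <- pmap_dbl(list(c(1, 2), c(10, 20), c(100, 200)),
                  function(a, b, c) a + b + c)

# walk() is called for its side effect and returns its input invisibly
walked <- walk(names(scores), ~ message("processing ", .x))
```

The typed variants raise an error if the function does not return a single value of the expected type, which is a useful check that a plain `map()` does not give.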
  17. Let's explore map
      map(.x, .f, ...)
    • .x: a list
    • .f: a function
    • ...: additional arguments to pass into the function
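As an illustration of the third slot, the sketch below passes `na.rm = TRUE` through to `mean()`. It assumes purrr is installed, and the `readings` list is made up. Recent purrr documentation recommends the anonymous-function spelling over `...` for constant arguments, so both are shown.

```r
library(purrr)

readings <- list(site_a = c(4, NA, 8), site_b = c(1, 2, 3))  # made-up data

# With no extra arguments, the NA in site_a propagates
means_raw <- map_dbl(readings, mean)

# Arguments after .f are handed on to .f on every call
means_clean <- map_dbl(readings, mean, na.rm = TRUE)

# Same result with an anonymous function, which keeps the argument next to the call
means_clean2 <- map_dbl(readings, function(v) mean(v, na.rm = TRUE))
```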
  18. Let's explore map
      map2_chr(.x = pepfar_country_list$operatingunit,
               .y = pepfar_country_list$country,
               .f = ~ ifelse(.x == .y, .x, glue("{.x}/{.y}")))
    • .f: pass in a custom/named function
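The same call can be tried on toy data with a named function in the `.f` slot. A sketch under stated assumptions: `country_list` and `label_country()` are hypothetical stand-ins (the real `pepfar_country_list` is not shown in the deck), and `paste()` is swapped in for `glue()` to keep the example to one package.

```r
library(purrr)

# Hypothetical stand-in for pepfar_country_list
country_list <- data.frame(
  operatingunit = c("Jupiter", "Outer Region", "Outer Region"),
  country       = c("Jupiter", "Neptune", "Saturn"),
  stringsAsFactors = FALSE
)

# Named function: keep the operating unit alone when it matches the country,
# otherwise combine the two as "operatingunit/country"
label_country <- function(ou, cntry) {
  if (ou == cntry) ou else paste(ou, cntry, sep = "/")
}

labels <- map2_chr(country_list$operatingunit, country_list$country, label_country)
```

`map2_chr()` walks the two columns in parallel, so `label_country()` receives one operating unit and one country per call.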
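  19. Use case for map
      > (files <- list.files("Data"))
      [1] MER_Structured_Datasets_Site_IM_FY50-52_v2_Jupiter.zip
      [2] MER_Structured_Datasets_Site_IM_FY50-52_v2_Neptune.zip
      [3] MER_Structured_Datasets_Site_IM_FY50-52_v2_Saturn.zip
      # read each file, keep three indicators for one year, and row-bind the results
      # (files holds bare file names, so the folder is added back with file.path())
      > df_tx_nn <- map_dfr(.x = files,
                            .f = ~ read_msd(file.path("Data", .x)) %>%
                              filter(indicator %in% c("TX_CURR", "TX_NEW", "TX_NET_NEW"),
                                     standardizeddisaggregate == "Total Numerator",
                                     fiscal_year == 2052))
      # one scatter plot per country; indicator is lower-cased and kept in count()
      # so that pivot_wider() creates the tx_* columns used in aes()
      > plots <- df_tx_nn %>%
          mutate(indicator = tolower(indicator)) %>%
          count(country, orgunituid, indicator, wt = cumulative) %>%
          pivot_wider(names_from = indicator, values_from = n) %>%
          split(.$country) %>%
          map(~ggplot(., aes(tx_net_new, tx_new, size = tx_curr)) + geom_point())
      # save each plot under its country name
      > paths <- str_c(names(plots), ".png")
      > pwalk(list(paths, plots), ggsave, path = tempdir())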
  20. Resources & Sources
    • R for Data Science, Garrett Grolemund and Hadley Wickham
    • Master the Tidyverse - 6 - Iteration with purrr - Garrett Grolemund
    • Cupcakes (for-loops vs map/lapply) - Hadley Wickham
    • https://www.tidyverse.org/