The Future of Statistics Education: A Computational Perspective

Statistics education stands at a critical juncture as we navigate the intersection of traditional statistical theory, modern computational approaches, and emerging AI technologies. This talk examines how statisticians can reimagine curricula by embracing computation as a foundational element rather than an afterthought. While traditional statistics education has prioritized theoretical frameworks and applications, computation has emerged as the backbone of contemporary data analysis, from data acquisition and wrangling to visualization, modeling, and communication. Now, AI tools are further transforming this landscape, creating both opportunities and challenges for statistics and data science educators. The presentation will outline a forward-looking curriculum model for introductory courses that balances statistical thinking, data science methods, and explicit computational instruction.

Mine Cetinkaya-Rundel

May 20, 2025

Transcript

  1. Mine Çetinkaya-Rundel, The Future of Statistics Education: A Computational Perspective

    Duke University. ICERM - Applied Math in Statistics and Data Science Education, May 20, 2025. bit.ly/future-stat-ed-icerm
  2. Pop Quiz

    You have been given a data set (poisson.csv) of count observations along with two features, one numerical and the other categorical. Fit a Poisson regression model to these data with R. Report the estimates for the regression coefficients you obtain and interpret them in the context of the data.
  3. Did it do a “good job”? No.

    Would it get full credit? No. Would it get partial credit? Probably. Maybe it shouldn’t?
  6. “There is no substitute for simply looking at data properly.”

    David Spiegelhalter, Professor, University of Cambridge, from “The Art of Statistics”
  7. “This talk examines how statisticians can reimagine curricula by embracing computation as foundational elements rather than afterthoughts.”

    Mine Çetinkaya-Rundel, Professor, Duke University, from my abstract. Over-promiser, but let’s give it a try!
  8. Statistical science at Duke

    Data science, Regression, Probability, Elective, Elective, Case studies, Elective, Bayesian modeling, Math stat, Statistical computing
  9. Statistical science at Duke with computational labs

    Data science, Regression, Probability, Elective, Elective, Case studies, Elective, Bayesian modeling, Math stat, Statistical computing
  10. Crickets

    Crickets are highly nutritious and provide a cost-effective source of dietary protein […] In the present study, we analyze the life history traits in order to compare productivity of these four species when reared in various conditions. […] Provide a high-quality, careful exploratory analysis. Is there anything unusual about the data that might suggest issues with the experimental protocol? Which species is most suited to cultivation? Analyze the growth, reproductive trajectory, and mortality of the species. Are some more suited for cultivation than others with respect to reproduction and growth? Is there a particular time in their life cycle when you see premature mortality? […] Each project must be submitted as a GitHub repository with, at a minimum, a reproducible Quarto document.

    Jiang, Yue. STA 440 - Fall 2024. www2.stat.duke.edu/courses/Fall24/sta440.001
  13. La Quinta is Spanish for next to Denny's

    This observation is a joke made famous by the late comedian Mitch Hedberg. [There’s an old blog post that does this in Python…] Note that both websites have changed substantially in the last decade and the original approaches no longer work. Scraping the Denny's site involves the traversal of a hierarchical series of location and restaurant pages […] This data collection must be constructed in a reproducible fashion: all web pages being scraped should be cached locally and each analysis step should be self-contained in a separate R script. You will also create a Makefile that will run your R scripts and render your report. […] To make our lives even more complicated, La Quinta's website now makes use of JavaScript, which makes using tools like rvest more difficult. […] Using the results of your scraping you should analyze the veracity of Hedberg's claim. […] Like your previous assignments we have included a GitHub action which is designed to provide feedback on the reproducibility of your assignment. […]

    Rundel, Colin. STA 323 - Spring 2025. sta323-sp25.github.io
  17. Rhapsody in R

    In the twentieth century, avant-garde composers experimented with new techniques for composing music. […] He would write music by simulating random processes, either in the physical world or on a computer. To put it crudely, whatever the simulation spit out, that’s what he’d write on paper and hand to the musicians to play. The art of this approach lay in how the simulation was constructed. […] In this lab, we will write our own pieces of stochastic music. We will do this using the gm package by Renfei Mao. […] Your task here is simple: play. Play, and surprise yourself. Set up a simulation, and use the output to randomly determine the elements of a piece of music: the melody, harmony, rhythm, meter, pitch, articulation, instrumentation, etc. The sky is truly the limit. The goal is to generate music that contains surprising emergent properties that you would not have anticipated based on the rules of the system you designed. Upload […]: an R script, an mp3 file, and a paragraph describing the thought process behind the simulation you set up […] and anything that surprised you.

    Zito, John. STA 240 - Spring 2025. sta240-s25.github.io
  19. Doing data science

    Import → Tidy → Understand (Transform, Visualize, Model) → Communicate, with Program running through every step

    Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for Data Science, 2nd Edition.
  20. Learning data science

    hello world; exploring data (visualize, import, wrangle); ethics (misrepresentation, data privacy, algorithmic bias); rigorous conclusions (model, infer, predict); looking further; communicate
  21. Ethics

    hello world; exploring data (visualize, import, wrangle); ethics (misrepresentation, data privacy, algorithmic bias) + responsibility; communicate
  22. Rigorous conclusions

    hello world; exploring data (visualize, import, wrangle); ethics (misrepresentation, data privacy, algorithmic bias); rigorous conclusions (model, infer, predict) + complexity; communicate
  23. Looking further

    hello world; exploring data (visualize, import, wrangle); ethics (misrepresentation, data privacy, algorithmic bias); rigorous conclusions (model, infer, predict); looking further 🦪; communicate
  24. Communication

    hello world; exploring data (visualize, import, wrangle); ethics (misrepresentation, data privacy, algorithmic bias); rigorous conclusions (model, infer, predict); looking further; communicate
  25. population

    # A tibble: 217 × 3
       country              year population
       <chr>               <dbl>      <dbl>
     1 Afghanistan          2022    41129.
     2 Albania              2022     2778.
     3 Algeria              2022    44903.
     4 American Samoa       2022       44.3
     5 Andorra              2022       79.8
     6 Angola               2022    35589.
     7 Antigua and Barbuda  2022       93.8
     8 Argentina            2022    46235.
     9 Armenia              2022     2780.
    10 Aruba                2022      106.
    # ℹ 207 more rows

    continents
    # A tibble: 285 × 4
       entity                code      year continent
       <chr>                 <chr>    <dbl> <chr>
     1 Abkhazia              OWID_ABK  2015 Asia
     2 Afghanistan           AFG       2015 Asia
     3 Akrotiri and Dhekelia OWID_AKD  2015 Asia
     4 Aland Islands         ALA       2015 Europe
     5 Albania               ALB       2015 Europe
     6 Algeria               DZA       2015 Africa
     7 American Samoa        ASM       2015 Oceania
     8 Andorra               AND       2015 Europe
     9 Angola                AGO       2015 Africa
    10 Anguilla              AIA       2015 North America
    # ℹ 275 more rows

    population_continents <- left_join(population, continents, join_by(country == entity))

    ✓ data joins
  26. population_continents |> filter(is.na(continent))

    # A tibble: 6 × 6
       country                   year.x population code  year.y continent
       <chr>                      <dbl>      <dbl> <chr>  <dbl> <chr>
     1 Congo, Dem. Rep.            2022     99010. NA        NA NA
     2 Congo, Rep.                 2022      5970. NA        NA NA
     3 Hong Kong SAR, China        2022      7346. NA        NA NA
     4 Korea, Dem. People's Rep.   2022     26069. NA        NA NA
     5 Korea, Rep.                 2022     51628. NA        NA NA
     6 Kyrgyz Republic             2022      6975. NA        NA NA

    ✓ data joins ✓ data wrangling
  27. population_continent <- population |>
        mutate(
          country = case_when(
            country == "Congo, Dem. Rep." ~ "Democratic Republic of Congo",
            country == "Congo, Rep." ~ "Congo",
            country == "Hong Kong SAR, China" ~ "Hong Kong",
            country == "Korea, Dem. People's Rep." ~ "North Korea",
            country == "Korea, Rep." ~ "South Korea",
            country == "Kyrgyz Republic" ~ "Kyrgyzstan",
            .default = country
          )
        ) |>
        left_join(continents, by = join_by(country == entity))

    ✓ data joins ✓ data wrangling ✓ data cleaning ✓ ethics
  28. ✓ data joins ✓ data wrangling ✓ data cleaning ✓ ethics ✓ critique ✓ improving visualizations
  29. ✓ data joins ✓ data wrangling ✓ data cleaning ✓ ethics ✓ critique ✓ improving visualizations ✓ mapping ✓ iteration
  30. # A tibble: 500 × 6

       title                                                                     author date       abstract column url
       <chr>                                                                     <chr>  <date>     <chr>    <chr>  <chr>
     1 Community members share remembrances for Ian Hyun Kim                     Remem… 2025-05-06 "We wel… Campu… http…
     2 The Chronicle is accepting remembrances for Ian Hyun Kim                  Remem… 2025-05-03 "If you… Campu… http…
     3 The end                                                                   Audre… 2025-05-01 "I wish… Opini… http…
     4 Stop banning reporters from covering campus protests                      Robin… 2025-04-26 "Duke s… Opini… http…
     5 Your voice is a currency — so use it thoughtfully                         Alice… 2025-04-23 "The tr… Opini… http…
     6 A fortune cookie come true                                                Abby … 2025-04-23 "With m… Opini… http…
     7 Journalism is in crisis. We should look to student newspapers for answer… Zoe K… 2025-04-23 "Journa… Opini… http…
     8 You can just do things                                                    Jules… 2025-04-23 "As I’m… Opini… http…
     9 Oh, the places you’ll go                                                  Karen… 2025-04-23 "This j… Opini… http…
    10 For the love of the game                                                  Ranja… 2025-04-23 "What I… Opini… http…
    # ℹ 490 more rows
    # ℹ Use `print(n = ...)` to see more rows

    ✓ web scraping
  31. bow("https://www.dukechronicle.com")

    <polite session> https://www.dukechronicle.com
        User-agent: polite R package
        robots.txt: 4 rules are defined for 4 bots
       Crawl delay: 10 sec
      The path is scrapable for this user-agent

    ✓ web scraping ✓ terms of use ✓ ethics
  32. ✓ web scraping ✓ terms of use ✓ ethics ✓ text analysis ✓ data wrangling ✓ data visualization
  33. ✓ web scraping ✓ terms of use ✓ ethics ✓ text analysis ✓ data wrangling ✓ data visualization ✓ sentiment analysis
  34. ✓ logistic regression ✓ classification ✓ decision errors ✓ sensitivity / specificity ✓ intuition around loss functions

    Truth: Patient has cancer / Patient doesn’t have cancer. Decision: Patient is diagnosed with cancer / Patient is not diagnosed with cancer.

    Truth: Defendant re-offends / Defendant doesn’t re-offend. Prediction: Defendant will re-offend / Defendant will not re-offend.
  35. And we could keep going on with examples… but let’s talk a bit about pedagogy, assessment, and challenges (in light of AI)
  36. (1) Moving some assessments to the classroom; (2) providing targeted resources for relevant and unlimited answers; (3) shifting AI use away from taking shortcuts, towards supporting learning (maybe); (4) leveling the playing field with explicit “how best to AI” instruction
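The pop quiz on slide 2 can be answered in R with `glm()` and a Poisson family. Because poisson.csv is not available here, the sketch below simulates a hypothetical stand-in: the column names `count`, `x`, and `group` and the true coefficients are assumptions, not the quiz's data. It takes a quick look at the data first, fits the model, and exponentiates the coefficients so they read as multiplicative changes in the expected count.

```r
# Hypothetical stand-in for poisson.csv: simulated data, not the quiz's file
set.seed(2025)
n <- 200
sim <- data.frame(
  x     = runif(n, min = 0, max = 2),                      # numerical feature
  group = factor(sample(c("a", "b"), n, replace = TRUE))   # categorical feature
)
# Assumed true model on the log scale: log(mu) = 0.5 + 0.8 * x + 0.4 * [group == "b"]
mu <- exp(0.5 + 0.8 * sim$x + 0.4 * (sim$group == "b"))
sim$count <- rpois(n, lambda = mu)

# Look at the data before modeling
summary(sim)
aggregate(count ~ group, data = sim, FUN = mean)

# Poisson regression with a log link
fit <- glm(count ~ x + group, family = poisson(link = "log"), data = sim)
summary(fit)

# Coefficients are on the log scale; exponentiate to get rate ratios
rate_ratios <- exp(coef(fit))
rate_ratios
```

Here `rate_ratios[["x"]]` is the estimated multiplicative change in the expected count for a one-unit increase in `x`, holding `group` fixed, and `rate_ratios[["groupb"]]` is the ratio of expected counts for group b relative to group a at the same `x`. An answer to the real quiz would state these in the units and context of the actual data.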
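The lab on slide 17 asks students to set up a simulation and let its output decide elements of a piece. The sketch below is not the lab's solution and does not use the gm package. It only illustrates the simulation half in base R, under the assumption that a melody is a bounded random walk over a C major scale; the pitch set, step probabilities, and durations are hypothetical choices. Turning `melody` and `durations` into sound would be the job of a package such as gm.

```r
set.seed(240)
pitches <- c("C4", "D4", "E4", "F4", "G4", "A4", "B4", "C5")  # hypothetical pitch set
n_notes <- 16

# Each step moves at most two scale degrees up or down
steps <- sample(-2:2, n_notes - 1, replace = TRUE,
                prob = c(0.1, 0.3, 0.2, 0.3, 0.1))

# Bounded random walk: start on the tonic, clamp at the ends of the scale
idx <- integer(n_notes)
idx[1] <- 1L
for (i in 2:n_notes) {
  idx[i] <- min(max(idx[i - 1] + steps[i - 1], 1L), length(pitches))
}
melody <- pitches[idx]

# Note lengths in beats, also drawn at random
durations <- sample(c(0.5, 1, 2), n_notes, replace = TRUE)

data.frame(pitch = melody, beats = durations)
```

In the spirit of the slide, the interesting choices live in the rules: the step probabilities, the scale, and whether the walk is clamped or reflected at the edges all shape what emerges.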
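Slides 25 to 27 find unmatched countries by filtering for a missing `continent` after the join. A complementary check is `dplyr::anti_join()`, which returns the rows of the first table that have no match in the second. The sketch below copies three rows from slide 25's `population` output and pairs them with a hypothetical three-row `continents` table that uses the names slide 27 recodes to; it assumes dplyr 1.1.0 or later for `join_by()` and `case_when(.default = )`. Selecting only the needed columns from `continents` before joining also avoids the `year.x`/`year.y` suffixes seen on slide 26.

```r
library(dplyr)   # join_by() and case_when(.default = ) need dplyr >= 1.1.0
library(tibble)

# Three rows taken from slide 25's population output
population <- tibble(
  country    = c("Albania", "Korea, Rep.", "Kyrgyz Republic"),
  year       = 2022,
  population = c(2778, 51628, 6975)
)
# Hypothetical lookup table using the naming that slide 27 recodes to
continents <- tibble(
  entity    = c("Albania", "South Korea", "Kyrgyzstan"),
  year      = 2015,
  continent = c("Europe", "Asia", "Asia")
)

# Which countries would fail to match?
unmatched <- anti_join(population, continents, by = join_by(country == entity))
unmatched

# Recode mismatched names, then join only the columns needed
population_continents <- population |>
  mutate(country = case_when(
    country == "Korea, Rep."     ~ "South Korea",
    country == "Kyrgyz Republic" ~ "Kyrgyzstan",
    .default = country
  )) |>
  left_join(select(continents, entity, continent), by = join_by(country == entity))

population_continents
```

Running `anti_join()` before and after the recoding gives a quick, explicit check that the cleaning step actually resolved every mismatch.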
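Slides 30 and 31 pair polite's `bow()` with scraping article metadata into a tibble. The sketch below shows the parsing step with rvest. To stay runnable offline it parses an inline HTML snippet built with `minimal_html()`; the markup, the CSS selectors (`article`, `h2 a`, `.author`), and the column names are hypothetical, not the Chronicle's actual page structure. The commented lines show where `polite::bow()` and `polite::scrape()` would fetch a real page while respecting robots.txt and the crawl delay.

```r
library(rvest)
library(tibble)

# Live version (not run here): polite checks robots.txt and honors the crawl delay
# session <- polite::bow("https://www.dukechronicle.com")
# page    <- polite::scrape(session)

# Offline stand-in with hypothetical markup
page <- minimal_html('
  <article><h2><a href="/story-1">First headline</a></h2>
    <span class="author">A. Writer</span></article>
  <article><h2><a href="/story-2">Second headline</a></h2>
    <span class="author">B. Writer</span></article>
')

# One node per article; html_element() returns one match (or NA) per node,
# so columns stay aligned even if an article is missing a field
articles <- html_elements(page, "article")
chronicle <- tibble(
  title  = html_text2(html_element(articles, "h2 a")),
  author = html_text2(html_element(articles, ".author")),
  url    = html_attr(html_element(articles, "h2 a"), "href")
)
chronicle
```

Selecting each article first and then extracting fields within it, rather than pulling all titles and all authors separately, is what keeps rows from drifting out of sync when a page is irregular.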
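Slide 34 connects logistic regression to decision errors. The sketch below computes sensitivity (the share of true cases that are flagged, e.g. patients with cancer who are diagnosed) and specificity (the share of non-cases that are not flagged) at two classification thresholds. The data are simulated from an assumed logistic model, and the helper `decision_errors()` and the thresholds 0.5 and 0.2 are hypothetical choices for illustration.

```r
# Hypothetical helper: sensitivity and specificity from logical truth and decision
decision_errors <- function(truth, decision) {
  tp <- sum(truth & decision)     # has the condition, flagged
  fn <- sum(truth & !decision)    # has the condition, missed
  tn <- sum(!truth & !decision)   # no condition, not flagged
  fp <- sum(!truth & decision)    # no condition, flagged
  c(sensitivity = tp / (tp + fn), specificity = tn / (tn + fp))
}

# Simulated data from an assumed logistic model
set.seed(440)
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-1 + 2 * x))

fit   <- glm(y ~ x, family = binomial)
p_hat <- predict(fit, type = "response")

# A lower threshold flags more cases: sensitivity can only go up,
# specificity can only go down
at_050 <- decision_errors(y == 1, p_hat > 0.5)
at_020 <- decision_errors(y == 1, p_hat > 0.2)
rbind(at_050, at_020)
```

Which threshold is appropriate depends on the relative costs of false negatives and false positives, which is the loss-function intuition the slide lists; those costs look very different in the cancer and re-offending settings.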