Applied machine learning with tidymodels
Julia Silge
June 22, 2022
useR! 2022 keynote
Transcript
Applied Machine Learning with tidymodels
Julia Silge
@juliasilge
Hello
h ://x .c /1 /
Image credit: h ://v .c /b /m _l g/
What's the hardest part about machine learning in practice?
library(tidymodels)
#> ── Attaching packages ────────────────────────────────────────────── tidymodels 0.2.0 ──
#> ✔ broom        0.8.0     ✔ rsample      0.1.1
#> ✔ dials        1.0.0     ✔ tibble       3.1.7
#> ✔ dplyr        1.0.9     ✔ tidyr        1.2.0
#> ✔ infer        1.0.2     ✔ tune         0.2.0
#> ✔ modeldata    0.1.1     ✔ workflows    0.2.6
#> ✔ parsnip      1.0.0     ✔ workflowsets 0.2.1
#> ✔ purrr        0.3.4     ✔ yardstick    1.0.0
#> ✔ recipes      0.2.0
#> ── Conflicts ───────────────────────────────────────────────── tidymodels_conflicts() ──
#> ✖ purrr::discard() masks scales::discard()
#> ✖ dplyr::filter()  masks stats::filter()
#> ✖ dplyr::lag()     masks stats::lag()
#> ✖ recipes::step()  masks stats::step()
#> • Dig deeper into tidy modeling with R at https://www.tmwr.org
tmwr.org
Three topics for today
Spend your data budget
Where does your model start and end?
Get your model off your laptop
Spending your data budget
rsample
https://rsample.tidymodels.org
Data splitting
initial_split()
S t r y t a t n t g s
penguins_split <- initial_split(penguins, prop = 0.75)
penguins_split
#> <Training/Testing/Total>
#> <249/84/333>
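The slide uses `penguins` without showing where it comes from. The sketch below is one way to arrive at a 333-row data set and the 249/84 split. It assumes the copy of the penguins data shipped in modeldata, with incomplete rows dropped; that is an assumption, and the `year` column in the printed output on the slides suggests the palmerpenguins package was the actual source. The seed and the stratified variation are also illustrative and do not appear on the slide.

```r
library(tidymodels)

# Assumption: the slides do not show data preparation. Here we load the
# penguins data from modeldata explicitly and drop rows with missing values.
data(penguins, package = "modeldata")
penguins <- drop_na(penguins)

set.seed(123)  # arbitrary seed; the slide shows none for the split
penguins_split <- initial_split(penguins, prop = 0.75)
penguins_train <- training(penguins_split)
penguins_test  <- testing(penguins_split)

# Every row lands in exactly one of the two sets
nrow(penguins_train) + nrow(penguins_test) == nrow(penguins)

# Variation (not on the slide): stratify so that species proportions are
# similar in the training and testing sets
strat_split <- initial_split(penguins, prop = 0.75, strata = species)
```

With `prop = 0.75` and no strata, the training set gets the floor of 0.75 × 333 rows, which is where the 249 on the slide comes from.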
training() and testing()
C t g n t t o rsplit
penguins_train <- training(penguins_split)
penguins_train
#> # A tibble: 249 × 8
#>   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex
#>   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
#> 1 Chinst… Dream            47.6          18.3              195        3850 fema…
#> 2 Adelie  Torge…           35.7          17                189        3350 fema…
#> 3 Gentoo  Biscoe           45.5          15                220        5000 male
#> 4 Gentoo  Biscoe           48.7          15.7              208        5350 male
#> 5 Gentoo  Biscoe           46.5          13.5              210        4550 fema…
#> # … with 244 more rows, and 1 more variable: year <int>
training() and testing()
C t g n t t o rsplit
penguins_test <- testing(penguins_split)
penguins_test
#> # A tibble: 84 × 8
#>   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex
#>   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
#> 1 Adelie  Torge…           40.3          18                195        3250 fema…
#> 2 Adelie  Torge…           36.7          19.3              193        3450 fema…
#> 3 Adelie  Torge…           36.6          17.8              185        3700 fema…
#> 4 Adelie  Torge…           34.4          18.4              184        3325 fema…
#> 5 Adelie  Torge…           46            21.5              194        4200 male
#> # … with 79 more rows, and 1 more variable: year <int>
The testing data is precious!
How can we use the training set to compare, evaluate, and tune models?
Cross-validation
[Figure: 30 numbered data points in shuffled order]
Cross-validation
[Figure: three iterations (Fold 1, Fold 2, Fold 3), each marking which points the model is fit on and which are used to estimate performance]
Cross-validation
set.seed(123)
vfold_cv(penguins_train, strata = species)
#> # 10-fold cross-validation using stratification
#> # A tibble: 10 × 2
#>    splits           id
#>    <list>           <chr>
#>  1 <split [223/26]> Fold01
#>  2 <split [223/26]> Fold02
#>  3 <split [223/26]> Fold03
#>  4 <split [224/25]> Fold04
#>  5 <split [224/25]> Fold05
#>  6 <split [224/25]> Fold06
#>  7 <split [225/24]> Fold07
#>  8 <split [225/24]> Fold08
#>  9 <split [225/24]> Fold09
#> 10 <split [225/24]> Fold10
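The slides create the folds but do not show them being used. As a minimal sketch of how the folds might answer the "compare, evaluate" question, the code below passes them to `fit_resamples()` from the tune package. The decision tree is a hypothetical model choice made only to keep the example light on dependencies (rpart); the data preparation is the same assumption as before (modeldata's penguins with incomplete rows dropped).

```r
library(tidymodels)

# Assumption: penguins from modeldata, rows with missing values dropped
data(penguins, package = "modeldata")
penguins <- drop_na(penguins)

set.seed(123)
penguins_split <- initial_split(penguins, prop = 0.75)
penguins_train <- training(penguins_split)
penguin_folds  <- vfold_cv(penguins_train, strata = species)

# Hypothetical model for illustration: a decision tree (rpart engine)
tree_spec <- decision_tree(mode = "classification")
tree_wf   <- workflow(species ~ bill_length_mm + bill_depth_mm + sex, tree_spec)

# Fit on the analysis part of each fold, predict on the assessment part
tree_res <- fit_resamples(tree_wf, resamples = penguin_folds)

# Average the per-fold metrics into one row per metric
tree_metrics <- collect_metrics(tree_res)
tree_metrics
```

Because only the training set is resampled, the testing set stays untouched until a final model has been chosen.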
Bootstrapping
[Figure: three bootstrap iterations, each marking which points the model is fit on (sampled with replacement, so some points repeat) and which are used to estimate performance]
Bootstrapping
set.seed(123)
bootstraps(penguins_train, strata = species)
#> # Bootstrap sampling using stratification
#> # A tibble: 25 × 2
#>    splits           id
#>    <list>           <chr>
#>  1 <split [249/91]> Bootstrap01
#>  2 <split [249/93]> Bootstrap02
#>  3 <split [249/96]> Bootstrap03
#>  4 <split [249/88]> Bootstrap04
#>  5 <split [249/89]> Bootstrap05
#>  6 <split [249/82]> Bootstrap06
#>  7 <split [249/87]> Bootstrap07
#>  8 <split [249/87]> Bootstrap08
#>  9 <split [249/85]> Bootstrap09
#> 10 <split [249/95]> Bootstrap10
#> # … with 15 more rows
Resampling methods
S u t w to create simulated validation set(s)
vfold_cv()
loo_cv()
mc_cv()
bootstraps()
validation_split()
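Each of these functions returns a table of splits, and the two numbers printed for a split (for example `[223/26]` or `[249/91]`) are the sizes of its two parts. The sketch below, which again assumes modeldata's penguins with incomplete rows dropped, uses rsample's `analysis()` and `assessment()` to pull those parts out and shows how they differ between cross-validation and the bootstrap. The number of bootstrap resamples is an arbitrary small value for illustration.

```r
library(tidymodels)

# Assumption: penguins from modeldata, rows with missing values dropped
data(penguins, package = "modeldata")
penguins <- drop_na(penguins)

set.seed(123)
folds <- vfold_cv(penguins, v = 10)
boots <- bootstraps(penguins, times = 5)  # 5 is arbitrary, for illustration

# Cross-validation: the two parts of a split partition the data
cv_split <- folds$splits[[1]]
nrow(analysis(cv_split)) + nrow(assessment(cv_split)) == nrow(penguins)

# Bootstrap: the analysis set is drawn with replacement and has as many rows
# as the data; the assessment set holds the rows that were never drawn
boot_split <- boots$splits[[1]]
nrow(analysis(boot_split)) == nrow(penguins)
nrow(assessment(boot_split)) < nrow(penguins)
```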
Where does your model start and end?
workflows
https://workflows.tidymodels.org/
Where does your model start and end?
rf_spec <- rand_forest(mode = "classification")
penguin_formula <- species ~ bill_length_mm + bill_depth_mm + sex
Where does your model start and end?
workflow(penguin_formula, rf_spec)
#> ══ Workflow ════════════════════════════════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: rand_forest()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────────────
#> species ~ bill_length_mm + bill_depth_mm + sex
#>
#> ── Model ───────────────────────────────────────────────────────────────────────────────
#> Random Forest Model Specification (classification)
#>
#> Computational engine: ranger
Where does your model start and end?
workflow(penguin_formula, rf_spec) %>%
  fit(data = penguins_train)
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: rand_forest()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────────────
#> species ~ bill_length_mm + bill_depth_mm + sex
#>
#> ── Model ───────────────────────────────────────────────────────────────────────────────
#> Ranger result
#>
#> Call:
#>  ranger::ranger(x = maybe_data_frame(x), y = y, num.threads = 1,
#>    verbose = FALSE, seed = sample.int(10^5, 1), probability = TRUE)
#>
#> Type:                             Probability estimation
#> Number of trees:                  500
#> Sample size:                      249
#> Number of independent variables:  3
#> Mtry:                             1
#> Target node size:                 10
#> Variable importance mode:         none
#> Splitrule:                        gini
#> OOB prediction error (Brier s.):  0.05585744
I a A H
Where does your model start and end?
penguin_rec <- recipe(species ~ bill_length_mm + bill_depth_mm + sex, data = penguins_train) %>%
  step_dummy(sex) %>%
  step_normalize(all_numeric_predictors())
penguin_rec
#> Recipe
#>
#> Inputs:
#>
#>       role #variables
#>    outcome          1
#>  predictor          3
#>
#> Operations:
#>
#> Dummy variables from sex
#> Centering and scaling for all_numeric_predictors()
Where does your model start and end?
svm_spec <- svm_linear(mode = "classification")
workflow(penguin_rec, svm_spec)
#> ══ Workflow ════════════════════════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: svm_linear()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────────────
#> 2 Recipe Steps
#>
#> • step_dummy()
#> • step_normalize()
#>
#> ── Model ───────────────────────────────────────────────────────────────────────────────
#> Linear Support Vector Machine Specification (classification)
#>
#> Computational engine: LiblineaR
Where does your model start and end?
penguin_fit <- workflow(penguin_rec, svm_spec) %>%
  fit(data = penguins_train)
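Because the recipe and the model are fitted together, the same object can be handed raw new data: the workflow applies the trained preprocessing before predicting. The sketch below rebuilds the workflow from the slides and scores it once on the testing set. The data preparation and the seed are assumptions (modeldata's penguins with incomplete rows dropped), and the accuracy check is an illustration that does not appear in the deck. The LiblineaR package must be installed for this engine.

```r
library(tidymodels)

# Assumption: penguins from modeldata, rows with missing values dropped
data(penguins, package = "modeldata")
penguins <- drop_na(penguins)

set.seed(123)  # arbitrary seed
penguins_split <- initial_split(penguins, prop = 0.75)
penguins_train <- training(penguins_split)
penguins_test  <- testing(penguins_split)

# Recipe and model specification as on the slides
penguin_rec <- recipe(species ~ bill_length_mm + bill_depth_mm + sex,
                      data = penguins_train) %>%
  step_dummy(sex) %>%
  step_normalize(all_numeric_predictors())

svm_spec <- svm_linear(mode = "classification")

penguin_fit <- workflow(penguin_rec, svm_spec) %>%
  fit(data = penguins_train)

# Predict on raw testing rows; the dummy and normalization steps estimated
# on the training set are applied automatically
test_preds <- predict(penguin_fit, new_data = penguins_test) %>%
  bind_cols(penguins_test %>% select(species))

test_acc <- accuracy(test_preds, truth = species, estimate = .pred_class)
test_acc
```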
Get your model off your laptop
vetiver
https://vetiver.rstudio.com
Get your model off your laptop
library(vetiver)
v <- vetiver_model(penguin_fit, "svm_penguins")
v
#>
#> ── svm_penguins ─ <butchered_workflow> model for deployment
#> A LiblineaR classification modeling workflow using 3 features
Get your model off your laptop
library(plumber)
pr() %>%
  vetiver_api(v)
#> # Plumber router with 2 endpoints, 4 filters, and 1 sub-router.
#> # Use `pr_run()` on this object to start the API.
#> ├──[queryString]
#> ├──[body]
#> ├──[cookieParser]
#> ├──[sharedSecret]
#> ├──/logo
#> ├──/ping (GET)
#> └──/predict (POST)
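The Dockerfile later in the deck copies a `plumber.R` file, but the slides do not show how that file is produced. One possible route, sketched here, is to store the model on a pins board and let vetiver generate the file. The temporary board, the file location, the seed, and the data preparation are all assumptions made so the example runs on its own; a real deployment would use a persistent board.

```r
library(tidymodels)
library(vetiver)
library(pins)

# Assumption: penguins from modeldata, rows with missing values dropped
data(penguins, package = "modeldata")
penguins <- drop_na(penguins)

set.seed(123)  # arbitrary seed
penguins_train <- training(initial_split(penguins, prop = 0.75))

# Same recipe and model specification as on the slides (needs LiblineaR)
penguin_rec <- recipe(species ~ bill_length_mm + bill_depth_mm + sex,
                      data = penguins_train) %>%
  step_dummy(sex) %>%
  step_normalize(all_numeric_predictors())

penguin_fit <- workflow(penguin_rec, svm_linear(mode = "classification")) %>%
  fit(data = penguins_train)

v <- vetiver_model(penguin_fit, "svm_penguins")

# Assumption: a throwaway board in a temporary directory, for illustration
board <- board_temp(versioned = TRUE)
vetiver_pin_write(board, v)

# Generate a plumber file that reads the pinned model and serves it
plumber_file <- tempfile(fileext = ".R")
vetiver_write_plumber(board, "svm_penguins", file = plumber_file)

readLines(plumber_file)
```

Once such an API is running, `vetiver_endpoint()` together with `predict()` offers a way to request predictions from R without building the HTTP call by hand.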
Get your model off your laptop
4 P -b d R C 4 G e D l o o d e t
Get your model off your laptop
# Generated by the vetiver package; edit with care
FROM rocker/r-ver:4.2.0
ENV RENV_CONFIG_REPOS_OVERRIDE https://packagemanager.rstudio.com/cran/latest
RUN apt-get update -qq && apt-get install -y --no-install-recommends \
  libcurl4-openssl-dev \
  libicu-dev \
  libsodium-dev \
  libssl-dev \
  make
COPY vetiver_renv.lock renv.lock
RUN Rscript -e "install.packages('renv')"
RUN Rscript -e "renv::restore()"
COPY plumber.R /opt/ml/plumber.R
EXPOSE 8000
ENTRYPOINT ["R", "-e", "pr <- plumber::plumb('/opt/ml/plumber.R'); pr$run(host = '0.0.0.0', port = 8000)"]
More to learn!
Thank you!
h ://y .c /j l / h ://j l .c / h ://t e .o / h ://t .o / P a M U h