Automating Machine Learning
Andreas Mueller
July 15, 2016
Transcript
Andreas Mueller (NYU Center for Data Science, scikit-learn)
Automatic Machine Learning?
Why?
Issues with current tools (scikit-learn)
Flow chart / selecting model
Selecting Hyper-Parameters
Scikit-learn: Explicit is better than implicit
make_pipeline(OneHotEncoder(), Imputer(), StandardScaler(), SVC())
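As a rough illustration of the explicit style on this slide, the sketch below builds a similar pipeline by hand. It assumes a recent scikit-learn, where `SimpleImputer` from `sklearn.impute` takes the place of the older `Imputer`, and it leaves out `OneHotEncoder` because the synthetic data (an assumption made for this example) is purely numeric.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic two-class data (illustrative only); about 10% of entries are made missing.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[rng.uniform(size=X.shape) < 0.1] = np.nan

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Every preprocessing step and the final model are spelled out by the user.
pipe = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler(), SVC(C=1.0))
pipe.fit(X_train, y_train)
accuracy = pipe.score(X_test, y_test)
print("test accuracy:", accuracy)
```

Nothing here is chosen automatically: the imputation strategy, the scaling, the model and the value of `C` all have to be picked by the user.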
What?
from automl import AutoClassifier
clf = AutoClassifier().fit(X_train, y_train)
> Current Accuracy: 70% (AUC .65) LinearSVC(C=1), 10sec
> Current Accuracy: 76% (AUC .71) RandomForest(n_estimators=20) 30sec
> Current Accuracy: 80% (AUC .74) RandomForest(n_estimators=500) 30sec
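The `automl` module and `AutoClassifier` on this slide are a wish, not an existing package. A minimal sketch of how such an interface might behave is given below, assuming a fixed list of scikit-learn candidates tried in order and scored by cross-validation. The class name `AutoClassifierSketch` and its attributes are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC


class AutoClassifierSketch:
    """Hypothetical stand-in for the AutoClassifier shown on the slide."""

    def __init__(self, candidates, cv=3):
        self.candidates = candidates  # models to try, cheapest first
        self.cv = cv
        self.history = []             # (model description, best score so far)

    def fit(self, X, y):
        best_score, best_model = -1.0, None
        for model in self.candidates:
            score = cross_val_score(model, X, y, cv=self.cv).mean()
            if score > best_score:
                best_score, best_model = score, model
            self.history.append((repr(model), best_score))
            print("> Current Accuracy: %.0f%% after trying %r" % (100 * best_score, model))
        # Refit the winning model on all of the training data.
        self.best_estimator_ = best_model.fit(X, y)
        return self

    def predict(self, X):
        return self.best_estimator_.predict(X)


X_train, y_train = make_classification(n_samples=300, n_features=10, random_state=0)
clf = AutoClassifierSketch([
    LinearSVC(C=1),
    RandomForestClassifier(n_estimators=20, random_state=0),
    RandomForestClassifier(n_estimators=100, random_state=0),
]).fit(X_train, y_train)
```

The reported accuracy never decreases, which mirrors the "current best" log lines on the slide. A real system would also search over parameters and pipelines rather than a hand-written list.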
Step 1: Automate Parameter Selection
Step 2: Automate Model Selection
Step 3: Automate Pipeline Selection
How?
Formalizing the Search Space
• Discrete and Continuous Parameters
• Conditional Parameters
Fixed pipeline vs flexible pipeline
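One possible way to write down a search space with discrete, continuous and conditional parameters is sketched here. The `SPACE` layout, the parameter ranges and the `sample` helper are assumptions made for illustration, not the format of any particular library.

```python
import math
import random

# Hypothetical search space: each entry is (kind, arguments, condition).
# A parameter is only sampled when its condition holds for the values drawn so far.
SPACE = {
    "classifier": ("choice", ["svc", "random_forest"], None),
    "C": ("log_uniform", (1e-3, 1e3), lambda c: c["classifier"] == "svc"),
    "kernel": ("choice", ["linear", "rbf"], lambda c: c["classifier"] == "svc"),
    "gamma": ("log_uniform", (1e-4, 1e1), lambda c: c.get("kernel") == "rbf"),
    "n_estimators": ("int_uniform", (10, 500), lambda c: c["classifier"] == "random_forest"),
}


def sample(space, rng):
    config = {}
    for name, (kind, args, condition) in space.items():
        if condition is not None and not condition(config):
            continue  # inactive conditional parameter
        if kind == "choice":            # discrete
            config[name] = rng.choice(args)
        elif kind == "log_uniform":     # continuous, sampled on a log scale
            low, high = args
            config[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        elif kind == "int_uniform":     # discrete integer range, bounds inclusive
            config[name] = rng.randint(*args)
    return config


rng = random.Random(0)
for _ in range(5):
    print(sample(SPACE, rng))
```

Here `gamma` only exists when an RBF kernel was chosen, and `n_estimators` only when the random forest was chosen. That dependency is what makes the space conditional rather than a plain grid.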
Search Methods
Exhaustive Search (Grid Search)
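A minimal sketch of exhaustive grid search in plain Python follows. The `validation_error` function is a made-up stand-in for training a model and measuring its error, and the grid values are assumptions.

```python
import itertools


def validation_error(log_c, log_gamma):
    # Hypothetical stand-in for "train a model, return its validation error".
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


grid = {
    "log_c": [-2, -1, 0, 1, 2],
    "log_gamma": [-4, -3, -2, -1, 0],
}

names = list(grid)
best_config, best_error = None, float("inf")
n_evaluations = 0
# Evaluate every combination of the listed values.
for values in itertools.product(*(grid[name] for name in names)):
    config = dict(zip(names, values))
    error = validation_error(**config)
    n_evaluations += 1
    if error < best_error:
        best_config, best_error = config, error

print(n_evaluations, best_config, best_error)
```

The number of evaluations is the product of the grid sizes (25 here), so the cost grows exponentially with the number of parameters.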
Randomized Search
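Randomized search can be sketched the same way: draw each parameter independently from a distribution and keep the best result. The objective below is hypothetical and is built so that one parameter matters far more than the other, the situation Bergstra and Bengio describe as favoring random search.

```python
import random


def validation_error(log_c, log_gamma):
    # Hypothetical objective in which log_gamma barely matters.
    return (log_c - 0.7) ** 2 + 0.01 * (log_gamma + 2.0) ** 2


rng = random.Random(0)
budget = 25  # same number of evaluations as a 5 x 5 grid

best_config, best_error = None, float("inf")
tried_log_c = set()
for _ in range(budget):
    config = {"log_c": rng.uniform(-2, 2), "log_gamma": rng.uniform(-4, 0)}
    tried_log_c.add(config["log_c"])
    error = validation_error(**config)
    if error < best_error:
        best_config, best_error = config, error

print(len(tried_log_c), best_config, best_error)
```

With 25 evaluations, a 5 x 5 grid tries only 5 distinct values of the important parameter, while random sampling tries 25.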
Bayesian Optimization (SMBO)
Gaussian Processes
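A compact sketch of sequential model-based optimization with a Gaussian process surrogate is shown below. It assumes a zero-mean GP with an RBF kernel, a fixed length scale, expected improvement as the acquisition function, and a hypothetical one-dimensional objective. Real implementations also fit the kernel parameters.

```python
import math

import numpy as np


def objective(x):
    # Hypothetical 1-D validation error with its minimum at x = 0.3.
    return (x - 0.3) ** 2


def rbf(a, b, length_scale=0.2):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale ** 2)


def gp_posterior(x_obs, y_obs, x_new, noise=1e-6):
    # Zero-mean GP regression with an RBF kernel and unit prior variance.
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf(x_obs, x_new)
    mean = K_s.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.maximum(var, 1e-12))


def expected_improvement(mean, std, best):
    # EI for minimization: E[max(best - f, 0)] with f ~ N(mean, std^2).
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf


candidates = np.linspace(0.0, 1.0, 201)
x_obs = np.array([0.0, 1.0])  # initial design
y_obs = objective(x_obs)

for _ in range(8):
    mean, std = gp_posterior(x_obs, y_obs, candidates)
    ei = expected_improvement(mean, std, y_obs.min())
    x_next = candidates[np.argmax(ei)]  # most promising point under the surrogate
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))

best_x = x_obs[np.argmin(y_obs)]
print(best_x, y_obs.min())
```

Each iteration fits the surrogate to all evaluations so far and spends the next expensive evaluation where the surrogate expects the largest improvement.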
Random Forest Based (SMAC)
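The sketch below illustrates the idea of a random-forest surrogate, in the spirit of SMAC but not a reimplementation of it. As simplifying assumptions it uses scikit-learn's `RandomForestRegressor`, takes the spread of the per-tree predictions as the uncertainty estimate, and scores candidates with a lower confidence bound instead of expected improvement. The objective is made up.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def objective(config):
    # Hypothetical validation error over (log_c, use_rbf), where use_rbf is 0 or 1.
    log_c, use_rbf = config
    return (log_c - 1.0) ** 2 + (0.0 if use_rbf else 0.5)


rng = np.random.RandomState(0)


def random_configs(n):
    # One continuous and one binary parameter per configuration.
    return np.column_stack([rng.uniform(-3, 3, size=n), rng.randint(0, 2, size=n)])


X_obs = random_configs(5)  # initial random design
y_obs = np.array([objective(c) for c in X_obs])

for _ in range(15):
    forest = RandomForestRegressor(n_estimators=30, random_state=0).fit(X_obs, y_obs)
    candidates = random_configs(200)
    # Mean and spread of the per-tree predictions serve as the surrogate's
    # prediction and uncertainty.
    per_tree = np.stack([tree.predict(candidates) for tree in forest.estimators_])
    score = per_tree.mean(axis=0) - per_tree.std(axis=0)  # lower confidence bound
    x_next = candidates[np.argmin(score)]
    X_obs = np.vstack([X_obs, x_next])
    y_obs = np.append(y_obs, objective(x_next))

best = X_obs[np.argmin(y_obs)]
print(best, y_obs.min())
```

A tree-based surrogate handles the binary parameter without any special encoding, which is one reason forests are attractive for mixed discrete and continuous search spaces.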
Non-parametric (TPE)
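A simplified one-dimensional sketch of the TPE idea follows: split past trials into good and bad, fit a density to each group, and prefer candidates where the good density is high relative to the bad one. The fixed bandwidth, the quarter split, the candidate generation and the objective are all assumptions chosen to keep the example short.

```python
import math
import random


def objective(x):
    # Hypothetical 1-D validation error with its minimum at x = 0.3.
    return (x - 0.3) ** 2


def kde(points, bandwidth=0.1):
    # Gaussian kernel density estimate over the given points.
    def density(x):
        total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
        return total / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))
    return density


rng = random.Random(0)
start = [rng.uniform(0, 1) for _ in range(10)]  # initial random trials
history = [(x, objective(x)) for x in start]

for _ in range(20):
    history.sort(key=lambda pair: pair[1])
    n_good = max(2, len(history) // 4)  # best quarter of the trials
    good = [x for x, _ in history[:n_good]]
    bad = [x for x, _ in history[n_good:]]
    l_good, g_bad = kde(good), kde(bad)
    # Draw candidates near the good points and keep the one with the best ratio.
    candidates = [min(1.0, max(0.0, rng.gauss(rng.choice(good), 0.1))) for _ in range(30)]
    x_next = max(candidates, key=lambda x: l_good(x) / (g_bad(x) + 1e-12))
    history.append((x_next, objective(x_next)))

best_x, best_error = min(history, key=lambda pair: pair[1])
print(best_x, best_error)
```

Unlike the Gaussian process approach, this models the distribution of configurations given their quality rather than the quality given the configuration.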
Warm-starting and Meta-learning
Meta-Learning
• Dataset 1 → optimization → Algorithm + Parameters
• Dataset 2 → optimization → Algorithm + Parameters
• Dataset 3 → optimization → Algorithm + Parameters
• Meta-Features 1, Meta-Features 2, Meta-Features 3 → ML model
• New Dataset → ML model → Algorithm + Parameters
Meta-Features
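A toy sketch of warm-starting from meta-features is given below. It assumes three very simple meta-features and uses a nearest-neighbor lookup over earlier datasets as the "ML model". The stored runs in `PAST_RUNS` are invented for illustration.

```python
import math

# Hypothetical records from earlier optimization runs: the meta-features of
# each dataset and the configuration that worked best on it.
PAST_RUNS = [
    {"meta": {"n_samples": 1000, "n_features": 20, "n_classes": 2},
     "best": {"classifier": "svc", "C": 10.0}},
    {"meta": {"n_samples": 50000, "n_features": 300, "n_classes": 10},
     "best": {"classifier": "random_forest", "n_estimators": 500}},
    {"meta": {"n_samples": 200, "n_features": 5000, "n_classes": 2},
     "best": {"classifier": "linear_svc", "C": 0.1}},
]


def meta_features(X, y):
    # A few very simple meta-features; real systems use many more.
    return {"n_samples": len(X), "n_features": len(X[0]), "n_classes": len(set(y))}


def distance(a, b):
    # Compare on a log scale so that dataset size does not dominate.
    return math.sqrt(sum((math.log(a[k]) - math.log(b[k])) ** 2 for k in a))


def warm_start(X, y, past_runs, k=2):
    # Return the best configurations of the k most similar earlier datasets.
    target = meta_features(X, y)
    ranked = sorted(past_runs, key=lambda run: distance(target, run["meta"]))
    return [run["best"] for run in ranked[:k]]


X_new = [[0.0] * 25 for _ in range(1500)]
y_new = [i % 2 for i in range(1500)]
initial_configs = warm_start(X_new, y_new, PAST_RUNS)
print(initial_configs)
```

The returned configurations would be evaluated first on the new dataset, so the optimizer starts from settings that worked on similar data instead of from random points.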
Existing Approaches
auto-sklearn (Hutter, Feurer, Eggensperger) http://automl.github.io/auto-sklearn/stable/
Auto-WEKA
Hyperopt-sklearn
TPOT
Spearmint https://github.com/HIPS/Spearmint
Scikit-optimize
Within Scikit-learn
• GridSearchCV
• RandomizedSearchCV
• BayesianSearchCV (coming)
• Searching over Pipelines (coming)
• Built-in parameter ranges (coming)
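The two search tools that already exist in scikit-learn can be used roughly as follows. The dataset, the parameter ranges and the budget are assumptions, and `loguniform` comes from `scipy.stats` (available in recent SciPy versions).

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Exhaustive search over a small explicit grid (3 x 3 = 9 candidates).
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}, cv=3)
grid.fit(X, y)

# Random search: the same budget of 9 candidates, drawn from continuous distributions.
rand = RandomizedSearchCV(
    SVC(),
    {"C": loguniform(1e-1, 1e1), "gamma": loguniform(1e-2, 1e0)},
    n_iter=9,
    cv=3,
    random_state=0,
)
rand.fit(X, y)

print(grid.best_params_, grid.best_score_)
print(rand.best_params_, rand.best_score_)
```

Both objects behave like estimators after fitting, so `grid.predict` and `rand.predict` use the best configuration found.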
TODO
Clean separation of:
• Model Search Space
• Pipeline Search Space
• Optimization Method
• Meta-Learning
• Exploit prior knowledge better!
• Usability
• Runtime consideration
• Data subsampling
Criticism
Randomized Search works well
Do we need 100 Classifiers? Do we need Complex pipelines?
I don’t want a black-box!
http://oreilly.com/pub/get/scipy
Material
• Random Search for Hyper-Parameter Optimization (Bergstra, Bengio)
• Efficient and Robust Automated Machine Learning (Feurer et al.) [autosklearn] http://automl.github.io/auto-sklearn/stable/
• Efficient Hyperparameter Optimization and Infinitely Many Armed Bandits (Li et al.) [hyperband] https://arxiv.org/abs/1603.06560
• Scalable Bayesian Optimization Using Deep Neural Networks (Snoek et al.)
@amuellerml @amueller
[email protected]
http://amueller.io
Thank you.