Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Intro to scikit-learn
Search
Sponsored
·
SiteGround - Reliable hosting with speed, security, and support you can count on.
→
Olivier Grisel
August 27, 2017
Technology
750
5
Share
Embed
Copy iframe code
Copy JS code
Copy link
Start on current slide
Intro to scikit-learn
EuroScipy 2017
Olivier Grisel
August 27, 2017
More Decks by Olivier Grisel
See All by Olivier Grisel
An Intro to Deep Learning
ogrisel
1
340
Predictive Modeling and Deep Learning
ogrisel
2
390
Intro to scikit-learn and what's new in 0.17
ogrisel
1
410
Big Data, Predictive Modeling and tools
ogrisel
2
340
Recent Developments in Deep Learning
ogrisel
3
720
Documentation
ogrisel
2
270
How to use scikit-learn to solve machine learning problems
ogrisel
0
1.1k
Build and test wheel packages on Linux, OSX and Windows
ogrisel
2
370
Big Data and Predictive Modeling
ogrisel
3
260
Other Decks in Technology
See All in Technology
エンジニアリング戦略の作り方 / Crafting Engineering Strategy
iwashi86
21
7k
非エンジニアがClaudeと挑んだ「1ヶ月間プロダクト30本ノック」
askokc
0
600
自宅LLMの話
jacopen
1
600
Snowflakeと仲良くなる第一歩
coco_se
4
490
小さくはじめるSLI/SLO ~育てながら組織に定着させる実践知~ / Starting Small with SLI/SLOs: Building Adoption Through Continuous Growth
nari_ex
7
2k
AmazonRoute 53ではじめてのドメイン取得!HTTPS化までの道のりを整理してみた
usanchuu
3
140
2026TECHFRESH畢業分享會 - 葬送的通靈師:化系統與用戶雜訊成行動訊號
line_developers_tw
PRO
0
1.1k
入門!AWS Blocks
ysuzuki
1
140
2026 TECHFRESH 畢業分享會 - AI-Native 重塑軟體工程與虛擬講師
line_developers_tw
PRO
0
1.1k
失敗を資産に変えるClaude Code
shinyasaita
0
680
Bedrock AgentCore RuntimeでAuth0 Changelog調査AIをアップグレードした話
t5u8a5a
1
160
日本 Fintech 未来予測レポート 2027〜2028年(オリジナル版)
8maki
0
2.3k
Featured
See All Featured
End of SEO as We Know It (SMX Advanced Version)
ipullrank
3
4.2k
Why Your Marketing Sucks and What You Can Do About It - Sophie Logan
marketingsoph
0
170
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
123
22k
Introduction to Domain-Driven Design and Collaborative software design
baasie
1
840
Mozcon NYC 2025: Stop Losing SEO Traffic
samtorres
1
250
From π to Pie charts
rasagy
0
210
GraphQLの誤解/rethinking-graphql
sonatard
75
12k
GitHub's CSS Performance
jonrohan
1033
470k
16th Malabo Montpellier Forum Presentation
akademiya2063
PRO
0
150
Utilizing Notion as your number one productivity tool
mfonobong
4
320
Joys of Absence: A Defence of Solitary Play
codingconduct
1
390
[SF Ruby Conf 2025] Rails X
palkan
2
1.1k
Transcript
Intro to scikit-learn EuroScipy 2017 - Olivier Grisel - Tim
Head
Outline • Machine Learning refresher • scikit-learn • Hands on:
interactive predictive modeling on Census Data with Jupyter notebook / pandas / scikit-learn • Hands on: parameter tuning with scikit-optimize
Predictive modeling ~= machine learning • Make predictions of outcome
on new data • Extract the structure of historical data • Statistical tools to summarize the training data into a executable predictive model • Alternative to hard-coded rules written by experts
type (category) # rooms (int) surface (float m2) public trans
(boolean) Apartment 3 50 TRUE House 5 254 FALSE Duplex 4 68 TRUE Apartment 2 32 TRUE
type (category) # rooms (int) surface (float m2) public trans
(boolean) Apartment 3 50 TRUE House 5 254 FALSE Duplex 4 68 TRUE Apartment 2 32 TRUE sold (float k€) 450 430 712 234
type (category) # rooms (int) surface (float m2) public trans
(boolean) Apartment 3 50 TRUE House 5 254 FALSE Duplex 4 68 TRUE Apartment 2 32 TRUE sold (float k€) 450 430 712 234 features target samples (train)
type (category) # rooms (int) surface (float m2) public trans
(boolean) Apartment 3 50 TRUE House 5 254 FALSE Duplex 4 68 TRUE Apartment 2 32 TRUE sold (float k€) 450 430 712 234 features target samples (train) Apartment 2 33 TRUE House 4 210 TRUE samples (test) ? ?
Training text docs images sounds transactions Labels Machine Learning Algorithm
Model Predictive Modeling Data Flow Feature vectors
New text doc image sound transaction Model Expected Label Predictive
Modeling Data Flow Feature vector Training text docs images sounds transactions Labels Machine Learning Algorithm Feature vectors
Inventory forecasting & trends detection Predictive modeling in the wild
Personalized radios Fraud detection Virality and readers engagement Predictive maintenance Personality matching
• Library of Machine Learning algorithms • Focus on established
methods (e.g. ESL-II) • Open Source (BSD) • Simple fit / predict / transform API • Python / NumPy / SciPy / Cython • Model Assessment, Selection & Ensembles
Train data Train labels Model Fitted model Test data Predicted
labels Test labels Evaluation model = LogisticRegression(C=1) model.fit(X_train, y_train)
Train data Train labels Model Fitted model Test data Predicted
labels Test labels Evaluation model = LogisticRegression(C=1) model.fit(X_train, y_train) y_pred = model.predict(X_test)
Train data Train labels Model Fitted model Test data Predicted
labels Test labels Evaluation model = LogisticRegression(C=1) model.fit(X_train, y_train) y_pred = model.predict(X_test) accuracy_score(y_test, y_pred)
Support Vector Machine from sklearn.svm import SVC model = SVC(kernel="rbf",
C=1.0, gamma=1e-4) model.fit(X_train, y_train) y_predicted = model.predict(X_test) from sklearn.metrics import f1_score f1_score(y_test, y_predicted)
Linear Classifier from sklearn.linear_model import SGDClassifier model = SGDClassifier(alpha=1e-4, penalty="elasticnet")
model.fit(X_train, y_train) y_predicted = model.predict(X_test) from sklearn.metrics import f1_score f1_score(y_test, y_predicted)
Random Forests from sklearn.ensemble import RandomForestClassifier model = RandomForestClassifier(n_estimators=200) model.fit(X_train,
y_train) y_predicted = model.predict(X_test) from sklearn.metrics import f1_score f1_score(y_test, y_predicted)
None
None
Workshop time! https://github.com/ogrisel/euroscipy_2017_sklearn
Combining Models from sklearn.preprocessing import StandardScaler from sklearn.decomposition import RandomizedPCA
from sklearn.svm import SVC scaler = StandardScaler() X_train_scaled = scaler.fit_transform(X_train) pca = RandomizedPCA(n_components=10) X_train_pca = pca.fit_transform(X_train_scaled) svm = SVC(C=0.1, gamma=1e-3) svm.fit(X_train_pca, y_train)
Pipeline from sklearn.preprocessing import StandardScaler from sklearn.decomposition import RandomizedPCA from
sklearn.svm import SVC from sklearn.pipeline import make_pipeline pipeline = make_pipeline( StandardScaler(), RandomizedPCA(n_components=10), SVC(C=0.1, gamma=1e-3), ) pipeline.fit(X_train, y_train)
Scoring manually stacked models scaler = StandardScaler() X_train_scaled = scaler.fit_transform(X_train)
pca = RandomizedPCA(n_components=10) X_train_pca = pca.fit_transform(X_train_scaled) svm = SVC(C=0.1, gamma=1e-3) svm.fit(X_train_pca, y_train) X_test_scaled = scaler.transform(X_test) X_test_pca = pca.transform(X_test_scaled) y_pred = svm.predict(X_test_pca) accuracy_score(y_test, y_pred)
Scoring a pipeline pipeline = make_pipeline( RandomizedPCA(n_components=10), SVC(C=0.1, gamma=1e-3), )
pipeline.fit(X_train, y_train) y_pred = pipeline.predict(X_test) accuracy_score(y_test, y_pred)
Parameter search import numpy as np from sklearn.grid_search import RandomizedSearchCV
params = { 'randomizedpca__n_components': [5, 10, 20], 'svc__C': np.logspace(-3, 3, 7), 'svc__gamma': np.logspace(-6, 0, 7), } search = RandomizedSearchCV(pipeline, params, n_iter=30, cv=5) search.fit(X_train, y_train) # search.best_params_, search.grid_scores_
Thank you! • http://scikit-learn.org • https://github.com/scikit-learn/scikit-learn @ogrisel