Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
"Haute Couture" and "Prêt-à-Porter" Data Science
Search
Christophe Bourguignat
April 15, 2016
Technology
0
430
"Haute Couture" and "Prêt-à-Porter" Data Science
Talk given @ Telecom ParisTech on April 2016
Christophe Bourguignat
April 15, 2016
Tweet
Share
More Decks by Christophe Bourguignat
See All by Christophe Bourguignat
Adding Neurons to your Assistants
kriss
1
340
Software Engineers, the New Data Scientists
kriss
1
140
Machine Learning for Chief Future Officers
kriss
1
130
Whitening The Blackbox : Why And How To Explain Machine Learning Predictions ?
kriss
1
1.1k
Building a Data Science Team
kriss
2
400
Lean Machine Learning
kriss
5
740
Kaggle Criteo Challenge and Online Learning
kriss
1
260
The #FrenchData landscape
kriss
0
460
Other Decks in Technology
See All in Technology
プロダクト開発を加速させるためのQA文化の築き方 / How to build QA culture to accelerate product development
mii3king
1
290
APIとはなにか
mikanichinose
0
110
小学3年生夏休みの自由研究「夏休みに Copilot で遊んでみた」
taichinakamura
0
180
サーバーなしでWordPress運用、できますよ。
sogaoh
PRO
0
120
クレカ・銀行連携機能における “状態”との向き合い方 / SmartBank Engineer LT Event
smartbank
2
100
Microsoft Azure全冠になってみた ~アレを使い倒した者が試験を制す!?~/Obtained all Microsoft Azure certifications Those who use "that" to the full will win the exam! ?
yuj1osm
2
120
新機能VPCリソースエンドポイント機能検証から得られた考察
duelist2020jp
0
230
20241214_WACATE2024冬_テスト設計技法をチョット俯瞰してみよう
kzsuzuki
3
680
Google Cloud で始める Cloud Run 〜AWSとの比較と実例デモで解説〜
risatube
PRO
0
120
あの日俺達が夢見たサーバレスアーキテクチャ/the-serverless-architecture-we-dreamed-of
tomoki10
0
500
エンジニアカフェ忘年会2024「今年やらかしてしまったこと!」
keropiyo
0
100
[Ruby] Develop a Morse Code Learning Gem & Beep from Strings
oguressive
1
190
Featured
See All Featured
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
159
15k
Side Projects
sachag
452
42k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
29
910
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
29
2k
Statistics for Hackers
jakevdp
796
220k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
248
1.3M
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
356
29k
BBQ
matthewcrist
85
9.4k
We Have a Design System, Now What?
morganepeng
51
7.3k
Building an army of robots
kneath
302
44k
Building Adaptive Systems
keathley
38
2.3k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
251
21k
Transcript
Christophe Bourguignat zelros.com /
[email protected]
/ @zelrosHQ
None
Agenda Models interpretation Models production A short history of Kaggle
MODELS INTERPRETATION
WHY ? Models opacity is a major reject cause by
users Unfortunately, predictive models that are the most powerful are usually the least interpretable
None
None
None
FEATURE IMPORTANCE
None
None
None
AEROSOLVE (AirBnb) Prior = general belief, before looking at the
data Inform the model of our prior beliefs by adding them to a text configuration file during training
None
None
None
Scikit Learn
Scikit Learn March 2014
Scikit Learn March 2014 April 2015
Scikit Learn March 2014 April 2015
Scikit Learn March 2014 April 2015
Scikit Learn March 2014 April 2015
Scikit Learn https://github.com/andosa/treeinterpreter/blob/master/treeinterpreter/treeinterpreter.py
EXEMPLE ON BOSTON DATASET
None
http://blog.datadive.net/prediction-intervals-for-random-forests/ Prediction Intervals for Random Forests
None
None
PRODUCTION
None
None
TRADITIONAL B.I. DEPARTMENT DATA ANALYSTS ETL ENGINEER DBAs
“INFINITE LOOP OF SADNESS” DATA SCIENTISTS IT / DATA ENGINEERS
SOFTWARE ENGINEERS BUSINESS http://multithreaded.stitchfix.com/blog/2016/03/16/engineers-shouldnt-write-etl/
CODE http://treycausey.com/software_dev_skills.html
COMPLEXITY AND TECHNICAL DEBT Underutilized features Undeclared consumers Pipeline Jungles
- preparing data in a ML-friendly format http://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/43146.pdf
PRODUCTION FAILS Unseen category Unreproductible feat eng workflow (PMML) Leakage
in DataBase fields (churn) Monitoring
A BRIEF HISTORY OF KAGGLE
June 2013 Sept 2013 Nov 2014 Apr 2015 Mar 2016
None
None
None
None
None
None
None
Refinements : - hashing function - adaptive learning rate (different
flavours) - Vowpal Wabbit - Dropout - PyPy
None
None
None
None
None
None
None
None
QUESTIONS ? zelros.com /
[email protected]
/ @zelrosHQ