Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
"Haute Couture" and "Prêt-à-Porter" Data Science
Search
Sponsored
·
Ship Features Fearlessly
Turn features on and off without deploys. Used by thousands of Ruby developers.
→
Christophe Bourguignat
April 15, 2016
Technology
490
0
Share
"Haute Couture" and "Prêt-à-Porter" Data Science
Talk given @ Telecom ParisTech on April 2016
Christophe Bourguignat
April 15, 2016
More Decks by Christophe Bourguignat
See All by Christophe Bourguignat
Adding Neurons to your Assistants
kriss
1
370
Software Engineers, the New Data Scientists
kriss
1
150
Machine Learning for Chief Future Officers
kriss
1
150
Whitening The Blackbox : Why And How To Explain Machine Learning Predictions ?
kriss
1
1.2k
Building a Data Science Team
kriss
2
420
Lean Machine Learning
kriss
5
800
Kaggle Criteo Challenge and Online Learning
kriss
1
300
The #FrenchData landscape
kriss
0
500
Other Decks in Technology
See All in Technology
20260515 ⾃分のアカウントとプライバシーを守る認証と認可の話〜利⽤者向け〜
oidfj
0
740
How to learn AWS Well-Architected with AWS BuilderCards: Security Edition
coosuke
PRO
0
150
なぜ、IAMロールのプリンシパルに*による部分マッチングが使えないのか? / 20260518-ssmjp-iam-role-principal
opelab
1
130
AI-Assisted Contributions and Maintainer Load - PyCon US 2026
pauloxnet
1
180
Oracle Base Database Service 技術詳細
oracle4engineer
PRO
15
100k
RedmineをAIで効率的に使う検証
yoshiokacb
0
150
ESP32 IoTを動かしながらメモリ使用量を観測してみた話
zozotech
PRO
0
150
AI対話分析の夢と、汚いデータの現実 Looker / Dataplex / Dataform で実現する品質ファーストな基盤設計
waiwai2111
0
650
JaSSTに関わることで変わった人生観 #jasstnano
makky_tyuyan
0
140
続 運用改善、不都合な真実 〜 物理制約のない運用改善はほとんど無価値 / 20260518-ssmjp-kaizen-no-value-without-physical-constraints
opelab
2
250
Sansan Engineering Unit 紹介資料
sansan33
PRO
1
4.5k
Gaussian Splattingの表現力を拡張する — 高周波再構成とインタラクションへのアプローチ —
gpuunite_official
0
190
Featured
See All Featured
The untapped power of vector embeddings
frankvandijk
2
1.7k
Design of three-dimensional binary manipulators for pick-and-place task avoiding obstacles (IECON2024)
konakalab
0
420
How to Align SEO within the Product Triangle To Get Buy-In & Support - #RIMC
aleyda
2
1.5k
Navigating Algorithm Shifts & AI Overviews - #SMXNext
aleyda
1
1.2k
The Pragmatic Product Professional
lauravandoore
37
7.3k
The Limits of Empathy - UXLibs8
cassininazir
1
330
技術選定の審美眼(2025年版) / Understanding the Spiral of Technologies 2025 edition
twada
PRO
118
110k
The Web Performance Landscape in 2024 [PerfNow 2024]
tammyeverts
12
1.1k
Into the Great Unknown - MozCon
thekraken
41
2.5k
Paper Plane
katiecoart
PRO
1
50k
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
52
5.9k
SEO in 2025: How to Prepare for the Future of Search
ipullrank
3
3.4k
Transcript
Christophe Bourguignat zelros.com /
[email protected]
/ @zelrosHQ
None
Agenda Models interpretation Models production A short history of Kaggle
MODELS INTERPRETATION
WHY ? Models opacity is a major reject cause by
users Unfortunately, predictive models that are the most powerful are usually the least interpretable
None
None
None
FEATURE IMPORTANCE
None
None
None
AEROSOLVE (AirBnb) Prior = general belief, before looking at the
data Inform the model of our prior beliefs by adding them to a text configuration file during training
None
None
None
Scikit Learn
Scikit Learn March 2014
Scikit Learn March 2014 April 2015
Scikit Learn March 2014 April 2015
Scikit Learn March 2014 April 2015
Scikit Learn March 2014 April 2015
Scikit Learn https://github.com/andosa/treeinterpreter/blob/master/treeinterpreter/treeinterpreter.py
EXEMPLE ON BOSTON DATASET
None
http://blog.datadive.net/prediction-intervals-for-random-forests/ Prediction Intervals for Random Forests
None
None
PRODUCTION
None
None
TRADITIONAL B.I. DEPARTMENT DATA ANALYSTS ETL ENGINEER DBAs
“INFINITE LOOP OF SADNESS” DATA SCIENTISTS IT / DATA ENGINEERS
SOFTWARE ENGINEERS BUSINESS http://multithreaded.stitchfix.com/blog/2016/03/16/engineers-shouldnt-write-etl/
CODE http://treycausey.com/software_dev_skills.html
COMPLEXITY AND TECHNICAL DEBT Underutilized features Undeclared consumers Pipeline Jungles
- preparing data in a ML-friendly format http://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/43146.pdf
PRODUCTION FAILS Unseen category Unreproductible feat eng workflow (PMML) Leakage
in DataBase fields (churn) Monitoring
A BRIEF HISTORY OF KAGGLE
June 2013 Sept 2013 Nov 2014 Apr 2015 Mar 2016
None
None
None
None
None
None
None
Refinements : - hashing function - adaptive learning rate (different
flavours) - Vowpal Wabbit - Dropout - PyPy
None
None
None
None
None
None
None
None
QUESTIONS ? zelros.com /
[email protected]
/ @zelrosHQ