Best Practices for Using Machine Learning in Businesses in 2018 - Keynote at Budapest BI Forum Conference - Budapest, November 2018
szilard
November 04, 2018
Transcript
Best Practices for Using Machine Learning in Businesses in 2018
Szilárd Pafka, PhD
Chief Scientist, Epoch (USA)
Budapest BI Forum Conference, November 2018
Disclaimer: I am not representing my employer (Epoch) in this talk. I can neither confirm nor deny whether Epoch is using any of the methods, tools, results etc. mentioned in this talk.
https://twitter.com/baroquepasa/
y = f(x1, x2, ..., xn)
Source: Hastie et al., ESL 2ed
Source: Yann LeCun
2018?
#1 Use the Right Algo
Source: Andrew Ng
#2 Use Open Source
in 2006:
- cost was not a factor!
- data.frame
- [800] packages
#3 Simple > Complex
10x
#4 Incorporate Domain Knowledge
Do Feature Engineering (Still)
Explore Your Data
Clean Your Data
#5 Do Proper Validation
Avoid: Overfitting, Data Leakage
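One common source of data leakage is letting information from the evaluation period reach training. The following is a minimal Python sketch of a time-based holdout, assuming each row carries a date; the helper names (`time_based_split`, `fit_mean`) and the toy rows are hypothetical illustrations, not material from the talk.

```python
from datetime import date


def time_based_split(rows, cutoff):
    """Train on rows strictly before `cutoff`, test on the rest."""
    train = [r for r in rows if r["date"] < cutoff]
    test = [r for r in rows if r["date"] >= cutoff]
    return train, test


def fit_mean(rows, key):
    """Fit a preprocessing statistic (here: a mean) on the given rows only."""
    return sum(r[key] for r in rows) / len(rows)


# Hypothetical toy data, not from the talk.
rows = [
    {"date": date(2018, 1, 15), "x": 1.0, "y": 0},
    {"date": date(2018, 3, 2), "x": 2.5, "y": 1},
    {"date": date(2018, 6, 20), "x": 0.7, "y": 0},
    {"date": date(2018, 9, 5), "x": 3.1, "y": 1},
]

train, test = time_based_split(rows, cutoff=date(2018, 6, 1))

# The mean used for centering comes from the training rows alone;
# computing it on all rows would leak test information into training.
x_mean = fit_mean(train, "x")
test_centered = [r["x"] - x_mean for r in test]
```

The same idea applies to any fitted preprocessing step (scaling, imputation, target encoding): fit on the training part, then apply to the held-out part.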
#6 Batch or Real-Time Scoring?
https://medium.com/@HarlanH/patterns-for-connecting-predictive-models-to-software-products-f9b6e923f02d
https://medium.com/@dvelsner/deploying-a-simple-machine-learning-model-in-a-modern-web-application-flask-angular-docker-a657db075280
R/Python:
- Slow(er)
- Encoding of categorical variables
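One reading of the encoding bullet is that a real-time scorer must map categories to numbers exactly as the training code did. A minimal Python sketch, assuming a simple integer encoding; `fit_encoder`, `encode`, and the sentinel value `-1` for unseen categories are hypothetical choices for illustration.

```python
def fit_encoder(values):
    """Build a category -> integer mapping from training data.

    Sorting makes the mapping independent of row order, so retraining on
    reshuffled data yields the same codes.
    """
    return {v: i for i, v in enumerate(sorted(set(values)))}


def encode(value, mapping, unknown=-1):
    """Encode one value at scoring time; unseen categories get `unknown`."""
    return mapping.get(value, unknown)


# Fit once at training time and persist the mapping with the model.
train_colors = ["red", "blue", "green", "blue"]
mapping = fit_encoder(train_colors)

# At scoring time, reuse the stored mapping instead of refitting it.
known_code = encode("green", mapping)
unseen_code = encode("purple", mapping)
```

Storing the mapping alongside the model artifact keeps batch and real-time scoring consistent with training.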
#7 Do Online Validation as Well
https://www.oreilly.com/ideas/evaluating-machine-learning-models/page/2/orientation
https://www.slideshare.net/FaisalZakariaSiddiqi/netflix-recommendations-feature-engineering-with-time-travel
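Online validation is often done as an A/B test between the current model and a candidate. A minimal Python sketch of a two-proportion z-test on conversion counts follows; the choice of test, the function name, and the counts are assumptions for illustration and do not come from the talk.

```python
import math


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (B minus A)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis of no difference.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Hypothetical counts: A = current model, B = candidate model.
z, p_value = two_proportion_z(conv_a=200, n_a=10_000, conv_b=250, n_b=10_000)
```

A small p-value suggests the difference in rates is unlikely to be noise; whether the lift matters is still a business question.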
#8 Monitor Your Models
https://www.retentionscience.com/blog/automating-machine-learning-monitoring-rs-labs/
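One simple thing to monitor is whether the distribution of model scores in production drifts away from what was seen at training time. The sketch below uses the population stability index (PSI) in plain Python; PSI, the bin edges, and the sample scores are my assumptions for illustration and are not taken from the talk or the linked post.

```python
import math


def psi(expected, actual, edges, eps=1e-6):
    """Population stability index between two samples over fixed bins.

    `edges` are the interior bin boundaries; a value falls into the bin
    whose index equals the number of edges it exceeds.
    """
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(v > e for e in edges)] += 1
        # Floor at eps so that empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    e_shares = shares(expected)
    a_shares = shares(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_shares, a_shares))


# Hypothetical model scores at training time and in production.
train_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
live_scores = [0.6, 0.7, 0.8, 0.9, 0.9, 0.95, 0.8, 0.7, 0.85]
edges = [0.25, 0.5, 0.75]

drift = psi(train_scores, live_scores, edges)
```

In practice a check like this would run on a schedule and raise an alert above a chosen threshold; the threshold itself is a judgment call.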
20% 80% (my guess)
#9 Business Value
Seek / Measure / Sell
#10 Make it Reproducible
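Two small habits help with reproducibility: seed every source of randomness explicitly, and record exactly which configuration produced a result. A minimal Python sketch, assuming a dictionary-based config; `run_experiment` and `config_fingerprint` are hypothetical names, and the "experiment" is a stand-in for real training code.

```python
import hashlib
import json
import random


def run_experiment(config):
    """Stand-in for a training run; all randomness flows from config['seed']."""
    rng = random.Random(config["seed"])  # local RNG, no hidden global state
    sample = [rng.random() for _ in range(config["n_samples"])]
    return sum(sample) / len(sample)


def config_fingerprint(config):
    """Short, order-independent hash of the config to store with results."""
    blob = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]


config = {"seed": 42, "n_samples": 5}
first = run_experiment(config)
second = run_experiment(config)  # same config, same result
run_id = config_fingerprint(config)
```

Versioning the code and data alongside the config fingerprint gives a full record of what produced each model.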
Cloud (servers)
ML training: lots of CPU cores, lots of RAM, limited time
ML scoring: separate servers
ML (cloud) services (MLaaS)
“people that know what they’re doing just use open source [...] the same open source tools that the MLaaS services offer” - Bradford Cross
Kaggle
- already pre-processed data
- less domain knowledge (or deliberately hidden)
- AUC 0.0001 increases "relevant"
- no business metric
- no actual deployment
- models too complex
- no online evaluation
- no monitoring
- data leakage
Tuning and Auto ML
Ben Recht, Kevin Jamieson: http://www.argmin.net/2016/06/20/hypertuning/
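Random search is a simple baseline that often comes up in hyperparameter tuning discussions. The Python sketch below shows the idea; the objective `toy_validation_loss` is a hypothetical stand-in for "train a model and return its validation loss", and the parameter names and ranges are assumptions for illustration.

```python
import random


def toy_validation_loss(params):
    """Hypothetical objective; replace with real training plus validation."""
    lr_term = (params["learning_rate"] - 0.1) ** 2
    depth_term = (params["max_depth"] - 6) ** 2 / 100
    return lr_term + depth_term


def random_search(n_trials, seed=0):
    """Sample configurations at random and keep the best one seen."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {
            # Log-uniform between 0.001 and 1.
            "learning_rate": 10 ** rng.uniform(-3, 0),
            "max_depth": rng.randint(2, 12),
        }
        loss = toy_validation_loss(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


best_params, best_loss = random_search(n_trials=200)
```

With a fixed seed, a longer search replays the same first trials, so its best loss can only match or improve on a shorter one.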
GPUs
[Benchmark charts: Aggregation (100M rows, 1M groups) and Join (100M rows x 1M rows); axis: time [s]]
“Motherfucka!”
API and GUIs
AI?
How to Start?