Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Big data and Machine learning APIs
Search
Sponsored
·
Ship Features Fearlessly
Turn features on and off without deploys. Used by thousands of Ruby developers.
→
Sam Bessalah
December 03, 2014
Technology
4
280
Big data and Machine learning APIs
Sam Bessalah
December 03, 2014
Tweet
Share
More Decks by Sam Bessalah
See All by Sam Bessalah
Streaming Platforms
samklr
0
360
Intro to Parquet (June 2015)
samklr
0
300
High Performance RPC with Finagle
samklr
1
210
Dotscale 2015 Lightning - Distributed Systems Research
samklr
1
810
Datageeks_27-05.pdf
samklr
0
63
Scalable Machine Learning
samklr
2
250
mesos.devoxx.2014
samklr
2
280
Algebird : Abstract Algebra for Big Data Analytics.
samklr
9
2.9k
Algebra for analytics
samklr
1
300
Other Decks in Technology
See All in Technology
AWS監視を「もっと楽する」ために
uechishingo
0
450
ドメイン駆動セキュリティへの道しるべ
pandayumi
0
180
最速で価値を出すための プロダクトエンジニアのツッコミ術
kaacun
1
300
OCI技術資料 : OS管理ハブ 概要
ocise
2
4.3k
Azure SRE Agent x PagerDutyによる近未来インシデント対応への期待 / The Future of Incident Response: Azure SRE Agent x PagerDuty
aeonpeople
0
220
習慣とAIと環境 — 技術探求を続ける3つの鍵
azukiazusa1
3
800
全員が「作り手」になる。職能の壁を溶かすプロトタイプ開発。
hokuo
1
600
Vivre en Bitcoin : le tutoriel que votre banquier ne veut pas que vous voyiez
rlifchitz
0
370
AI開発をスケールさせるデータ中心の仕組みづくり
kzykmyzw
0
170
3リポジトリーを2ヶ月でモノレポ化した話 / How I turned 3 repositories into a monorepo in 2 months
kubode
0
120
SMTP完全に理解した ✉️
yamatai1212
0
110
Regional_NAT_Gatewayについて_basicとの違い_試した内容スケールアウト_インについて_IPv6_dual_networkでの使い分けなど.pdf
cloudevcode
1
170
Featured
See All Featured
Large-scale JavaScript Application Architecture
addyosmani
515
110k
End of SEO as We Know It (SMX Advanced Version)
ipullrank
3
3.9k
Jess Joyce - The Pitfalls of Following Frameworks
techseoconnect
PRO
1
60
Building Applications with DynamoDB
mza
96
6.9k
The Curious Case for Waylosing
cassininazir
0
230
Fireside Chat
paigeccino
41
3.8k
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
35
3.3k
Fashionably flexible responsive web design (full day workshop)
malarkey
408
66k
It's Worth the Effort
3n
188
29k
Designing for Performance
lara
610
70k
Dealing with People You Can't Stand - Big Design 2015
cassininazir
367
27k
Tips & Tricks on How to Get Your First Job In Tech
honzajavorek
0
420
Transcript
Big Data and Machine Learning APIs
Sam Bessalah @samklr Software Engineer, Freelance Data Engineering, Distributed systems,
Machine Learning Paris Data Geek Meetup @DataParis me :
None
None
None
Big Data Legends ….
Big Data Legends … Web logs Sensors Other Data source
.. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . . Data Driven Decisions Smart Applications
BUT ….
- Building big data infrastructures is no easy task. -
Leveraging data for decision making requires a mix of multiples skills : . System Engineering . Distributed computing . Statistics . Machine Learning
Solutions …. - Build Data platforms as a service. -
Build robust and consistent APIs to bring big data to the masses. - Leverages fluent APIs for fast data science
None
Big Data is not just about throwing data to Hadoop.
It’s also about data pipelines
Data Sources
Data Sources
Data Sources - High Throughput distributed mssaging platform - Publish
Subscribe Model - Modelled as a distributed replicated log - Persists messages to disk - Categorizes messages into Topics - Allows message retention for long specified amount of time - Allows stream replay in case of failure
Data Sources Machine Learning High Latency Batch Apps Real Time
Processing
How do you build an API around that?
None
/ingest REST API
/ingest
/ingest /query /trainModel /process
Things to be careful with - Multitenancy (Yarn, Mesos, Docker…)
- Job Scheduling - Security - Serialisation : ProtoBuf, Thrift, Avro - Storage Format : Optimize queries with columnar storage. - Compression : LZO, Snappy
Making sense of data …
None
What is Machine Learning?
http://dilbert.com/strips/comic/2013-02-02
None
https://speakerdeck.com/nivdul/lightning-fast-machine-learning-with-spark-1
Machine Learning workflow
Machine Learning workflow Text, Images, etc
Machine Learning workflow Text, Images, etc Feature Extraction
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Training
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Training Predictive Model
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Training Predictive Model New Data Feature Vector Prediction
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Predictive Model New Data Prediction
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Predictive Model New Data Prediction BLACK BOX
Machine Learning Libraries and Frameworks
scikit-learn.org
Text, Images, etc Feature Extraction Predictive Model New Data Prediction
X = vect.fit_transform(input) clf.fit(X,y) X_new = vect.fit_transform(input) y_new= clf.predict(X_new)
http://arxiv.org/abs/1309.0238
From library to web APIs
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Predictive Model New Data Prediction BLACK BOX
Machine Learning workflow Text, Images, etc Transformed Data Application Prediction
Predictive API
Predictive Web APIs
Some examples
Challenges of Predictive APIs
http://www.r-bloggers.com/data-science-toolbox-survey-results-surprise-r-and-python-win/
Modeling and Prediction are just a small part of the
process
- Data locality and data gravity - Support the full
workflow - Verticalization of platforms - Scalability - Collaboration and interoperability - Black boxing of implementations
Explore machine learning for APIs orchestration. Talk to Ori @OriPekelman
Next Frontier ? Or actual reality ?
None
http://speakerdeck.com/samklr