Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Big data and Machine learning APIs
Search
Sam Bessalah
December 03, 2014
Technology
4
250
Big data and Machine learning APIs
Sam Bessalah
December 03, 2014
Tweet
Share
More Decks by Sam Bessalah
See All by Sam Bessalah
Streaming Platforms
samklr
0
330
Intro to Parquet (June 2015)
samklr
0
260
High Performance RPC with Finagle
samklr
1
160
Dotscale 2015 Lightning - Distributed Systems Research
samklr
1
780
Datageeks_27-05.pdf
samklr
0
47
Scalable Machine Learning
samklr
2
210
mesos.devoxx.2014
samklr
2
240
Algebird : Abstract Algebra for Big Data Analytics.
samklr
9
2.8k
Algebra for analytics
samklr
1
270
Other Decks in Technology
See All in Technology
株式会社ログラス − エンジニア向け会社説明資料 / Loglass Comapany Deck for Engineer
loglass2019
3
32k
日本版とグローバル版のモバイルアプリ統合の開発の裏側と今後の展望
miichan
1
130
Qiita埋め込み用スライド
naoki_0531
0
5.1k
終了の危機にあった15年続くWebサービスを全力で存続させる - phpcon2024
yositosi
19
17k
Storage Browser for Amazon S3
miu_crescent
1
240
UI State設計とテスト方針
rmakiyama
2
650
Oracle Cloudの生成AIサービスって実際どこまで使えるの? エンジニア目線で試してみた
minorun365
PRO
4
290
DevFest 2024 Incheon / Songdo - Compose UI 조합 심화
wisemuji
0
120
サイボウズフロントエンドエキスパートチームについて / FrontendExpert Team
cybozuinsideout
PRO
5
38k
開発生産性向上! 育成を「改善」と捉えるエンジニア育成戦略
shoota
2
400
Amazon Kendra GenAI Index 登場でどう変わる? 評価から学ぶ最適なRAG構成
naoki_0531
0
120
非機能品質を作り込むための実践アーキテクチャ
knih
5
1.5k
Featured
See All Featured
GraphQLとの向き合い方2022年版
quramy
44
13k
Writing Fast Ruby
sferik
628
61k
How to train your dragon (web standard)
notwaldorf
88
5.7k
How STYLIGHT went responsive
nonsquared
95
5.2k
Building an army of robots
kneath
302
44k
Practical Orchestrator
shlominoach
186
10k
Scaling GitHub
holman
458
140k
The Cost Of JavaScript in 2023
addyosmani
45
7k
Navigating Team Friction
lara
183
15k
Fireside Chat
paigeccino
34
3.1k
The MySQL Ecosystem @ GitHub 2015
samlambert
250
12k
The World Runs on Bad Software
bkeepers
PRO
65
11k
Transcript
Big Data and Machine Learning APIs
Sam Bessalah @samklr Software Engineer, Freelance Data Engineering, Distributed systems,
Machine Learning Paris Data Geek Meetup @DataParis me :
None
None
None
Big Data Legends ….
Big Data Legends … Web logs Sensors Other Data source
.. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . . Data Driven Decisions Smart Applications
BUT ….
- Building big data infrastructures is no easy task. -
Leveraging data for decision making requires a mix of multiples skills : . System Engineering . Distributed computing . Statistics . Machine Learning
Solutions …. - Build Data platforms as a service. -
Build robust and consistent APIs to bring big data to the masses. - Leverages fluent APIs for fast data science
None
Big Data is not just about throwing data to Hadoop.
It’s also about data pipelines
Data Sources
Data Sources
Data Sources - High Throughput distributed mssaging platform - Publish
Subscribe Model - Modelled as a distributed replicated log - Persists messages to disk - Categorizes messages into Topics - Allows message retention for long specified amount of time - Allows stream replay in case of failure
Data Sources Machine Learning High Latency Batch Apps Real Time
Processing
How do you build an API around that?
None
/ingest REST API
/ingest
/ingest /query /trainModel /process
Things to be careful with - Multitenancy (Yarn, Mesos, Docker…)
- Job Scheduling - Security - Serialisation : ProtoBuf, Thrift, Avro - Storage Format : Optimize queries with columnar storage. - Compression : LZO, Snappy
Making sense of data …
None
What is Machine Learning?
http://dilbert.com/strips/comic/2013-02-02
None
https://speakerdeck.com/nivdul/lightning-fast-machine-learning-with-spark-1
Machine Learning workflow
Machine Learning workflow Text, Images, etc
Machine Learning workflow Text, Images, etc Feature Extraction
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Training
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Training Predictive Model
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Training Predictive Model New Data Feature Vector Prediction
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Predictive Model New Data Prediction
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Predictive Model New Data Prediction BLACK BOX
Machine Learning Libraries and Frameworks
scikit-learn.org
Text, Images, etc Feature Extraction Predictive Model New Data Prediction
X = vect.fit_transform(input) clf.fit(X,y) X_new = vect.fit_transform(input) y_new= clf.predict(X_new)
http://arxiv.org/abs/1309.0238
From library to web APIs
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Predictive Model New Data Prediction BLACK BOX
Machine Learning workflow Text, Images, etc Transformed Data Application Prediction
Predictive API
Predictive Web APIs
Some examples
Challenges of Predictive APIs
http://www.r-bloggers.com/data-science-toolbox-survey-results-surprise-r-and-python-win/
Modeling and Prediction are just a small part of the
process
- Data locality and data gravity - Support the full
workflow - Verticalization of platforms - Scalability - Collaboration and interoperability - Black boxing of implementations
Explore machine learning for APIs orchestration. Talk to Ori @OriPekelman
Next Frontier ? Or actual reality ?
None
http://speakerdeck.com/samklr