Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Big data and Machine learning APIs
Search
Sam Bessalah
December 03, 2014
Technology
4
280
Big data and Machine learning APIs
Sam Bessalah
December 03, 2014
Tweet
Share
More Decks by Sam Bessalah
See All by Sam Bessalah
Streaming Platforms
samklr
0
370
Intro to Parquet (June 2015)
samklr
0
310
High Performance RPC with Finagle
samklr
1
220
Dotscale 2015 Lightning - Distributed Systems Research
samklr
1
820
Datageeks_27-05.pdf
samklr
0
72
Scalable Machine Learning
samklr
2
260
mesos.devoxx.2014
samklr
2
290
Algebird : Abstract Algebra for Big Data Analytics.
samklr
9
3k
Algebra for analytics
samklr
1
300
Other Decks in Technology
See All in Technology
ソフトバンク流!プラットフォームエンジニアリング実現へのアプローチ
sbtechnight
1
250
開発チームとQAエンジニアの新しい協業モデル -年末調整開発チームで実践する【QAリード施策】-
kaomi_wombat
0
210
Phase11_戦略的AI経営
overflowinc
0
950
スピンアウト講座06_認証系(API-OAuth-MCP)入門
overflowinc
0
690
A4)シラバスを超えて語る、テストマネジメント
moritamasami
0
110
ADK + Gemini Enterprise で 外部 API 連携エージェント作るなら OAuth の仕組みを理解しておこう
kaz1437
0
130
形式手法特論:SMT ソルバで解く認可ポリシの静的解析 #kernelvm / Kernel VM Study Tsukuba No3
ytaka23
1
730
SaaSに宿る21g
kanyamaguc
2
110
俺の/私の最強アーキテクチャ決定戦開催 ― チームで新しいアーキテクチャに適合していくために / 20260322 Naoki Takahashi
shift_evolve
PRO
1
390
Kiro Powers 入門
k_adachi_01
0
130
_Architecture_Modernization_から学ぶ現状理解から設計への道のり.pdf
satohjohn
2
670
GitHub Copilot CLI で Azure Portal to Bicep
tsubakimoto_s
0
170
Featured
See All Featured
Ruling the World: When Life Gets Gamed
codingconduct
0
180
sira's awesome portfolio website redesign presentation
elsirapls
0
200
Heart Work Chapter 1 - Part 1
lfama
PRO
5
35k
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
38
2.8k
Discover your Explorer Soul
emna__ayadi
2
1.1k
We Have a Design System, Now What?
morganepeng
55
8k
Fantastic passwords and where to find them - at NoRuKo
philnash
52
3.6k
How to optimise 3,500 product descriptions for ecommerce in one day using ChatGPT
katarinadahlin
PRO
1
3.5k
Why Your Marketing Sucks and What You Can Do About It - Sophie Logan
marketingsoph
0
120
From Legacy to Launchpad: Building Startup-Ready Communities
dugsong
0
180
Java REST API Framework Comparison - PWX 2021
mraible
34
9.2k
Self-Hosted WebAssembly Runtime for Runtime-Neutral Checkpoint/Restore in Edge–Cloud Continuum
chikuwait
0
410
Transcript
Big Data and Machine Learning APIs
Sam Bessalah @samklr Software Engineer, Freelance Data Engineering, Distributed systems,
Machine Learning Paris Data Geek Meetup @DataParis me :
None
None
None
Big Data Legends ….
Big Data Legends … Web logs Sensors Other Data source
.. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . . Data Driven Decisions Smart Applications
BUT ….
- Building big data infrastructures is no easy task. -
Leveraging data for decision making requires a mix of multiples skills : . System Engineering . Distributed computing . Statistics . Machine Learning
Solutions …. - Build Data platforms as a service. -
Build robust and consistent APIs to bring big data to the masses. - Leverages fluent APIs for fast data science
None
Big Data is not just about throwing data to Hadoop.
It’s also about data pipelines
Data Sources
Data Sources
Data Sources - High Throughput distributed mssaging platform - Publish
Subscribe Model - Modelled as a distributed replicated log - Persists messages to disk - Categorizes messages into Topics - Allows message retention for long specified amount of time - Allows stream replay in case of failure
Data Sources Machine Learning High Latency Batch Apps Real Time
Processing
How do you build an API around that?
None
/ingest REST API
/ingest
/ingest /query /trainModel /process
Things to be careful with - Multitenancy (Yarn, Mesos, Docker…)
- Job Scheduling - Security - Serialisation : ProtoBuf, Thrift, Avro - Storage Format : Optimize queries with columnar storage. - Compression : LZO, Snappy
Making sense of data …
None
What is Machine Learning?
http://dilbert.com/strips/comic/2013-02-02
None
https://speakerdeck.com/nivdul/lightning-fast-machine-learning-with-spark-1
Machine Learning workflow
Machine Learning workflow Text, Images, etc
Machine Learning workflow Text, Images, etc Feature Extraction
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Training
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Training Predictive Model
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Training Predictive Model New Data Feature Vector Prediction
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Predictive Model New Data Prediction
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Predictive Model New Data Prediction BLACK BOX
Machine Learning Libraries and Frameworks
scikit-learn.org
Text, Images, etc Feature Extraction Predictive Model New Data Prediction
X = vect.fit_transform(input) clf.fit(X,y) X_new = vect.fit_transform(input) y_new= clf.predict(X_new)
http://arxiv.org/abs/1309.0238
From library to web APIs
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Predictive Model New Data Prediction BLACK BOX
Machine Learning workflow Text, Images, etc Transformed Data Application Prediction
Predictive API
Predictive Web APIs
Some examples
Challenges of Predictive APIs
http://www.r-bloggers.com/data-science-toolbox-survey-results-surprise-r-and-python-win/
Modeling and Prediction are just a small part of the
process
- Data locality and data gravity - Support the full
workflow - Verticalization of platforms - Scalability - Collaboration and interoperability - Black boxing of implementations
Explore machine learning for APIs orchestration. Talk to Ori @OriPekelman
Next Frontier ? Or actual reality ?
None
http://speakerdeck.com/samklr