Big data and Machine learning APIs
Sam Bessalah
December 03, 2014
Transcript
Big Data and Machine Learning APIs
About me: Sam Bessalah (@samklr), freelance software engineer. Data engineering, distributed systems, machine learning. Paris Data Geek Meetup (@DataParis).
A Big Data Legend …
Web logs, sensors, other data sources ... data-driven decisions, smart applications
BUT ….
- Building big data infrastructure is no easy task.
- Leveraging data for decision making requires a mix of multiple skills:
  . System engineering
  . Distributed computing
  . Statistics
  . Machine learning
Solutions …
- Build data platforms as a service.
- Build robust and consistent APIs to bring big data to the masses.
- Leverage fluent APIs for fast data science.
Big Data is not just about throwing data at Hadoop.
It's also about data pipelines.
Data Sources
- High-throughput distributed messaging platform
- Publish-subscribe model
- Modelled as a distributed, replicated log
- Persists messages to disk
- Categorizes messages into topics
- Allows message retention for a long, specified amount of time
- Allows stream replay in case of failure
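The log model described above can be sketched in a few lines of Python. This is only a toy, in-memory illustration of topics, offsets and replay. The class `TopicLog` and its methods are hypothetical names, and nothing here is distributed, replicated or persisted to disk.

```python
from collections import defaultdict


class TopicLog:
    """Toy in-memory stand-in for a topic-based, append-only message log."""

    def __init__(self):
        # One append-only list of messages per topic.
        self._topics = defaultdict(list)

    def publish(self, topic, message):
        """Append a message to a topic and return its offset."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def read(self, topic, offset=0):
        """Return every retained message of a topic from `offset` onwards."""
        return list(self._topics[topic][offset:])


log = TopicLog()
log.publish("weblogs", "GET /index.html")
log.publish("weblogs", "GET /about.html")
log.publish("sensors", "temp=21.5")

# A consumer that processed offset 0 and then crashed can resume from offset 1,
# while a new consumer can replay the whole topic from offset 0.
resumed = log.read("weblogs", offset=1)
replayed = log.read("weblogs")
```

Because messages are retained rather than deleted on delivery, each consumer only needs to remember its own offset, which is what makes replay after a failure possible.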
Data Sources / Machine Learning / High-Latency Batch Apps / Real-Time Processing
How do you build an API around that?
REST API: /ingest, /query, /trainModel, /process
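One way to picture the endpoints named above is a small dispatch table that maps each path to a handler. The sketch below is an assumption-laden illustration, not the speaker's design. The handler names, the JSON payload fields (`source`, `job`) and the in-memory `EVENTS` list are all hypothetical, and no HTTP server is involved.

```python
import json

# Hypothetical in-memory state standing in for the real storage and model layers.
EVENTS = []


def ingest(payload):
    """Store one incoming record."""
    EVENTS.append(payload)
    return {"status": "accepted", "count": len(EVENTS)}


def query(payload):
    """Return stored records whose `source` field matches the request."""
    wanted = payload.get("source")
    return {"results": [e for e in EVENTS if e.get("source") == wanted]}


def train_model(payload):
    """Placeholder: a real service would submit a training job here."""
    return {"status": "scheduled", "records": len(EVENTS)}


def process(payload):
    """Placeholder: a real service would launch a batch or streaming job."""
    return {"status": "scheduled", "job": payload.get("job")}


ROUTES = {
    "/ingest": ingest,
    "/query": query,
    "/trainModel": train_model,
    "/process": process,
}


def handle(path, body):
    """Dispatch a JSON request body to the handler for `path`."""
    handler = ROUTES.get(path)
    if handler is None:
        return 404, json.dumps({"error": "unknown endpoint"})
    return 200, json.dumps(handler(json.loads(body)))


status, response = handle("/ingest", '{"source": "weblogs", "line": "GET /"}')
```

Keeping the handlers as plain functions behind one `handle` entry point means the same routing could later sit behind any web framework without changing the handlers.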
Things to be careful with
- Multitenancy (YARN, Mesos, Docker…)
- Job scheduling
- Security
- Serialisation: ProtoBuf, Thrift, Avro
- Storage format: optimize queries with columnar storage.
- Compression: LZO, Snappy
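The storage format item in the list above can be illustrated with plain Python data structures. The records and field names below are made up for the example. Real columnar formats add encoding and compression on top of this idea, which this sketch does not attempt.

```python
# Three records, stored row by row (one dict per record).
rows = [
    {"user": "a", "country": "FR", "latency_ms": 120},
    {"user": "b", "country": "US", "latency_ms": 95},
    {"user": "c", "country": "FR", "latency_ms": 110},
]

# The same records stored column by column (one list per field).
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Row layout: every full record is visited to read a single field.
avg_from_rows = sum(row["latency_ms"] for row in rows) / len(rows)

# Columnar layout: only the one column the query needs is read.
latencies = columns["latency_ms"]
avg_from_columns = sum(latencies) / len(latencies)
```

Both layouts give the same answer. The difference is how much data a query over one field has to touch, which matters once records are wide and stored on disk.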
Making sense of data …
What is Machine Learning?
http://dilbert.com/strips/comic/2013-02-02
https://speakerdeck.com/nivdul/lightning-fast-machine-learning-with-spark-1
Machine Learning workflow
Training: text, images, etc. -> feature extraction -> learning algorithm -> predictive model
Prediction: new data -> feature vector -> predictive model -> prediction
BLACK BOX
Machine Learning Libraries and Frameworks
scikit-learn.org
Text, Images, etc Feature Extraction Predictive Model New Data Prediction
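X = vect.fit_transform(input)       # learn the features and transform the training data
clf.fit(X, y)                       # train the model
X_new = vect.transform(new_input)   # reuse the fitted features; do not refit on new data
y_new = clf.predict(X_new)          # predict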
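A vectorizer is meant to be fitted on the training data only. New data should go through `transform` alone, so that its columns match the ones the model was trained on. The toy class below shows why. It is a hypothetical, pure-Python bag-of-words vectorizer written for this illustration, not scikit-learn's implementation, and the example documents are made up.

```python
class ToyCountVectorizer:
    """Minimal bag-of-words vectorizer; a toy, not scikit-learn's implementation."""

    def fit(self, docs):
        # Learn the vocabulary (word -> column index) from the training documents.
        words = sorted({w for doc in docs for w in doc.lower().split()})
        self.vocabulary_ = {w: i for i, w in enumerate(words)}
        return self

    def transform(self, docs):
        # Count known words only; words unseen during fit are ignored.
        rows = []
        for doc in docs:
            row = [0] * len(self.vocabulary_)
            for w in doc.lower().split():
                if w in self.vocabulary_:
                    row[self.vocabulary_[w]] += 1
            rows.append(row)
        return rows

    def fit_transform(self, docs):
        return self.fit(docs).transform(docs)


train_docs = ["spam spam offer", "meeting at noon"]
new_docs = ["offer at noon today"]

vect = ToyCountVectorizer()
X = vect.fit_transform(train_docs)   # 5 columns: at, meeting, noon, offer, spam
X_new = vect.transform(new_docs)     # same 5 columns as the training data

# Refitting on the new data would build a different vocabulary, so the columns
# would no longer line up with what the model was trained on.
X_refit = ToyCountVectorizer().fit_transform(new_docs)
```

Here `X_new` has five columns like `X`, while `X_refit` has four columns with different meanings, so a classifier trained on `X` could not use it.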
http://arxiv.org/abs/1309.0238
From library to web APIs
Machine Learning workflow: Text, images, etc. / Transformed data / Application / Prediction / Predictive API
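From the application's point of view, a predictive API reduces the workflow to a request and a response. The sketch below assumes a JSON payload with a `text` field and a hypothetical `POST /predict` endpoint. The word weights stand in for a trained model and are invented for the example.

```python
import json

# Hypothetical stand-in for a trained model: a few word weights.
SPAM_WEIGHTS = {"offer": 2.0, "free": 1.5, "meeting": -1.0}


def predict(text):
    """Score a text with the toy model and return a label and the raw score."""
    score = sum(SPAM_WEIGHTS.get(w, 0.0) for w in text.lower().split())
    return ("spam" if score > 1.0 else "ham"), score


def predict_endpoint(body):
    """What a hypothetical POST /predict handler might do: JSON in, JSON out."""
    request = json.loads(body)
    label, score = predict(request["text"])
    return json.dumps({"label": label, "score": score})


response = json.loads(predict_endpoint('{"text": "Free offer inside"}'))
```

The caller never sees feature extraction or the model itself, only the JSON contract, which is the black-box trade-off the slides point at.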
Predictive Web APIs
Some examples
Challenges of Predictive APIs
http://www.r-bloggers.com/data-science-toolbox-survey-results-surprise-r-and-python-win/
Modeling and prediction are just a small part of the process.
- Data locality and data gravity
- Support the full workflow
- Verticalization of platforms
- Scalability
- Collaboration and interoperability
- Black boxing of implementations
Explore machine learning for API orchestration. Talk to Ori (@OriPekelman).
Next frontier? Or actual reality?
http://speakerdeck.com/samklr