Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Big data and Machine learning APIs
Search
Sam Bessalah
December 03, 2014
Technology
4
260
Big data and Machine learning APIs
Sam Bessalah
December 03, 2014
Tweet
Share
More Decks by Sam Bessalah
See All by Sam Bessalah
Streaming Platforms
samklr
0
350
Intro to Parquet (June 2015)
samklr
0
290
High Performance RPC with Finagle
samklr
1
180
Dotscale 2015 Lightning - Distributed Systems Research
samklr
1
790
Datageeks_27-05.pdf
samklr
0
53
Scalable Machine Learning
samklr
2
230
mesos.devoxx.2014
samklr
2
260
Algebird : Abstract Algebra for Big Data Analytics.
samklr
9
2.9k
Algebra for analytics
samklr
1
290
Other Decks in Technology
See All in Technology
CDK CLIで使ってたあの機能、CDK Toolkit Libraryではどうやるの?
smt7174
4
190
LLM時代のパフォーマンスチューニング:MongoDB運用で試したコンテキスト活用の工夫
ishikawa_pro
0
160
COVESA VSSによる車両データモデルの標準化とAWS IoT FleetWiseの活用
osawa
1
330
Generative AI Japan 第一回生成AI実践研究会「AI駆動開発の現在地──ブレイクスルーの鍵を握るのはデータ領域」
shisyu_gaku
0
300
Terraformで構築する セルフサービス型データプラットフォーム / terraform-self-service-data-platform
pei0804
1
190
機械学習を扱うプラットフォーム開発と運用事例
lycorptech_jp
PRO
0
530
LLMを搭載したプロダクトの品質保証の模索と学び
qa
0
1.1k
roppongirb_20250911
igaiga
1
240
「どこから読む?」コードとカルチャーに最速で馴染むための実践ガイド
zozotech
PRO
0
520
Autonomous Database - Dedicated 技術詳細 / adb-d_technical_detail_jp
oracle4engineer
PRO
4
10k
AIエージェントで90秒の広告動画を制作!台本・音声・映像・編集をつなぐAWS最新アーキテクチャの実践
nasuvitz
0
240
現場で効くClaude Code ─ 最新動向と企業導入
takaakikakei
1
250
Featured
See All Featured
For a Future-Friendly Web
brad_frost
180
9.9k
Speed Design
sergeychernyshev
32
1.1k
Become a Pro
speakerdeck
PRO
29
5.5k
A better future with KSS
kneath
239
17k
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
507
140k
Designing Experiences People Love
moore
142
24k
Designing for Performance
lara
610
69k
How to Ace a Technical Interview
jacobian
279
23k
Raft: Consensus for Rubyists
vanstee
140
7.1k
The Straight Up "How To Draw Better" Workshop
denniskardys
236
140k
Building a Scalable Design System with Sketch
lauravandoore
462
33k
Agile that works and the tools we love
rasmusluckow
330
21k
Transcript
Big Data and Machine Learning APIs
Sam Bessalah @samklr Software Engineer, Freelance Data Engineering, Distributed systems,
Machine Learning Paris Data Geek Meetup @DataParis me :
None
None
None
Big Data Legends ….
Big Data Legends … Web logs Sensors Other Data source
.. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . .
A Big Data Legend … Web logs Sensors Other Data
sources .. . . . Data Driven Decisions Smart Applications
BUT ….
- Building big data infrastructures is no easy task. -
Leveraging data for decision making requires a mix of multiples skills : . System Engineering . Distributed computing . Statistics . Machine Learning
Solutions …. - Build Data platforms as a service. -
Build robust and consistent APIs to bring big data to the masses. - Leverages fluent APIs for fast data science
None
Big Data is not just about throwing data to Hadoop.
It’s also about data pipelines
Data Sources
Data Sources
Data Sources - High Throughput distributed mssaging platform - Publish
Subscribe Model - Modelled as a distributed replicated log - Persists messages to disk - Categorizes messages into Topics - Allows message retention for long specified amount of time - Allows stream replay in case of failure
Data Sources Machine Learning High Latency Batch Apps Real Time
Processing
How do you build an API around that?
None
/ingest REST API
/ingest
/ingest /query /trainModel /process
Things to be careful with - Multitenancy (Yarn, Mesos, Docker…)
- Job Scheduling - Security - Serialisation : ProtoBuf, Thrift, Avro - Storage Format : Optimize queries with columnar storage. - Compression : LZO, Snappy
Making sense of data …
None
What is Machine Learning?
http://dilbert.com/strips/comic/2013-02-02
None
https://speakerdeck.com/nivdul/lightning-fast-machine-learning-with-spark-1
Machine Learning workflow
Machine Learning workflow Text, Images, etc
Machine Learning workflow Text, Images, etc Feature Extraction
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Training
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Training Predictive Model
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Training Predictive Model New Data Feature Vector Prediction
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Predictive Model New Data Prediction
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Predictive Model New Data Prediction BLACK BOX
Machine Learning Libraries and Frameworks
scikit-learn.org
Text, Images, etc Feature Extraction Predictive Model New Data Prediction
X = vect.fit_transform(input) clf.fit(X,y) X_new = vect.fit_transform(input) y_new= clf.predict(X_new)
http://arxiv.org/abs/1309.0238
From library to web APIs
Machine Learning workflow Text, Images, etc Feature Extraction Learning algorithm
Predictive Model New Data Prediction BLACK BOX
Machine Learning workflow Text, Images, etc Transformed Data Application Prediction
Predictive API
Predictive Web APIs
Some examples
Challenges of Predictive APIs
http://www.r-bloggers.com/data-science-toolbox-survey-results-surprise-r-and-python-win/
Modeling and Prediction are just a small part of the
process
- Data locality and data gravity - Support the full
workflow - Verticalization of platforms - Scalability - Collaboration and interoperability - Black boxing of implementations
Explore machine learning for APIs orchestration. Talk to Ori @OriPekelman
Next Frontier ? Or actual reality ?
None
http://speakerdeck.com/samklr