Big data and Machine learning APIs
Sam Bessalah
December 03, 2014
Transcript
Big Data and Machine Learning APIs
About me: Sam Bessalah (@samklr), Software Engineer, Freelance
Data Engineering, Distributed Systems, Machine Learning
Paris Data Geek Meetup (@DataParis)
A Big Data Legend …
Web logs, sensors, and other data sources feed data-driven decisions and smart applications.
BUT ….
- Building big data infrastructures is no easy task.
- Leveraging data for decision making requires a mix of multiple skills:
  . System Engineering
  . Distributed computing
  . Statistics
  . Machine Learning
Solutions ….
- Build data platforms as a service.
- Build robust and consistent APIs to bring big data to the masses.
- Leverage fluent APIs for fast data science.
Big Data is not just about throwing data at Hadoop.
It’s also about data pipelines.
Data Sources
- High-throughput distributed messaging platform
- Publish-subscribe model
- Modelled as a distributed replicated log
- Persists messages to disk
- Categorizes messages into topics
- Allows message retention for a long, specified amount of time
- Allows stream replay in case of failure
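The properties listed above can be illustrated with a toy, single-process log. This is a minimal sketch, assuming in-memory storage and hypothetical names (`TopicLog`, `append`, `read_from`, `expire`); it shows topics, offset-based reads, time-based retention, and replay, and is not the API of any real messaging system. Disk persistence and replication are left out.

```python
import time
from collections import defaultdict


class TopicLog:
    """Toy append-only log: one ordered list of (timestamp, message) per topic."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._logs = defaultdict(list)   # topic -> [(timestamp, message), ...]
        self._base = defaultdict(int)    # topic -> offset of the first retained entry

    def append(self, topic, message, now=None):
        """Publish a message and return its offset within the topic."""
        now = time.time() if now is None else now
        log = self._logs[topic]
        log.append((now, message))
        return self._base[topic] + len(log) - 1

    def read_from(self, topic, offset):
        """Return every retained message at or after `offset` (subscribe or replay)."""
        start = max(offset - self._base[topic], 0)
        return [message for _, message in self._logs[topic][start:]]

    def expire(self, now=None):
        """Drop the oldest entries that fall outside the retention window."""
        now = time.time() if now is None else now
        cutoff = now - self.retention_seconds
        for topic, log in self._logs.items():
            drop = 0
            while drop < len(log) and log[drop][0] < cutoff:
                drop += 1
            del log[:drop]
            self._base[topic] += drop
```

A consumer that crashes can call `read_from` with the last offset it processed and replay from there, as long as those entries are still inside the retention window.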
Data Sources -> Machine Learning, High Latency Batch Apps, Real Time Processing
How do you build an API around that?
REST API
/ingest /query /trainModel /process
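One way to picture these endpoints is a small dispatch table. This is a minimal sketch, assuming JSON request bodies; the slides only give the four paths, so the handler names, the payload shapes, and the in-memory `EVENTS` list are all hypothetical.

```python
import json

EVENTS = []   # stand-in for the ingestion layer


def ingest(payload):
    """Accept one event and keep it in memory."""
    EVENTS.append(payload)
    return {"accepted": 1, "total": len(EVENTS)}


def query(payload):
    """Return the ingested events whose `field` equals `value`."""
    field, value = payload["field"], payload["value"]
    return {"results": [e for e in EVENTS if e.get(field) == value]}


def train_model(payload):
    # Placeholder: a real platform would submit a training job here.
    return {"status": "submitted", "job": "trainModel"}


def process(payload):
    # Placeholder: a real platform would submit a processing job here.
    return {"status": "submitted", "job": "process"}


ROUTES = {
    ("POST", "/ingest"): ingest,
    ("POST", "/query"): query,
    ("POST", "/trainModel"): train_model,
    ("POST", "/process"): process,
}


def handle(method, path, body):
    """Dispatch a request to its handler and return (status_code, json_text)."""
    handler = ROUTES.get((method, path))
    if handler is None:
        return 404, json.dumps({"error": "unknown endpoint"})
    try:
        payload = json.loads(body)
    except ValueError:
        return 400, json.dumps({"error": "body must be valid JSON"})
    return 200, json.dumps(handler(payload))
```

Keeping routing separate from the handlers means the same `handle` function could sit behind any HTTP server; scheduling, security, and multitenancy would wrap around it.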
Things to be careful with
- Multitenancy (Yarn, Mesos, Docker…)
- Job Scheduling
- Security
- Serialisation: ProtoBuf, Thrift, Avro
- Storage Format: Optimize queries with columnar storage.
- Compression: LZO, Snappy
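To see why columnar storage helps queries, compare a row layout with a column layout for the same records. This is a minimal sketch with made-up records; it uses plain JSON rather than ProtoBuf, Thrift, or Avro, and `zlib` from the Python standard library as a stand-in codec, not LZO or Snappy.

```python
import json
import zlib

# Made-up records for illustration only.
rows = [{"user": "u%d" % (i % 5), "country": "FR", "latency_ms": 100 + i % 7}
        for i in range(1000)]

# Row layout: each record is stored whole, so reading one field touches every field.
row_bytes = json.dumps(rows).encode("utf-8")

# Column layout: each field is stored contiguously and can be read on its own.
columns = {name: [r[name] for r in rows] for name in rows[0]}
column_bytes = {name: json.dumps(values).encode("utf-8")
                for name, values in columns.items()}


def mean_latency_from_columns():
    """Answer the query by decoding only the latency column."""
    values = json.loads(column_bytes["latency_ms"].decode("utf-8"))
    return sum(values) / len(values)


def mean_latency_from_rows():
    """Answer the same query by decoding every full record."""
    records = json.loads(row_bytes.decode("utf-8"))
    return sum(r["latency_ms"] for r in records) / len(records)


# Bytes each query has to read, and the effect of compressing a repetitive column.
bytes_read_columnar = len(column_bytes["latency_ms"])
bytes_read_rows = len(row_bytes)
compressed_country = zlib.compress(column_bytes["country"])
```

Both functions return the same mean, but the columnar query reads only one field's bytes, and a column of similar values compresses well because the repetition is adjacent.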
Making sense of data …
What is Machine Learning?
http://dilbert.com/strips/comic/2013-02-02
https://speakerdeck.com/nivdul/lightning-fast-machine-learning-with-spark-1
Machine Learning workflow
- Training: Text, Images, etc -> Feature Extraction -> Learning algorithm -> Predictive Model
- Prediction: New Data -> Feature Vector -> Predictive Model -> Prediction
- Everything between the input data and the prediction is a BLACK BOX.
Machine Learning Libraries and Frameworks
scikit-learn.org
Text, Images, etc -> Feature Extraction -> Predictive Model; New Data -> Prediction
X = vect.fit_transform(input)
clf.fit(X, y)
X_new = vect.transform(new_input)  # reuse the fitted vectorizer; do not refit on new data
y_new = clf.predict(X_new)
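The fit/transform/predict pattern can be tried without any dependency. The sketch below is a toy stand-in, not scikit-learn: `BagOfWords` and `NearestCentroid` are hypothetical classes written here only to mirror the method names on the slide, and the training texts are made up. It shows new data going through `transform` with the vocabulary learned at training time.

```python
class BagOfWords:
    """Toy vectorizer: learns a vocabulary, then maps text to word counts."""

    def fit_transform(self, texts):
        words = sorted({w for t in texts for w in t.lower().split()})
        self.vocabulary_ = {w: i for i, w in enumerate(words)}
        return self.transform(texts)

    def transform(self, texts):
        vectors = []
        for t in texts:
            v = [0] * len(self.vocabulary_)
            for w in t.lower().split():
                if w in self.vocabulary_:   # words unseen at training time are ignored
                    v[self.vocabulary_[w]] += 1
            vectors.append(v)
        return vectors


class NearestCentroid:
    """Toy classifier: predicts the label whose mean training vector is closest."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids_ = {label: [s / counts[label] for s in acc]
                           for label, acc in sums.items()}
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids_, key=lambda label: dist(x, self.centroids_[label]))
                for x in X]


train_texts = ["cheap pills buy now", "buy cheap watches",
               "meeting at noon", "lunch meeting today"]
train_labels = ["spam", "spam", "ham", "ham"]

vect, clf = BagOfWords(), NearestCentroid()
X = vect.fit_transform(train_texts)
clf.fit(X, train_labels)

X_new = vect.transform(["buy cheap pills", "meeting today"])   # same vocabulary as training
y_new = clf.predict(X_new)
```

Here `y_new` is `["spam", "ham"]`. Calling `fit_transform` on the new texts instead would learn a different vocabulary, so the columns of `X_new` would no longer line up with what the classifier was trained on.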
http://arxiv.org/abs/1309.0238
From library to web APIs
Machine Learning workflow
Text, Images, etc -> Transformed Data -> Predictive API -> Prediction -> Application
Predictive Web APIs
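A predictive web API hides feature extraction and the model behind one call. This is a minimal sketch, assuming a JSON request with a `text` field and a hypothetical keyword-weight table standing in for a trained model; the field names and the `predict_endpoint` function are illustrative and not taken from any service mentioned in the talk.

```python
import json

# Hypothetical "trained model": word weights that a real service would learn from data.
WEIGHTS = {"cheap": 1.0, "buy": 0.8, "meeting": -1.0, "lunch": -0.6}


def extract_features(text):
    """Feature extraction: count the known words in the input text."""
    counts = {}
    for word in text.lower().split():
        if word in WEIGHTS:
            counts[word] = counts.get(word, 0) + 1
    return counts


def predict(features):
    """Predictive model: a weighted sum turned into a label."""
    score = sum(WEIGHTS[w] * n for w, n in features.items())
    return {"label": "spam" if score > 0 else "ham", "score": score}


def predict_endpoint(request_body):
    """What the caller sees: raw JSON in, prediction JSON out."""
    try:
        text = json.loads(request_body)["text"]
    except (ValueError, KeyError, TypeError):
        return 400, json.dumps({"error": "expected a JSON object with a 'text' field"})
    return 200, json.dumps(predict(extract_features(text)))
```

The caller never sees `extract_features` or `WEIGHTS`, which is the black box the workflow slides describe: convenient for applications, but opaque about how a prediction was made.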
Some examples
Challenges of Predictive APIs
http://www.r-bloggers.com/data-science-toolbox-survey-results-surprise-r-and-python-win/
Modeling and Prediction are just a small part of the process
- Data locality and data gravity
- Support the full workflow
- Verticalization of platforms
- Scalability
- Collaboration and interoperability
- Black boxing of implementations
Explore machine learning for API orchestration. Talk to Ori (@OriPekelman).
Next frontier? Or actual reality?
http://speakerdeck.com/samklr