Rishit Dagli
May 11, 2021
Making Deployments Easy with TF Serving | TF Everywhere India
My talk at TensorFlow Everywhere India
Transcript
Making Deployments Easy with TF Serving
Rishit Dagli, High School; TEDx, TED-Ed Speaker
rishit_dagli Rishit-dagli
“Most models don’t get deployed.”
90% of models don’t get deployed.
Source: Laurence Moroney
$whoami
• High School Student
• TEDx and TED-Ed Speaker
• ♡ Hackathons and competitions
• ♡ Research
• My coordinates - www.rishit.tech
rishit_dagli Rishit-dagli
Ideal Audience
• Devs who have worked on Deep Learning Models (Keras)
• Devs looking for ways to make their models production-ready
Why care about ML deployments?
[Image. Source: memegenerator.net]
What things to take care of?
• Package the model
• Post the model on a server
• Maintain the server: auto-scale, global availability, latency
• API
• Model versioning
Simple Deployments: Why are they inefficient?
• No consistent API
• No model versioning
• No mini-batching
• Inefficient for large models
Source: Hannes Hapke
TensorFlow Serving
• Part of TensorFlow Extended, alongside TensorFlow Data Validation, TensorFlow Transform and TensorFlow Model Analysis
• Used internally at Google
• Makes deployment a lot easier
The Process
Export Model
• The SavedModel format
• Graph definitions as protocol buffers
SavedModel Directory
• Auxiliary files, e.g. vocabularies
• Variables
• Graph definitions
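
The SavedModel directory described above can be inspected with a few lines of Python. This is a minimal sketch using only the standard library, assuming the usual layout in which each numbered version directory holds `saved_model.pb` (graph definitions), `variables/` and `assets/` (auxiliary files). The model name `test` and version `1` are placeholders, and the helper names are hypothetical, not part of TensorFlow.

```python
import tempfile
from pathlib import Path


def list_versions(model_base):
    """Return the numeric version subdirectories of a model, sorted ascending."""
    return sorted(
        int(p.name)
        for p in Path(model_base).iterdir()
        if p.is_dir() and p.name.isdigit()
    )


def describe_saved_model(version_dir):
    """Report which of the usual SavedModel parts exist in one version directory."""
    version_dir = Path(version_dir)
    return {
        "graph_definitions": (version_dir / "saved_model.pb").is_file(),
        "variables": (version_dir / "variables").is_dir(),
        "assets": (version_dir / "assets").is_dir(),
    }


# Build an empty stand-in layout in a temporary directory (no real model inside).
with tempfile.TemporaryDirectory() as tmp:
    version_dir = Path(tmp) / "test" / "1"
    (version_dir / "variables").mkdir(parents=True)
    (version_dir / "assets").mkdir()
    (version_dir / "saved_model.pb").touch()

    versions = list_versions(Path(tmp) / "test")
    parts = describe_saved_model(version_dir)

print(versions)
print(parts)
```

Pointing `describe_saved_model` at a real export directory is a quick way to check that an export produced all three parts before handing it to a server.
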
TensorFlow Serving
• Also supports gRPC
Inference
• Consistent APIs
• Supports gRPC (port 8500) and REST (port 8501) simultaneously
• Inputs are not flat lists but lists of lists
Inference with REST
• JSON response
• Can specify a particular model version
• Default URL: http://{HOST}:8501/v1/models/test
• Model version URL: http://{HOST}:8501/v1/models/test/versions/{MODEL_VERSION}:predict
• In these URLs, 8501 is the port and test is the model name
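
A REST prediction request along these lines could be put together as follows. This is a sketch under stated assumptions rather than the code from the talk's demos: it relies on the TensorFlow Serving REST convention of POSTing a JSON body with an `instances` key to a URL ending in `:predict`, and the host `localhost`, the model name `test` and the input values are placeholders. The helper names are hypothetical. The request is only constructed, not sent.

```python
import json
import urllib.request


def predict_url(host, model_name, version=None, port=8501):
    """Build the predict URL, optionally pinned to one model version."""
    url = f"http://{host}:{port}/v1/models/{model_name}"
    if version is not None:
        url += f"/versions/{version}"
    return url + ":predict"


def build_predict_request(host, model_name, instances, version=None):
    """Wrap a list of lists (one inner list per example) in a POST request."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        predict_url(host, model_name, version),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Two examples with two features each: a list of lists, not a flat list.
request = build_predict_request(
    "localhost", "test", [[1.0, 2.0], [3.0, 4.0]], version=1
)
print(request.full_url)
print(request.data.decode("utf-8"))

# With a server running, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     predictions = json.load(response)["predictions"]
```

Leaving out `version` targets the model without a version segment in the path, so the server decides which version answers.
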
Inference with gRPC
• Better connections
• Data converted to protocol buffers
• Request types have designated types
• Payload converted to base64
• Use gRPC stubs
Model Meta Information
• You have an API to get meta info
• Useful for model tracking in telemetry systems
• Provides model inputs/outputs and signatures
• URL: http://{HOST}:8501/v1/models/{MODEL_NAME}/versions/{MODEL_VERSION}/metadata
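
The metadata endpoint above could feed a telemetry system with a summary like the one below. This is only a sketch: the nesting of the response (`metadata` → `signature_def` → `signature_def`) is an assumption about the JSON shape that should be checked against a real server, and the sample response, its tensor names (`dense_input`, `dense`) and the helper names are made up for illustration.

```python
def metadata_url(host, model_name, version, port=8501):
    """Build the metadata URL for one model version."""
    return (
        f"http://{host}:{port}/v1/models/{model_name}"
        f"/versions/{version}/metadata"
    )


def signature_summary(metadata_response):
    """Map each signature name to its sorted input and output tensor names."""
    signatures = (
        metadata_response.get("metadata", {})
        .get("signature_def", {})
        .get("signature_def", {})
    )
    return {
        name: {
            "inputs": sorted(signature.get("inputs", {})),
            "outputs": sorted(signature.get("outputs", {})),
        }
        for name, signature in signatures.items()
    }


# Hypothetical response, trimmed to the fields this sketch reads.
sample_response = {
    "model_spec": {"name": "test", "version": "1"},
    "metadata": {
        "signature_def": {
            "signature_def": {
                "serving_default": {
                    "inputs": {"dense_input": {"dtype": "DT_FLOAT"}},
                    "outputs": {"dense": {"dtype": "DT_FLOAT"}},
                }
            }
        }
    },
}

summary = signature_summary(sample_response)
print(metadata_url("localhost", "test", 1))
print(summary)
```
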
Batch Inferences
• Use hardware efficiently
• Save costs and compute resources
• Take multiple requests and process them together
• Super cool😎 for large models
Batch Inference: Highly customizable
• max_batch_size
• batch_timeout_micros
• num_batch_threads
• max_enqueued_batches
• file_system_poll_wait_seconds
• tensorflow_session_parallelism
• tensorflow_intra_op_parallelism
• Load configuration file on startup
• Change parameters according to use cases
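
A batching configuration file of the kind loaded on startup could be generated as sketched below. The sketch assumes the protobuf text format that TensorFlow Serving reads for batching parameters, with one `name { value: N }` entry per line. The numbers are illustrative placeholders, not tuning advice, and `render_batching_parameters` is a hypothetical helper. Only the four batching parameters are written to the file; as far as I know, the remaining three names in the list are passed as server command-line flags instead.

```python
def render_batching_parameters(params):
    """Render batching parameters as protobuf text, one entry per line."""
    lines = [f"{name} {{ value: {value} }}" for name, value in params.items()]
    return "\n".join(lines) + "\n"


# Illustrative values only; tune them for the model and hardware in use.
batching_parameters = {
    "max_batch_size": 32,
    "batch_timeout_micros": 1000,
    "num_batch_threads": 4,
    "max_enqueued_batches": 100,
}

config_text = render_batching_parameters(batching_parameters)
print(config_text)
```

The resulting text would be saved to a file and handed to the model server at startup with `--enable_batching=true` and `--batching_parameters_file=<path to the file>`.
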
Also take a look at...
• Kubeflow deployments
• Data pre-processing on server🚅
• AI Platform Predictions
• Deployment on edge devices
• Federated learning
Demos: bit.ly/tf-everywhere-ind
Slides: bit.ly/serving-deck
Thank You
rishit_dagli Rishit-dagli