Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
mlct.pdf
Search
Sponsored
·
SiteGround - Reliable hosting with speed, security, and support you can count on.
→
Hirofumi Nakagawa/中河 宏文
July 23, 2018
Programming
2.1k
2
Share
mlct.pdf
Hirofumi Nakagawa/中河 宏文
July 23, 2018
More Decks by Hirofumi Nakagawa/中河 宏文
See All by Hirofumi Nakagawa/中河 宏文
IoTデバイスでMLモデルを動かす技術
hnakagawa
0
220
Kanazawa_AI.pdf
hnakagawa
0
210
メルカリ写真検索における Amazon EKS の活用事例と プロダクトにおけるEdgeAI technologyの展望
hnakagawa
5
9.1k
メルカリの写真検索を支えるバックエンド CCSE 2019 version
hnakagawa
0
360
メルカリ写真検索における Amazon EKS の活用事例
hnakagawa
6
29k
メルカリの写真検索を支えるバックエンド
hnakagawa
1
1.2k
Mercari ML Platform
hnakagawa
1
17k
機械学習によるマーケット健全化施策を支える技術
hnakagawa
0
270
メルカリのマーケット健全化施策を支えるML基盤
hnakagawa
10
9.2k
Other Decks in Programming
See All in Programming
SkillがSkillを生む:QA観点出しを自動化した
sontixyou
5
2.4k
Codex CLI でつくる、Issue から merge までの開発フロー
amata1219
0
300
20260320登壇資料
pharct
0
160
Java 21/25 Virtual Threads 소개
debop
0
320
「速くなった気がする」をデータで疑う
senleaf24
0
130
Kubernetes上でAgentを動かすための最新動向と押さえるべき概念まとめ
sotamaki0421
2
380
PHPのバージョンアップ時にも役立ったAST(2026年版)
matsuo_atsushi
0
280
AI活用のコスパを最大化する方法
ochtum
0
370
Xdebug と IDE による デバッグ実行の仕組みを見る / Exploring-How-Debugging-Works-with-Xdebug-and-an-IDE
shin1x1
0
340
感情を設計する
ichimichi
4
630
テレメトリーシグナルが導くパフォーマンス最適化 / Performance Optimization Driven by Telemetry Signals
seike460
PRO
2
220
Ruby and LLM Ecosystem 2nd
koic
1
1.5k
Featured
See All Featured
Visual Storytelling: How to be a Superhuman Communicator
reverentgeek
2
490
We Are The Robots
honzajavorek
0
210
Efficient Content Optimization with Google Search Console & Apps Script
katarinadahlin
PRO
1
470
ReactJS: Keep Simple. Everything can be a component!
pedronauck
666
130k
Discover your Explorer Soul
emna__ayadi
2
1.1k
JAMstack: Web Apps at Ludicrous Speed - All Things Open 2022
reverentgeek
1
410
Code Review Best Practice
trishagee
74
20k
Build your cross-platform service in a week with App Engine
jlugia
234
18k
Writing Fast Ruby
sferik
630
63k
Faster Mobile Websites
deanohume
310
31k
The Illustrated Children's Guide to Kubernetes
chrisshort
51
52k
Information Architects: The Missing Link in Design Systems
soysaucechin
0
860
Transcript
ϝϧΧϦͷMLج൫ MLCT vol.5 hnakagawa
ࣗݾհ • Hirofumi Nakagawa (hnakagawa) • 20177݄ೖࣾ • ॴଐSRE •
σόΠευϥΠό։ൃ͔Βϑϩϯ τΤϯυ։ൃ·ͰΔԿͰ • NOT σʔλαΠΤϯςΟετ • https://github.com/hnakagawa
͓ࣄ • ML Platform։ൃ • σʔλαΠΤϯςΟετͱSREͷεΩϧΪϟο ϓΛຒΊΔ • ML Reliability,
SysML?, MLOps? • SREͷཱ͔ΒMLγεςϜͷࣗಈԽΛߦ͏
ML Platform • ͷML Platform • kubernetesϕʔε • طଘͷML FrameworkΛ༻͠
؆୯ʹTraining/ServingΛߦ͏ ڥΛఏڙ
ͦͷ͏ͪOSSͰެ։༧ఆ(ଟ
ϝϧΧϦͷMLར༻ࣄྫ • ײಈग़ • ҧग़ݕ • Ձ֨αδΣετ • ΤΠταδΣετ ʑ…
̍ઍສpredictionΛߦ͍ͬͯΔ
ML Platform Architecture ,VCFSOFUFT $POUSPMMFS $-* $MVTUFS8PSLGMPX %BTICPBSE 4UPSBHF(BUFXBZ .FUSJDT
3VOOFS $PNQPOFOU .FSDBSJ.- $PNQPOFOU &YUFSOBM .JEEMFXBSF
Model Training & Serving Workflow
.-1MBUGPSN USBJOJOHDMVTUFS Workflow for Production $* .-1MBUGPSN TFSWJOHDMVTUFSGPSUFTU .PEFM3FHJTUSZ +PC
+PC ɾɾ 3&45 "1* 4USFBNJOH 5'4FSW JOH ɾɾɾ
.-1MBUGPSN USBJOJOHDMVTUFS Training Workflow $* .PEFM3FHJTUSZ +PC +PC ɾɾɾ 1.
GitHubͷpushΛτϦΨʹtrainingΛىಈ 2. Training͞ΕͨModelModel Registry ্͕Δ
Serving Workflow .-1MBUGPSN TFSWJOHDMVTUFSGPSUFTU .PEFM3FHJTUSZ ɾɾ 3&45 "1* 4USFBNJOH 5'
4FSWJOH 1. Model RegistryΛࢹͯࣗ͠ಈͰModel ΛServing 2. Serving&Test͕ޭ͢Δͱຊ൪༻k8s manifestΛग़ྗ
Container Workflow %BUB4PVSDF *NBHF 5FYUɹ 1SFQSPDFT TJOH *NBHF &TUJNBUPS *NBHF
17 17 1JDUVSF 1SFQSPDFT TJOH *NBHF 17 It’s own implementation
Model Serving APIͷߏྫ 5FOTPS'MPX 4FSWJOH 5' .PEFM 5' .PEFM 'MBTL
4, .PEFM 4, .PEFM 4, .PEFM gRPC .FSDBSJ"1* REST FlaskͰલॲཧΛߦ͍ ཪͷTensorFlow Servingʹ͍͛ͯΔ
Model Serving API Streaming ver ͷߏྫ 5FOTPS'MPX 4FSWJOH 5' .PEFM
5' .PEFM .-1MBUGPSN 'SBNFXPSL PS "QBDIF#FBN 4, .PEFM 4, .PEFM 4, .PEFM gRPC PubSub
ModelͱίϯςφɾΠϝʔδ • ڊେͳML ModelΛίϯςφɾΠϝʔδʹؚΊ Δ͔൱͔ • ؚΊͳ͍ͷͰ͋ΕԿॲʹஔ͢Δ͔ • ϙʔλϏϦςΟੑͱϩʔυ࣌ؒͷτϨʔυΦϑ •
ྑ͍ΞΠσΟΞ͕͋Εڭ͑ͯԼ͍͞…
௨ৗͷAPIͱಛੑ͕ҧ͏ • ѻ͏ϦιʔεɺModelαΠζ͕େ͖͘ͳΔ ߹͕ଟ͍(ඦMBʙGB) • CPUɾϝϞϦϦιʔεͷফඅ͕ܹ͍͠ • ߹ʹΑͬͯGPU͏
ϝϞϦফඅ • ҧݕγεςϜͷPython࣮෦࣮ߦ࣌ ʹ2GBϝϞϦΛফඅ͢Δˠࠓޙ͞Βʹ૿͑ Δ༧ఆ͋Δ • Scikit-learnͰهड़͞Εͨલॲཧ෦͕େ͖͘ ͳΓ͕ͪ
Pythonͱฒྻੑ • વThread͕͑ͳ͍(GILͷͨΊ) • ϓϩηεຖʹModelΛϩʔυ͢Δͱඞཁͳϝ ϞϦαΠζ͕େ͖͘ͳΔˠ Blue-Green DeployͷোʹͳΔ
ਖ਼PythonͰͷServing Πϯϑϥతʹਏ͍ࣄ͕ଟ͍…
ϝϞϦΛݡ͘͏ • fork͢ΔલʹmodelΛϩʔυ͠Copy on Write Λޮ͔͢ • k8sͷone process per
containerηΦϦ͋ ͑ͯഁ͍ͬͯΔ
Copy On Writeͷ෮श ϝϞϦ ϓϩηε ࢠϓϩηε 2.fork 1BHF" 1.allocation ಉ͡ྖҬΛࢀর
ϓϩηε͕ϝϞϦͷ༰Λ ॻ͖͑Δͱ… ϝϞϦ ϓϩηε ࢠϓϩηε 1BHF" 1BHF# OS͕ผͷྖҬΛAllocationͯ͠ݩσʔλΛίϐʔ͢Δ ผͷྖҬΛࢀর
Current Issues
ߴͳܧଓతϝϯςφϯε͕ඞཁ • MLػೳσʔλͷ͕มΘͬͨΓɺ༧֎ ͷ͕ൃੜͨ͠Γͯ͠ɺͦΕΒʹରԠ͠ଓ ͚Δඞཁ͕͋Δ MLػೳϦϦʔεޙେ͖ͳ ίετ͕͔͔Γଓ͚Δ
େ෯ͳࣗಈԽ͕ඞਢ
In Progress
ߴͳࣗಈԽ • ࣾͷσʔλ͔ΒFeature Extraction͢Δ࣮ ΛίϯϙʔωϯτԽ • ಛఆͷΛղܾ͢ΔϞσϧߏஙΛ͋Δఔ ࣗಈԽ • ϦϦʔεޙͷRe-TrainingɺHyper
parameter optimizationɺDeployΛࣗಈԽ
AutoFlow 'FBUVSF&YUSBDUJPO $PNQPOFOUT $MBTTJGJDBUJPO $PNQPOFOUT $PODBUFOBUJPO $PNQPOFOUT .PEFM #VJMEFS $PNQPOFOUT
3FHJTUSZ Ϋϥελ্ͰϞσϧͷࣗಈߏஙͱϋΠύʔύϥ ϝʔλͷࣗಈௐΛߦ͏
AutoServing %FQMPZ ϦϦʔεޙͷਫ਼ࢹɾRe-TrainingɾRe-Deploy ΛࣗಈͰߦ͏ .POJUPSJOH &WBMVBUJPO )ZQFS QBSBNFUFS PQUJNJ[BUJPO 3F5SBJOJOH
·ͱΊ • MLʹগ͠௨ৗͱҧ͏Πϯϑϥ͕ඞཁʹͳΔ ˠ·ͩϕετɾϓϥΫςΟε͔Βͳ͍ • ͦͦMLͳػೳΛຊ֨ӡ༻͠Α͏ͱ͢Δ ͱɺେ෯ͳࣗಈԽɾΈԽΛਐΊͳ͍ͱ্ ख͘ߦ͔ͳ͍
͝ਗ਼ௌ͋Γ͕ͱ͏͍͟͝·ͨ͠!!