IVRy Engineer Year-End LT Event 2024: The Front Line of LLM Monitoring
Hiroyuki Moriya
December 11, 2024
Transcript
confidential LLM Monitoring: The Front Line. IVRy Engineer LT Event, 2024/12/11, Moriya Hiroyuki
My name is Kudo Shinichi, high school detective.
Having just started working as an AI engineer, I witnessed a startup making a lot of money with LLMs.
"A little bit of development and I can make a killing!" Realizing this, I founded a company and set about delivering one PoC project after another to clients.
I failed to notice the rate limits and the worsening latency creeping up behind me.
By the time I noticed, cancellations of my product had piled up and the company had gone bankrupt...
Today I will talk about what Kudo Shinichi can do to avoid an ending like this.
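Self-introduction: Moriya Hiroyuki, AI engineer. Joined in 2024/08. Has worked as a software engineer (SWE), machine learning engineer, and in similar roles. Joined IVRy believing it was a service where LLMs were likely to become the core.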
AI dialogue using LLMs at IVRy: end users and the LLM exchange messages in real time over WebSocket.
LLM Fallback: a fallback mechanism built on the premise of using multiple LLMs. Requests are routed based on API status, rate limits, and data restrictions ([illegible] restrictions).
"We just need to monitor it!"
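The routing described on the fallback slide can be pictured roughly as follows. This is a minimal sketch, not IVRy's implementation: the `Provider` fields, the `choose_provider` function, and the provider names are hypothetical, and it assumes providers are tried in a fixed priority order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Provider:
    """Hypothetical snapshot of one LLM backend's current state."""
    name: str
    healthy: bool            # e.g. derived from the provider's API status
    requests_left: int       # remaining rate-limit budget in the current window
    allowed_for_data: bool   # whether data restrictions permit this provider


def choose_provider(providers: List[Provider]) -> Optional[Provider]:
    """Return the first provider, in priority order, that passes every check."""
    for p in providers:
        if p.healthy and p.requests_left > 0 and p.allowed_for_data:
            return p
    return None


providers = [
    # The primary has exhausted its rate limit, so routing should skip it.
    Provider("primary", healthy=True, requests_left=0, allowed_for_data=True),
    Provider("secondary", healthy=True, requests_left=120, allowed_for_data=True),
]
chosen = choose_provider(providers)
print(chosen.name if chosen else "no provider available")
```

Because any request may end up on a different backend, monitoring has to cover every provider in the list rather than only the first one.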
Method 1: DataDog LLM observability. A feature dedicated to LLM monitoring that DataDog is actively developing. It can capture latency, tokens, prompts, and more.
"Now we can monitor OpenAI's latency!"
Huh, that's strange... Our product implements a fallback mechanism, yet we can only monitor OpenAI's latency.
Method 2: OpenLIT (OpenTelemetry). A tool dedicated to LLM monitoring that follows the OpenTelemetry standard. It can monitor a variety of LLMs.
"Now we can monitor the latency of all sorts of models!"
Huh, that's strange... We use a variety of models, so we need to measure latency per provider, yet we only get the latency of LiteLLM as a whole.
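One way to picture per-provider measurement is to time each call at the point where the provider is known and to file the result under that provider's name. The sketch below uses only the Python standard library; `timed`, `latencies`, and `fake_call` are hypothetical names, and `fake_call` merely sleeps in place of a real LLM request. It does not use OpenLIT or LiteLLM.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# provider name -> list of observed latencies in seconds
latencies = defaultdict(list)


@contextmanager
def timed(provider: str):
    """Record the wall-clock time of the wrapped block under the provider's name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the call raises, so failed requests are timed too.
        latencies[provider].append(time.perf_counter() - start)


def fake_call(delay: float) -> str:
    """Stand-in for a real LLM request (hypothetical)."""
    time.sleep(delay)
    return "ok"


with timed("openai"):
    fake_call(0.01)
with timed("gemini"):
    fake_call(0.02)

for provider, samples in latencies.items():
    print(provider, f"{max(samples):.3f}s")
```

In a real setup the recorded value would more likely be emitted as a metric or span attribute than kept in a dictionary.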
Method 3: DataDog Inferred services. A mechanism built into DataDog that monitors requests going out of the app.
"Now we can monitor every model we use through LiteLLM!"
Huh, that's strange... We do not necessarily use just one model each on Gemini and OpenAI, yet we cannot get the latency of individual models.
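The gap described here, provider-level numbers hiding differences between models, can be illustrated with a small aggregation. This is a sketch with made-up data: the model names, the latency values, and the `median_latency` helper are all hypothetical, and it assumes each latency sample has already been tagged with both provider and model.

```python
from collections import defaultdict
from statistics import median

# Hypothetical samples: (provider, model, latency in seconds)
records = [
    ("openai", "model-a", 0.8),
    ("openai", "model-a", 1.2),
    ("openai", "model-b", 0.3),
    ("gemini", "model-c", 0.5),
]


def median_latency(records, key):
    """Group latency samples by key(provider, model) and return each group's median."""
    grouped = defaultdict(list)
    for provider, model, seconds in records:
        grouped[key(provider, model)].append(seconds)
    return {k: median(v) for k, v in grouped.items()}


# Grouping by provider alone merges model-a and model-b into one number.
by_provider = median_latency(records, key=lambda provider, model: provider)
# Grouping by (provider, model) keeps the slow and fast models apart.
by_model = median_latency(records, key=lambda provider, model: (provider, model))

print(by_provider)
print(by_model)
```

With the model included in the grouping key, a slow model no longer disappears into its provider's overall figure.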
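Summary: LLM monitoring is still very much a developing field, and there is little accumulated know-how! In the process of mastering AI and LLMs and building them into products, we need to blaze the trail ourselves. Let's work on AI monitoring together!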