Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Go言語での実装を通して学ぶ、高速なベクトル検索を支えるクラスタリング技術/fukuokago...
Search
monochromegane
March 11, 2025
Programming
1
180
Go言語での実装を通して学ぶ、高速なベクトル検索を支えるクラスタリング技術/fukuokago-kmeans
2025.03.11 Fukuoka.go#21
https://fukuokago.connpass.com/event/344467/
monochromegane
March 11, 2025
Tweet
Share
More Decks by monochromegane
See All by monochromegane
ベクトル検索システムの気持ち
monochromegane
33
11k
Go言語でターミナルフレンドリーなAIコマンド、afaを作った/fukuokago20_afa
monochromegane
2
250
多様かつ継続的に変化する環境に適応する情報システム/thesis-defense-presentation
monochromegane
1
900
Online Nonstationary and Nonlinear Bandits with Recursive Weighted Gaussian Process
monochromegane
0
550
AIを前提とした体験の実現に向けて/toward_ai_based_experiences
monochromegane
2
960
Go言語でMac GPUプログラミング
monochromegane
1
610
Contextual and Nonstationary Multi-armed Bandits Using the Linear Gaussian State Space Model for the Meta-Recommender System
monochromegane
1
1.1k
迅速な学習機構を用いて逐次適応性を損なうことなく非線形性を扱う文脈付き多腕バンディット手法/extreme_neural_linear_bandits
monochromegane
0
2.2k
再帰化への認知的転回/the-turn-to-recursive-system
monochromegane
0
810
Other Decks in Programming
See All in Programming
C++20 射影変換
faithandbrave
0
390
Create a website using Spatial Web
akkeylab
0
260
AWS CDKの推しポイント 〜CloudFormationと比較してみた〜
akihisaikeda
3
210
漸進。
ssssota
0
1.9k
SODA - FACT BOOK
sodainc
1
840
Javaに鉄道指向プログラミング (Railway Oriented Pro gramming) のエッセンスを取り入れる/Bringing the Essence of Railway-Oriented Programming to Java
cocet33000
2
540
生成AIで日々のエラー調査を進めたい
yuyaabo
0
520
Cloudflare Realtime と Workers でつくるサーバーレス WebRTC
nekoya3
0
390
Using AI Tools Around Software Development
inouehi
0
1.2k
関数型まつり2025登壇資料「関数プログラミングと再帰」
taisontsukada
2
790
PT AI без купюр
v0lka
0
230
ワンバイナリWebサービスのススメ
mackee
10
7.7k
Featured
See All Featured
Principles of Awesome APIs and How to Build Them.
keavy
126
17k
Stop Working from a Prison Cell
hatefulcrawdad
269
20k
Art, The Web, and Tiny UX
lynnandtonic
299
21k
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
52
2.8k
jQuery: Nuts, Bolts and Bling
dougneiner
63
7.8k
Build The Right Thing And Hit Your Dates
maggiecrowley
36
2.7k
Docker and Python
trallard
44
3.4k
The Power of CSS Pseudo Elements
geoffreycrofte
77
5.8k
Unsuck your backbone
ammeep
671
58k
Gamification - CAS2011
davidbonilla
81
5.3k
BBQ
matthewcrist
89
9.7k
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
107
19k
Transcript
ࡾ༔հ / Pepabo R&D Institute, GMO Pepabo, Inc. 2025.03.11 Fukuoka.go#21
GoݴޠͰͷ࣮Λ௨ֶͯ͠Ϳ ߴͳϕΫτϧݕࡧΛࢧ͑Δ ΫϥελϦϯάٕज़
1. ͡Ίʹ 2. ϕΫτϧݕࡧΤϯδϯΛࢧ͑ΔΫϥελϦϯάٕज़ 3. GoݴޠͰk-meansΛ࣮͢Δ 4. ධՁ 5. ·ͱΊ
 2 ࣍
1. ͡Ίʹ
• AIͱ֎෦ใͷՍ͚ڮͱͳΔɺRAGʢRetrieval-Augmented Generationʣʹ ද͞ΕΔΑ͏ʹɺඇߏԽσʔλΛϕΫτϧʹมͯ͠ݕࡧ͢ΔɺϕΫτϧ ݕࡧΤϯδϯͷ༗༻ੑ͕ݟ͞Ε͍ͯΔ • ૉͳϕΫτϧݕࡧΤϯδϯɺϕΫτϧू߹͔ΒΫΤϦͱͳΔϕΫτϧͷۙ ʹҐஔ͢Δ෦ू߹ΛಘΔͨΊʹɺू߹ͷશཁૉʹରͯ͠ྨࣅڑͷ ईΛܭࢉ͢Δ •
ݕࡧରͱͳΔϕΫτϧ͕ߴ࣍ݩʢ  ʣ͔ͭσʔλ͕ଟ͍ ʢ  ʣ߹ɺ૯ͨΓͰ࣮༻తͳݕࡧੑೳΛಘΒΕͳ͍ͨΊɺਫ਼ͱ ͷτϨʔυΦϑΛڐ༰ͨ͠ɺۙࣅۙ୳ࡧͷΞϓϩʔν͕࠾༻͞ΕΔ D > 103 N > 104  4 ͡Ίʹʢ1/2ʣ
• ۙࣅۙ୳ࡧΛ࣮ݱ͢ΔϕΫτϧݕࡧΤϯδϯଟ͘ఏҊ͞Ε͍ͯΔ ʢAnnoyɺFaissɺQdrantɺChromaʣ • ҰํͰɺ͜ΕΒͷΤϯδϯͷੑೳΛҾ͖ग़ͨ͢Ίʹɺۙࣅۙ୳ࡧΞϧΰϦ ζϜΛɺѻ͏σʔλͱͷੑΛؚΊͯཧղ͢Δඞཁ͕͋Δ • ͳΜ͔Α͘Θ͔ΒΜ͕IVFPQͰσϑΥϧτύϥϝʔλͰϤγ • ຊൃදͰɺ͡Ίʹදతͳۙࣅۙ୳ࡧΞϧΰϦζϜΛհ͢Δɻ
࣍ʹɺͦ͜Ͱڞ௨ͯ͠࠾༻͞ΕΔΫϥελϦϯάٕज़ʹண͠ɺGoݴޠͰͷ ࣮Λ௨ͯ͠ɺͦͷಛੑΛཧղ͢Δ  5 ͡Ίʹʢ2/2ʣ
2. ߴͳ ϕΫτϧݕࡧΤϯδϯΛࢧ͑Δ ΫϥελϦϯάٕज़
• ϕΫτϧෳͷ͔ΒͳΔҰͭͷʮྔʯ • ͭ·ΓɺൺΔͨΊͷදݱܗࣜͷҰछ • ϕΫτϧಉ࢜ͷൺֱͷई • ϢʔΫϦουڑ:  •
ίαΠϯྨࣅ:  d(xi , xj ) = ∥xi − xj ∥2 = D ∑ d=1 (xi,d − xj,d )2 cos(θ) = xi ⋅ xj ∥xi ∥∥xj ∥ = ∑D d=1 xi,d xj,d ∑D d=1 x2 i,d ⋅ ∑D d=1 x2 j,d  7 ϕΫτϧݕࡧ
• ϕΫτϧू߹  ʹରͯ͠ΫΤϦϕΫτϧ  ͷۙ  ϕΫτϧΛಘ͍ͨ • 
• ૯ͨΓʢBrute forceʣͰɺσʔλ  ͱ࣍ݩ  ʹԠͯ͡ܭࢉྔ͕૿Ճ • ಉ༷ʹɺσʔλαΠζ͕૿Ճ͠ɺϝϞϦ্ͷల։͕ࠔʹͳΔ • ਫ਼ͱͷτϨʔυΦϑΛڐ༰ͯ͠ɺݕࡧͷ্ͱσʔλαΠζͷݮΛਤ Δۙࣅۙ୳ࡧΞϧΰϦζϜ͕ଟ͘ఏҊ͞Ε͍ͯΔ X q k 𝒩 k (q, X) = argminS⊂X,|S|=k ∑ x∈S d(q, x) N D  8 ۙ୳ࡧ
• ϕΫτϧू߹Λ  ݸͷදϕΫτϧ  Ͱදݱ͢Δ • ͜͜ͰͷྔࢠԽɺࢄԽʢάϧʔϐϯάʣͱଊ͑ͯΑ͍ • ϕΫτϧू߹
 ɺදϕΫτϧͷΠϯσοΫεͷू߹ͱͳΓɺ 256ύλʔϯͰ͋ΕϕΫτϧ͋ͨΓ8bitsͰදݱͰ͖Δ • ݕࡧ࣌ʹɺΫΤϦϕΫτϧͷ࠷͍ۙදϕΫτϧΛ୳ͨ͢Ίɺ୳ࡧେ ෯ʹݮͰ͖Δ • ҰํͰྔࢠԽޡࠩʢΫϥελͰͷࠩҟ͕ͳ͍ɺΫϥελॴଐޡΓʣ͕ൃੜ ͠ɺߴ࣍ݩʹͳΔ΄Ͳɺ͜ΕΛ͑ΔͨΊʹඞཁͳදϕΫτϧ͕૿Ճ͢Δ K C = {c1 , …cK } X  9 ϕΫτϧྔࢠԽʢVector Quantization: VQʣ
• ߴ࣍ݩϕΫτϧΛ  ݸͷ࣍ݩαϒϕΫτϧʹׂ͠ɺಠཱͯ͠VQ͢Δ • ͜͜Ͱੵͱɺू߹ಉ࢜ͷΈ߹ΘͤͰ৽͍͠ू߹ΛಘΔ͜ͱ •  ࣍ݩϕΫτϧΛ 
ຊͷ  ࣍ݩαϒϕΫτϧʹׂ͠ɺͦΕͧΕͷ VQͰ  ύλʔϯʹྔࢠԽͨ͠ͳΒɺ  ύλʔϯΛදݱͰ͖Δ M D M = 4 D/M 28 (28)4 = 232  10 ੵྔࢠԽʢProduct Quantization: PQʣʢ1/2ʣ  X ∈ RN×D  X1 ∈ RN×D/M  X2 ∈ RN×D/M  XM ∈ RN×D/M  …  ×  2K/M  2K/M  2K/M  2K  ≃
• PQʹ͓͚ΔݕࡧɺΫΤϦϕΫτϧΛ  αϒϕΫτϧʹׂ͠ɺରԠ͢Δα ϒϕΫτϧू߹ʹ͓͚Δ࠷͍ۙදϕΫτϧͱͷڑΛՃࢉ͢Δ •  • ΫΤϦαϒϕΫτϧͱදαϒϕΫτϧಉ࢜ͷΈ߹ΘͤࣄલʹϧοΫΞο ϓςʔϒϧͱͯ͠ܭࢉՄೳͰ͋ΓɺશϕΫτϧू߹ʹର͢ΔڑܭࢉͷޮԽ
Խ͕ՄೳʢͪΖΜϕΫτϧׂʹΑΔޡࠩ͋Δʣ • ҰํͰɺڑܭࢉશϕΫτϧू߹ͷσʔλ  ճൃੜ͢Δ M d(q, x) = M ∑ m=1 d(q(m), C(m)) N  11 ੵྔࢠԽʢProduct Quantization: PQʣʢ2/2ʣ
• ϕΫτϧू߹Λߥ͘ྨ͠ɺΫϥελ͝ͱʹΠϯσοΫεΛ࡞ • ΫΤϦ࣌ʹɺݕࡧରΛߜΓࠐΜͰݕࡧͰ͖ΔͨΊܭࢉྔͷݮ͕Մೳ • PQͱΈ߹ΘͤΔ͜ͱͰɺPQͷશ݅ݕࡧͷ՝Λ؇͢Δ  12 సஔΠϯσοΫεʢInVerted File:
IVFʣ  X ∈ RN′  ×D  X1 ∈ RN′  ×D/M  X2 ∈ RN′  ×D/M  XM ∈ RN′  ×D/M  …  ×  ≃  X ∈ RN′  ×D  X1 ∈ RN′  ×D/M  X2 ∈ RN′  ×D/M  XM ∈ RN′  ×D/M  …  ×  ≃  X ∈ RN′  ×D  X1 ∈ RN′  ×D/M  X2 ∈ RN′  ×D/M  XM ∈ RN′  ×D/M  …  ×  ≃ ⋮  X ∈ RN×D
• VQɺPQɺIVFΛ௨ͯ͠ɺσʔλྔͷݮͱݕࡧͷߴԽΛਤΔͨΊͷ४උͱ ͯ͠ɺΫϥελϦϯά͕ߦΘΕ͍ͯΔ͜ͱ͕͔Δ • FaissͰΫϥελϦϯάͱͯ͠k-meansΞϧΰϦζϜ͕ΘΕ͓ͯΓɺߴͳ ࣮ͱͳ͍ͬͯΔͱͷ͜ͱ • GoݴޠͰk-meansͷ࣮ͷߴԽΛ௨ͯ͠ɺͦͷಛੑΛཧղ͢Δ  13
ۙࣅۙ୳ࡧͱΫϥελϦϯάٕज़
3. GoݴޠͰk-meansΛ࣮͢Δ
• k-means ɺڭࢣͳֶ͠शͷҰछͰ͋ΓɺσʔλΛ  ݸͷΫϥελʹׂ͢Δ ΫϥελϦϯάख๏ • ֤Ϋϥελɺͦͷத৺ʢηϯτϩΠυʣΛ࣋ͪɺσʔλ࠷͍ۙηϯτ ϩΠυʹׂΓͯΒΕΔɻ •
ΞϧΰϦζϜɺσʔλͷׂΓͯͱηϯτϩΠυͷߋ৽Λ܁Γฦ͠ɺऩଋ ͢Δ·Ͱ࣮ߦ͞ΕΔɻ K  15 k-means → → ⋯
• ηϯτϩΠυͷσʔλͷׂΓͯͱηϯτϩΠυͷߋ৽ • શσʔλʹରͯ͠ݱࡏͷ֤ηϯτϩΠυͱͷڑΛܭࢉ •  • ࠷͍ۙηϯτϩΠυͷΫϥελ͕͔ΔͷͰɺΫϥελ͝ͱʹσʔλΛ ͠ࠐΉ •
શσʔλͷܭࢉޙʹΫϥελ͝ͱͷσʔλͰ͠ࠐΜͩσʔλΛׂΔ͜ͱ Ͱ৽͍͠ηϯτϩΠυΛಘΔ N × K × D  16 ૉͳ࣮
• ηϯτϩΠυͷσʔλͷׂΓͯͱηϯτϩΠυͷߋ৽  17 ૉͳ࣮
• ઢܗϥΠϒϥϦBLASʢBasic Linear Algebra SubprogramsʣΛར༻͢Δ GonumΛ͏͜ͱͰޮతͳܭࢉ • ϚϧνεϨουSIMDΛۦͯ͠ߦྻܭࢉΛߴԽͯ͘͠ΕΔ • ͨͩ͠ߦྻܗࣜͰҰׅͰॲཧ
͢ΔͨΊϝϞϦͷ༻ྔ େ͖͍ɻ ·ͨΦʔόʔϔουଘࡏ ͢Δʢͣʣ  18 ߴͳ࣮ 9 ⽷⽹ ⎢ ⎥ ⎢ ⎥ ⽸⽺ $ ⽷⽹ ⽸⽺ 9$5 ⽷⽹ ⎢ ⎥ ⎢ ⎥ ⽸⽺
• શσʔλʹରͯ͠ݱࡏͷ֤ηϯτϩΠυͱͷڑΛܭࢉΛҰׅͰΔ •  • ͨͩ͠ɺ  Ͱɺ֤ߦͷฏํϢʔΫϦουڑʢXCͷ Ճࢉ֤ྻɾ֤ߦͷ܁Γฦ͠ͱͯ͠ߟ͑Δʣ •
 Λ࠶ར༻Ͱ͖Δͷ͕خ͍͠ • ηϯτϩΠυͷߋ৽ •  ɻͨͩ͠  ֤σʔλ͕ͲͷΫϥελʹ ଐ͢Δ͔Λදݱ͢Δߦྻ ∥X − C∥2 2 = ∥X∥2 2 − 2XC⊤ + ∥C∥2 2 ∥X∥2 2 ∈ RN,∥C∥2 2 ∈ RK ∥X∥2 2 C = (diag(E⊤E))−1E⊤X E ∈ RN×K  19 ߴͳ࣮
• Ͱ͖Δ͚ͩGonumΛͬͯߦྻϕΫτϧ୯ҐͰॲཧ • গͳ͘ͱίʔυ্  ʹରԠ͢Δ܁Γฦ͠ফ͑ͨ D  20 ߴͳ࣮
͜ͷลҰׅͰ͏ ·͘Γ͔ͨͬͨ
4. ධՁ
• Gonum࣮ͷk-meansΛ࣮ߦͯ͠୯ ७ͳΫϥελϦϯά͕͏·͍͘͘͜ͱ Λ֬ೝ • x͕ηϯτϩΠυ • ఆ͢ΔΫϥελͷσʔλΛ༧ଌ  22
ՄࢹԽ
• 10,000ݸͷσʔλϙΠϯτʹ͍ͭͯɺ2࣍ݩͱ1024࣍ݩͷσʔλΛ4Ϋϥελ ʹྨ͢ΔࡍͷɺॳظԽʢk-means++ʣɾҰճ͋ͨΓͷߋ৽ɾॳظԽΛؚΊ ͨऩଋ·Ͱͷֶशʹ͍ͭͯ؆қͳൺֱΛ࣮ࢪͨ͠  23 ϕϯνϚʔΫʢ1/2ʣ # 2࣍ݩʢ֤ΧςΰϦʹ্͓͍ͯஈ͕φΠʔϒ࣮ɺԼஈ͕Gonum࣮ʣ ##
ॳظԽ BenchmarkNaiveKMeansClusters4Datapoints10000Features2InitKMeansPlusPlus-11 10000 1157458 ns/op 409769 B/op 8 allocs/op BenchmarkLinearAlgebraKMeansClusters4Datapoints10000Features2InitKMeansPlusPlus-11 8242 1535928 ns/op 2748619 B/op 11009 allocs/op ## Ұճ͋ͨΓͷߋ৽ BenchmarkNaiveKMeansClusters4Datapoints10000Features2Iter1-11 9415 1285607 ns/op 192 B/op 6 allocs/op BenchmarkLinearAlgebraKMeansClusters4Datapoints10000Features2Iter1-11 10000 1157688 ns/op 2364080 B/op 10357 allocs/op ## ऩଋ·Ͱͷֶश BenchmarkNaiveKMeansClusters4Datapoints10000Features2InitKMeansPlusPlusTol1e6-11 1606 6893775 ns/op 410593 B/op 33 allocs/op BenchmarkLinearAlgebraKMeansClusters4Datapoints10000Features2InitKMeansPlusPlusTol1e6-11 1900 6157529 ns/op 7579239 B/op 13279 allocs/op • ࣍ݩͷΫϥελϦϯάͰɺ͍ͣΕGonumΛར༻͢Δ͜ͱͰͷมԽݟ ΒΕͳ͍ɻҰํͰϝϞϦ༻ྔ૿Ճ͢Δ
• 10,000ݸͷσʔλϙΠϯτʹ͍ͭͯɺ2࣍ݩͱ1024࣍ݩͷσʔλΛ4Ϋϥελ ʹྨ͢ΔࡍͷɺॳظԽʢk-means++ʣɾҰճ͋ͨΓͷߋ৽ɾॳظԽΛؚΊ ͨऩଋ·Ͱͷֶशʹ͍ͭͯ؆қͳൺֱΛ࣮ࢪͨ͠  24 ϕϯνϚʔΫʢ2/2ʣ # 1024࣍ݩʢ֤ΧςΰϦʹ্͓͍ͯஈ͕φΠʔϒ࣮ɺԼஈ͕Gonum࣮ʣ ##
ॳظԽ BenchmarkNaiveKMeansClusters4Datapoints10000Features1024InitKMeansPlusPlus-11 22 502432970 ns/op 409768 B/op 8 allocs/op BenchmarkLinearAlgebraKMeansClusters4Datapoints10000Features1024InitKMeansPlusPlus-11 720 16233131 ns/op 2499008 B/op 11002 allocs/op ## Ұճ͋ͨΓͷߋ৽ BenchmarkNaiveKMeansClusters4Datapoints10000Features1024Iter1-11 16 639431708 ns/op 32896 B/op 6 allocs/op BenchmarkLinearAlgebraKMeansClusters4Datapoints10000Features1024Iter1-11 639 18590832 ns/op 1932983 B/op 10380 allocs/op ## ऩଋ·Ͱͷֶश BenchmarkNaiveKMeansClusters4Datapoints10000Features1024InitKMeansPlusPlusTol1e6-11 5 2858952383 ns/op 528193 B/op 29 allocs/op BenchmarkLinearAlgebraKMeansClusters4Datapoints10000Features1024InitKMeansPlusPlusTol1e6-11 249 48778582 ns/op 4435664 B/op 12892 allocs/op • ߴ࣍ݩͷΫϥελϦϯάͰɺ࣍ݩʹൺͯφΠʔϒͳ࣮100ഒɺ GonumͰ10ഒఔͷมԽͰ͋ΓɺGonum࣮ͷ༏Ґੑ͕ग़ͨɻ
5. ·ͱΊ
• ࣮༻తͳϕΫτϧݕࡧΤϯδϯΛࢧ͑ΔΫϥελϦϯάٕज़ʹண͠ɺGoݴ ޠͰͷ࣮Λ௨ͯ͠ɺͦͷಛੑΛཧղͨ͠ • ߴԽσʔλαΠζͷݮͷͨΊͷΞϧΰϦζϜΛલఏͱͯ͠ɺ࣮ʹΑͬ ͯɺʹ͕ࠩग़Δ͜ͱ͕Θ͔ͬͨ • ࣍ݩͰߴԽ࣮ͷΦʔόʔϔου͕ߴԽΛଧͪফ͢Մೳੑ͕͋Γɺ ϝϞϦ༻ྔͳͲͷ؍͔Βɺಛʹ࣍ݩద༻͕ՄೳͳPQͳͲͰφΠʔϒ ͳ࣮ͱͷ͍͚༗༻͔͠Εͳ͍͜ͱࣔࠦ͞Εͨ
• ࠓޙPQ͔ΒϕΫτϧݕࡧΤϯδϯͷ։ൃਐΊ͍ͯ͘ • Go ࡾ  26 ·ͱΊ
None