Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Go言語でMac GPUプログラミング
Search
monochromegane
December 18, 2023
Programming
1
630
Go言語でMac GPUプログラミング
2023.12.18 Fukuoka.go#19 Reboot
https://fukuokago.connpass.com/event/302717/
monochromegane
December 18, 2023
Tweet
Share
More Decks by monochromegane
See All by monochromegane
なめらかなシステムと運用維持の終わらぬ未来 / dicomo2025_coherently_fittable_system
monochromegane
0
780
ベクトル検索システムの気持ち
monochromegane
34
11k
Go言語での実装を通して学ぶ、高速なベクトル検索を支えるクラスタリング技術/fukuokago-kmeans
monochromegane
1
190
Go言語でターミナルフレンドリーなAIコマンド、afaを作った/fukuokago20_afa
monochromegane
2
260
多様かつ継続的に変化する環境に適応する情報システム/thesis-defense-presentation
monochromegane
1
930
Online Nonstationary and Nonlinear Bandits with Recursive Weighted Gaussian Process
monochromegane
0
570
AIを前提とした体験の実現に向けて/toward_ai_based_experiences
monochromegane
2
970
Contextual and Nonstationary Multi-armed Bandits Using the Linear Gaussian State Space Model for the Meta-Recommender System
monochromegane
1
1.1k
迅速な学習機構を用いて逐次適応性を損なうことなく非線形性を扱う文脈付き多腕バンディット手法/extreme_neural_linear_bandits
monochromegane
0
2.3k
Other Decks in Programming
See All in Programming
Modern Angular with Signals and Signal Store:New Rules for Your Architecture @enterJS Advanced Angular Day 2025
manfredsteyer
PRO
0
240
Agentic Coding: The Future of Software Development with Agents
mitsuhiko
0
120
#QiitaBash MCPのセキュリティ
ryosukedtomita
1
1.5k
Rails Frontend Evolution: It Was a Setup All Along
skryukov
0
240
AI時代のソフトウェア開発を考える(2025/07版) / Agentic Software Engineering Findy 2025-07 Edition
twada
PRO
97
34k
AI駆動のマルチエージェントによる業務フロー自動化の設計と実践
h_okkah
0
210
Goで作る、開発・CI環境
sin392
0
260
Claude Code + Container Use と Cursor で作る ローカル並列開発環境のススメ / ccc local dev
kaelaela
12
6.7k
PipeCDのプラグイン化で目指すところ
warashi
1
290
なんとなくわかった気になるブロックテーマ入門/contents.nagoya 2025 6.28
chiilog
1
280
生成AI時代のコンポーネントライブラリの作り方
touyou
1
260
Git Sync を超える!OSS で実現する CDK Pull 型デプロイ / Deploying CDK with PipeCD in Pull-style
tkikuc
4
240
Featured
See All Featured
Stop Working from a Prison Cell
hatefulcrawdad
271
21k
Building Adaptive Systems
keathley
43
2.7k
Automating Front-end Workflow
addyosmani
1370
200k
VelocityConf: Rendering Performance Case Studies
addyosmani
332
24k
Building Flexible Design Systems
yeseniaperezcruz
328
39k
Embracing the Ebb and Flow
colly
86
4.7k
Rebuilding a faster, lazier Slack
samanthasiow
83
9.1k
The World Runs on Bad Software
bkeepers
PRO
69
11k
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
7
740
GitHub's CSS Performance
jonrohan
1031
460k
Building a Modern Day E-commerce SEO Strategy
aleyda
42
7.4k
No one is an island. Learnings from fostering a developers community.
thoeni
21
3.4k
Transcript
Lightning Talks ࡾ༔հ / Pepabo R&D Institute, GMO Pepabo, Inc.
2023.12.18 Fukuoka.go#19 Reboot GoݴޠͰMac GPUϓϩάϥϛϯά
ϓϦϯγύϧΤϯδχΞ ࡾ ༔հ / @monochromegane 2 https://blog.monochromegane.com Yusuke Miyake ϖύϘݚڀॴ
ݚڀһ
• ଟมྔਖ਼ن ʹै͏ཚੜʹ͕͔͔࣌ؒΔ • ͜ͷཚੜͷखॱʢͷҰͭʣҎԼͷ௨Γ 1. ֤ཁૉ͕ඪ४ਖ਼نʹै͏ཚ ΛಘΔ 2. ڞࢄߦྻ
ΛίϨεΩʔղʢ ʣͯ͠ࡾ֯ߦྻ ΛಘΔ 3. ΛٻΊΔ • ಛʹɺ֬ͷύϥϝʔλʢ ͱ ʣ͕ҟͳͬͨΓɺ࣍ݩ ͕େ͖͍ ߹ʹɺཚੜʹ͕͔͔࣌ؒͬͯ͠·͏ y ∼ 𝒩 (μ, Σ), μ ∈ ℝD, Σ ∈ ℝD×D z = {zi }1≤i≤D , zi ∼ 𝒩 (0,1) Σ Σ = LL⊤ L y = μ + Lz μ Σ D 3 ͡Ίʹ
• ߦྻܭࢉಠཱ͔ͭฒߦͨ͠λεΫΛଟؚ͘ΉͨΊɺߴԽʹฒྻԽ͕༗ޮ • ͢ͳΘͪɺSIMDɺCPUͷϚϧνίΞɺGPUͳͲʹΑΔฒྻԽ • CPUόϯυͰλεΫཻখ͍͞ͷͰgoroutine͔ͳ͍ʢͱࢥ͏ʣ • GoݴޠͰͷߦྻܭࢉϥΠϒϥϦGonumCPUͷϚϧνίΞΛαϙʔτ͢Δ BLASͷόΠϯσΟϯάΛఏڙ͍ͯ͠Δ •
Apple silicon (M1) ʹGPU͕ࡌ͞Ε͍ͯΔͷͰɺͦͪΒ׆༻͍ͨ͠ 4 ͡Ίʹ
• GPUͷΞΫηεΛఏڙ͢ΔOSඪ४ࡌͷϑϨʔϜϫʔΫ • άϥϑΟοΫεॲཧҎ֎ʹɺGPU্Ͱͷฒྻܭࢉॲཧѻ͑Δ • Objective-C·ͨSwift͔ΒɺGPU্ͷॲཧΛهड़ͨ͠γΣʔμʔؔΛݺͿ • γΣʔμʔؔC++ϕʔεͷMetal Shader Language
(MSL) Ͱهड़ • Metal Performance Shaders (MPS) ͱ͍͏γΣʔμʔؔ܈ఏڙ͞ΕΔ 5 Metal: MacͰGPUϓϩάϥϛϯά
6 Metal: MacͰGPUϓϩάϥϛϯά • جຊతͳྲྀΕɺσόΠεʢGPUʣͷίϚϯυΩϡʔʹର͠ɺίϚϯυόο ϑΝͱ͍͏୯ҐͰγΣʔμʔؔΛొ͠ɺ݁ՌΛड͚औΔͱ͍͏ͷ • ͳ͓ɺCPUͱGPUͷͷΓऔΓʹઐ༻ͷόοϑΝ͕༻ҙ͞Ε͍ͯΔ ίϚϯυΩϡʔ ͷ४උ
ΓऔΓ༻ͷ όοϑΝͷ४උ όοϑΝͷσʔλ͔Β ߦྻΠϯελϯεੜ .14ͷγΣʔμʔؔ ΛॳظԽɺίϚϯυ όοϑΝͱͯ͠Τϯ ίʔυɺΩϡʔʹొ όοϑΝ͔Β݁Ռͷड ͚औΓ 0CKFDUJW$Ͱͷ ࣮ྫ
• Goݴޠ͔ΒcgoΛ͑͜ΕΒͷObjective-CͷίʔυΛݺΔ༷ࢠ • https://github.com/a-h/gpu ϥΠϒϥϦͱͯ͠ར༻Ͱ͖Δ͕MPSʹରԠ͍ͯ͠ͳ͍ • https://github.com/mikecvet/go-mm MPSͷݺͼग़͠Λ࣮͍ͯ͠Δ͕ϕϯνϚʔΫͷίʔυͷΈ • ্هΛࢀߟʹͭͭ͠ɺGoݴޠ্ͰͷGPUΛ༻͍ͨଟมྔਖ਼نʹै͏ཚ
ੜ͕Ͱ͖ͦ͏ 7 Cgo: GoݴޠͰMac GPUϓϩάϥϛϯά
1. Objective-CͷϔομϑΝΠϧΛinclude͠ɺLDFLAGSʹMetalϑϨʔϜϫʔΫΛࢦఆ͢Δ 2. ʢඞཁʹԠͯ͡ʣࣗલͷγΣʔμʔؔΛgo:embedͰΈࠐΜͰ͓͘ 3. C.xxͱͯ͠Objective-CͰهड़ͨ͠ॳظԽγΣʔμʔؔΛ࣮ߦ͢ΔؔΛݺͿɻ࣮ߦ࣌ ͷύϥϝʔλ݁ՌunsafeύοέʔδΛͬͯΞΫηεɻ 8 Cgo: GoݴޠͰMac
GPUϓϩάϥϛϯά (PͰͷ࣮ྫ
• Goݴޠ্ͰͷGPUΛ༻͍ͨଟมྔਖ਼نʹै͏ཚੜ • Goͷίʔυ͔Β ΛcgoΛܦ༝ͯ͠Objective-Cͷؔʹ͢ • MPSͷMPSMatrixDecompositionCholeskyΛ༻͍ͯίϨεΩʔղ • ࣗલγΣʔμʔؔΛ༻͍ͯԼࡾ֯ߦྻҎ֎Λ0ʹຒΊΔ •
MPSͷMPSMatrixVectorMultiplicationΛ༻͍ͯ Λܭࢉ • MPSͷMPSMatrixSumΛ༻͍ͯ Λܭࢉ • GoͷίʔυͰ݁ՌΛड͚औΔ z, μ, Σ Lz μ + Lz 9 Cgo: GoݴޠͰMac GPUϓϩάϥϛϯά
• GonumͱMetal࣮ͷ࣮ߦΛൺֱʢ1000࣍ݩʣ 10 ඪ४ਖ਼نཚͷมͷൺֱ BenchmarkTransformNormMetal-8 9 117668310 ns/op BenchmarkTransformNormGonumBLAS-8 55
21494668 ns/op BenchmarkTransformNormGonum-8 21 54288034 ns/op BenchmarkTransformNormCholMetal-8 1140 1070094 ns/op BenchmarkTransformNormCholGonumBLAS-8 15124 78960 ns/op BenchmarkTransformNormCholGonum-8 6712 177368 ns/op • ίϨεΩʔղͷ݁ՌΛผ్͢Α͏ʹͨ͠߹ͷൺֱ • MPSͷίϨεΩʔղগ͍͔͠͠Εͳ͍͕ɺͦͷଞͷࠩԿ͔
• ߦྻʢ1000x1000ʣͱߦྻʢ1000x1000ʣͷࢉΛൺֱ 11 ߦྻࢉͷൺֱ BenchmarkMatrixMultipicationMetal-8 494 2222134 ns/op BenchmarkMatrixMultipicationGonumBLAS-8 60
22063894 ns/op BenchmarkMatrixMultipicationGonum-8 19 59507228 ns/op BenchmarkMatrixVectorMultipicationMetal-8 1497 792244 ns/op BenchmarkMatrixVectorMultipicationGonumBLAS-8 10000 114843 ns/op BenchmarkMatrixVectorMultipicationGonum-8 972 1239177 ns/op • ߦྻʢ1000x1000ʣͱϕΫτϧʢ1000x1ʣͷࢉΛൺֱ • ߦྻಉ࢜ͷΑ͏ͳܭࢉྔͰGPUͷํ͕ߴɻ ͷΑ͏ͳߦྻͱϕΫτϧͷࢉͰ͜ ͷ࣍ݩʹ͓͍ͯGPUҠৡͷΦʔόʔϔουͷํ͕େ͖͔ͬͨͱߟ͑ΒΕΔ Lz
• GoݴޠͰMac GPUϓϩάϥϛϯά͢Δํ๏Λհͨ͠ • ؆қతͳͷൺֱධՁΛ௨ͯ͠ɺ͍ॴͷഽײΛಘΔ͜ͱ͕Ͱ͖ͨ • MPSͷϚχϡΞϧΛಡΉͱχϡʔϥϧωοτϫʔΫͷαϙʔτ͋ΓɺΞΠ σΟΞ࣍ୈͰ໘ന͍͜ͱ͕Ͱ͖ͦ͏ • MPSͷݺͼग़͠ՄೳͳϥΠϒϥϦΛ࡞ͬͯΈ͍ͨ
• Ͳ͏ϝϞϦϦʔΫͯͦ͠͏ͳͷͰͦͷลΓվળ͍ͨ͠ • ࡾGo 12 ·ͱΊ
None