cl-waffe2
hikettei
September 25, 2023
Slides for a talk at the Lisp Meetup on 2023/09/28.
Transcript
GitHub: https://github.com/hikettei/cl-waffe2
Common Lisp Programmable Deep Learning Framework
Lisp Meetup (2023/09/28), hikettei
Follow me >< @hikettei @ichndm

Data science in Common Lisp?
- There is an attractive set of libraries, but deep learning means large-scale matrix operations where precision matters, and this area is comparatively inactive.
- Parallel programming, SIMD, CUDA, small-scale computation ...: all of these go through large external libraries.
- Python has NumPy and Julia has AbstractArrays. In the CL world, is it simple-array? There is no matrix data type that everyone uses, so the foundation for library development is missing in the first place.
- ANSI Common Lisp arrays hit their limits: supporting GPUs or float16 would mean rewriting from scratch?
- So what kind of library do we need?

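Data science in Common Lisp? Solution: let's build a library of abstract tensors and graphs
- Extensibility: make every feature user-extensible.
- Distribution: want a CUDA/Metal... backend? Leave it to the community.
- Simplicity: anyone should be able to write an extension with a minimal amount of code.
- cl-waffe2 = a mechanism that computes with lazy evaluation over abstract nodes (AbstractNode) and abstract tensors (AbstractTensor).
- The assets of Petalisp/Numcl/Numericals/MGL-MAT/LLA/magiCL can be carried over.
- Compact: runs on only about 20,000 lines of Common Lisp code.
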
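Overview [cl-waffe2]: a computation-abstraction library and framework for deep learning
- A personal project that I started writing earlier this year.
- The goal is to bring a modern deep learning environment to Common Lisp (WIP).
- What a deep learning framework provides:
  - Tensor operations (AbstractTensor) and graph processing (AbstractNode)
  - JIT compiler, VM, symbolic differentiation, fast approximation of probability distributions
  - Standard implementations (models, activation functions, loss functions, optimizers, etc.)
  - Automatic differentiation (tape-based reverse mode)
- Main topics today: AbstractTensor/AbstractNode and their compiler, and tape-based reverse-mode automatic differentiation.
- Similar projects: PyTorch, MGL, Petalisp, Aesara (a fork of Theano) ...
- It is specialized for DAGs (directed acyclic graphs), but it can also be used as a NumPy-like matrix library.
- Please listen to this as a talk about building a programming language dedicated to math.
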
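Basic usage (1/4)
- Create the data structure, an AbstractTensor.
- Fast sampling from the probability distributions commonly used in machine learning is provided as standard.
- randn ... samples from a Gaussian distribution using the Ziggurat method.
- These tensors are allocated as soon as they are created, so they carry an allocation cost.
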
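Basic usage (2/4)
- Creating lazily evaluated tensors.
- Create as many as you like: they cost nothing to create.
- Lazy evaluation: allocation happens at the moment the tensor is compiled or its storage is accessed.
- A symbol can be given as a dimension of the shape and changed later.
- A tensor can be given a name; NIL means a gensym.
- make-input creates an InputTensor.
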
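Basic usage (3/4)
- Building computation nodes.
- Call the forward pass of a node with the forward or call method.
- AbstractNode is lazily evaluated: the graph cannot run until it is compiled.
- As a rule, the tensor that stores the result of computing on other tensors is created with make-input (explained later).
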
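Basic usage (4/4)
- Compilation.
- Loop optimization, graph optimization, inlining of offset computation, scheduling for parallelization, etc.
- (build <terminal tensor>) compiles the graph.
- (proceed tensor) compiles and runs it; further computation nodes can still be connected afterwards.
- Whether an operation is in-place is decided 100% automatically, based on the kind of tensor used.
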
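The lazy behaviour described in these four slides can be illustrated with a small language-neutral sketch. This is plain Python, not cl-waffe2 code: the `Lazy` class, `const`, and the Python `proceed` function are hypothetical stand-ins that only borrow the name of the `proceed` operation mentioned above.

```python
class Lazy:
    """A deferred computation: nothing runs until proceed() is called."""

    def __init__(self, op, args=(), value=None):
        self.op, self.args, self.value = op, args, value

    def __add__(self, other):
        return Lazy("add", (self, other))

    def __mul__(self, other):
        return Lazy("mul", (self, other))


def const(x):
    """Wrap an already-known value (hypothetical helper)."""
    return Lazy("const", value=x)


EVALUATED = []  # records the order in which operations actually run


def proceed(node):
    """Evaluate the graph that ends at `node`, caching each result."""
    if node.value is None:
        a, b = (proceed(arg) for arg in node.args)
        node.value = a + b if node.op == "add" else a * b
        EVALUATED.append(node.op)
    return node.value


a, x, b = const(2.0), const(3.0), const(1.0)
out = a * x + b          # only builds a graph
assert EVALUATED == []   # nothing has been computed yet
result = proceed(out)    # evaluation happens here
```

Building `out` is free; work is only done when a result is requested, which is the property the slides rely on.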
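Key Concepts
1. Runtime Code Generation
- Optimized (loop for ...) code can be generated automatically.
- Interoperation with external libraries, for example through CFFI, is easy.
- Once compiled, the code is cached from then on.
2. High Level IR
- Every operation is lazily evaluated.
- Computation nodes are expressed with the AbstractNode class, then compiled to a one-dimensional IR (Wengert list) and executed.
- Input tensors and temporary tensors are declared explicitly, so every in-place operation is detected automatically.
- Graph-level optimization (memory locality, pruning, and so on).
3. Elegant User Interface
- An easy-to-use NumPy-like API combined with Common Lisp metaprogramming.
- REPL-driven development makes debugging easy.
- Every feature is user-extensible.
- A highly abstract design.
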
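As a rough illustration of "generate a specialized loop at run time, compile it once, and cache it", here is a language-neutral Python sketch. It is an analogy to the (compile nil body) approach the talk mentions later, not cl-waffe2's implementation; `compile_elementwise`, `_CACHE`, and `COMPILE_COUNT` are hypothetical names.

```python
_CACHE = {}        # (op, size) -> compiled function
COMPILE_COUNT = 0  # how many times source code was actually compiled


def compile_elementwise(op, size):
    """Generate, compile and cache a loop specialized for `op` and `size`."""
    global COMPILE_COUNT
    key = (op, size)
    if key not in _CACHE:
        # Build the source text of a loop with the size and operator baked in.
        src = (
            "def kernel(x, y, out):\n"
            f"    for i in range({size}):\n"
            f"        out[i] = x[i] {op} y[i]\n"
            "    return out\n"
        )
        namespace = {}
        exec(compile(src, f"<kernel {op} {size}>", "exec"), namespace)
        _CACHE[key] = namespace["kernel"]
        COMPILE_COUNT += 1
    return _CACHE[key]


add3 = compile_elementwise("+", 3)
print(add3([1, 2, 3], [10, 20, 30], [0, 0, 0]))
same = compile_elementwise("+", 3)  # second request hits the cache
```

The second request returns the cached function without compiling again, which is why repeated builds are cheap.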
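Motivation (1/4)
1. I do not want to reinvent the wheel
- oneDNN, GGML, cuDNN ...: many fast, device-specific frameworks already exist, so cl-waffe2 relies heavily on external libraries.
- Q. Then what is cl-waffe2's job?
  1. Define AbstractTensor, an abstract data type that depends on neither CPU nor GPU.
  2. Call external libraries through CFFI.
- A. Write extensions with little code, behind a readable common API, and use the REPL and metaprogramming to call those APIs quickly. The aim is to close this gap.
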
Motivation (2/4)
2. REPL x lazy evaluation
- Lazy evaluation: issuing an operation (add, sin, matmul, ...) does not execute it on the spot (i.e. cl-waffe2 = DSL).
- Compile the DSL and optimize the graph, then run it on a virtual machine with few constraints (IR = Extended Wengert List).
- Extended Wengert List: a data structure commonly used in the theory of automatic differentiation, extended so that an instruction can return multiple values. Each op is a lambda function, which users create freely with the macros described later.
- Examples shown on the slide: out = ax+b, then proceed(out = ax+b), then proceed(sum(proceed(ax+b))).
- Difference from other libraries: whenever you want an answer, just wrap the expression in (proceed tensor).

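A Wengert list is a flat sequence of instructions in which each instruction names its inputs and outputs. The following language-neutral Python sketch shows the idea for out = ax+b, with each op being a lambda that returns a tuple so that an instruction may produce several values. The tuple layout and the `run` function are assumptions made for illustration, not the IR format cl-waffe2 actually uses.

```python
# Each instruction: (output names, function, input names).
# Functions return a tuple so one instruction may produce several values.
program = [
    (("t1",), lambda a, x: (a * x,), ("a", "x")),
    (("out",), lambda t1, b: (t1 + b,), ("t1", "b")),
]


def run(program, env):
    """Execute the instruction list in order over a name -> value mapping."""
    env = dict(env)  # do not mutate the caller's mapping
    for outs, fn, ins in program:
        results = fn(*(env[name] for name in ins))
        env.update(zip(outs, results))
    return env


env = run(program, {"a": 2.0, "x": 3.0, "b": 1.0})
print(env["out"])
```

Because the program is a flat list, a very small interpreter loop is enough to execute it.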
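Motivation (3/4)
Why not CFFI only, or Python? (Figure cited from: The Deep Learning Compiler: A Comprehensive Survey, https://arxiv.org/pdf/2002.03794.pdf)
- CFFI only:
  - There would be no point in using Common Lisp.
  - It cannot withstand changes to API specifications.
  - Debugging becomes a chore.
- Why Common Lisp (SBCL)?
  - Loop-related optimizations are easy to do.
  - No need to write a compiler; writing macros is enough.
  - (compile nil body) ... compile times 20x faster than gcc.
  - If it is done in CL, I want a framework whose philosophy is to run in CL.
  - Philosophy: let macros write all the tedious common parts.
  - I love Common Lisp.
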
Motivation (4/4)
I want a DSL for deep learning
- Go beyond the constraints of the programming language to simplify how networks are expressed.

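Implementation [Building computation nodes] (1/15)
Outline: 1. Building computation nodes 2. Sort + Pruning 3. Eliminating unnecessary copies 4. Optimizing memory locality 5. Memory allocation 6. Building the backward pass 7. Optimizing the adjoints of the backward pass 8. Compiling the blueprint 9. Rewriting the graph 10. Caching 11. Execution
The basic unit of a computation node (AbstractNode)
1. Define a specification shared by every device with defnode. A one-line :where declaration stands in for code that does all of the following:
- Assertions for shape errors and a generator for the error messages (expanded at compile time)
- Tracking which pointers are in-place (used for optimization)
- Creating the next lazily evaluated tensors (next_inputs in Aesara/Theano)
- Declaring optional broadcasting on the node side
- Loop optimization based on the rank of the tensors
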
Implementation [Building computation nodes] (2/15)
The basic unit of a computation node (AbstractNode)
2. Use a device (AbstractTensor)
- Inheritance: MyTensor << LispTensor << AbstractTensor
- (The slide shows the list of available backends.)

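Implementation [Building computation nodes] (3/15)
The basic unit of a computation node (AbstractNode)
3. Write an implementation of the AbstractNode (= the class that creates an op)
- In the instruction sequence used by the virtual machine (Extended Wengert List), each instruction is a lambda function.
- define-impl: runs (compile nil body) on an S-expression to obtain the lambda function.
  1. Given the necessary information, call-with-view writes the loop automatically.
  2. It must be written in the same style as defmacro so that things like unrolling are possible.
- define-impl-op: defines a lambda function directly and hands it to the VM.
- An implementation of the node is needed for each device.
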
Implementation [Building computation nodes] (4/15)
Building and composing computation nodes
4. Shape errors
- Every shape error is checked ahead of time.
- Broadcasting without a declaration is forbidden; it must be declared on both the node and the tensor.
- The ShapeError message is generated automatically before execution.

Implementation [Building computation nodes] (5/15)
Building and composing computation nodes
5. Lazy evaluation
- Connect the network with forward or call.
- AbstractNode and AbstractTensor are CLOS classes.
- Wherever an answer is needed, compile and run with:
  - build (at the end of the node graph)
  - proceed (the nodes can still be connected afterwards)
- The call-> function composes multiple nodes.
- asnode treats a function as a model.
- The example is taken from the tutorial and is a little cluttered; normally with-devices alone is enough to switch devices.

Implementation [Graph optimization] (6/15)
Tree structure → one-dimensional list (not much to say here)
- The graph must be topologically sorted for the forward and backward passes.
- (In the backward pass) only the branches needed for the gradient computation are computed.
- This removes duplicated routes through the computation nodes.
- From here on, the talk covers the compiler that turns the graph above into the IR shown in yellow.

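Flattening a graph into a one-dimensional list, while skipping nodes the result does not depend on and visiting shared nodes once, can be sketched as a depth-first topological sort. This is a language-neutral Python illustration; `linearize` and the dictionary graph encoding are hypothetical and not taken from cl-waffe2.

```python
def linearize(graph, target):
    """Return the nodes needed for `target`, in dependency order.

    `graph` maps a node name to the names of its inputs. Nodes that
    `target` does not depend on are never visited (pruning), and a node
    shared by several consumers appears only once.
    """
    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for dep in graph.get(node, ()):
            visit(dep)
        order.append(node)  # appended only after all of its inputs

    visit(target)
    return order


graph = {
    "t1": ["a", "x"],
    "t2": ["t1", "t1"],  # t1 is shared
    "out": ["t2", "b"],
    "unused": ["a"],     # not needed for "out"
}
order = linearize(graph, "out")
print(order)
```

Starting the traversal from the output is what gives pruning for free: anything unreachable from the result is simply never emitted.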
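Implementation [Graph optimization] (7/15)
In-place mutation: minimizing temporary regions
- A destructive (in-place) operation stores the result of a computation in its own argument. It matters a great deal for speed.
- ExistTensor {training data, gradients, model weights} must never be computed on destructively.
- A result tensor produced by computing from an ExistTensor (an InputTensor) may be overwritten. !sin can therefore be treated as a function with (effectively) no side effects.
- With the following rule, programmers do not have to write any special code:
  - Condition for going in-place → walk the IR from the bottom and make only the very last reference destructive. O(n)
  - → When composing nodes, it is enough to transcribe the formula as written.
- In deep learning, 99% of intermediate variables are never observed during training → graph-level optimization matters.
- Note: tensors that are always observed during training are ExistTensor; all other tensors are InputTensor.
- In other words, there are two kinds of tensors so that this mechanism works 100% of the time, provided functions such as !sin are implemented properly.
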
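The "walk the IR from the bottom and make only the last reference destructive" rule can be sketched as a single backward scan. This is a language-neutral Python illustration under simplifying assumptions: each instruction may only overwrite its first input, and `protected` plays the role of the ExistTensor set. `mark_in_place` and the tuple layout are hypothetical, not cl-waffe2's actual pass.

```python
def mark_in_place(program, protected):
    """Decide, per instruction, whether it may overwrite its first input.

    `program` is a list of (output, op, inputs) in execution order.
    Scanning from the last instruction upward, the first time a name is
    seen is its final use; that use may be destructive unless the name
    is protected (data, weights, gradients).
    """
    seen = set()
    flags = [False] * len(program)
    for i in range(len(program) - 1, -1, -1):
        _out, _op, inputs = program[i]
        first = inputs[0]
        is_last_use = first not in seen
        # Conservative: skip if the same buffer is also read as another input.
        if is_last_use and first not in protected and inputs.count(first) == 1:
            flags[i] = True
        seen.update(inputs)
    return flags


program = [
    ("t1", "sin", ("x",)),        # x is protected: must allocate
    ("t2", "add", ("t1", "w")),   # t1 is used again later: must allocate
    ("t3", "mul", ("t1", "t2")),  # last use of t1: may overwrite it
    ("out", "exp", ("t3",)),      # last use of t3: may overwrite it
]
flags = mark_in_place(program, protected={"x", "w"})
print(flags)
```

The scan touches each instruction once, which matches the O(n) cost stated on the slide.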
Implementation [Graph optimization] (8/15)
Memory locality: reusing temporary regions
- InputTensor is purely a region for storing computation results, so the compiler can safely rewire it.
- A temporary region allocated at one point is used again at another.
- Without the memory-locality optimization: 8 tensors. With it: 5 tensors for one Softmax(x).

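Reusing temporary regions amounts to handing a buffer back to a pool once the value stored in it is no longer needed. The following language-neutral Python sketch shows one simple greedy scheme based on last-use positions. It is an assumption-laden illustration (`assign_buffers` is a hypothetical name) and does not reproduce cl-waffe2's actual allocator or the Softmax figures above.

```python
def assign_buffers(program):
    """Map each temporary to a buffer id, recycling buffers whose value is dead.

    `program` is a list of (output, inputs) in execution order. Names that
    are never produced by an instruction are treated as external inputs.
    """
    last_use = {}
    for i, (_out, inputs) in enumerate(program):
        for name in inputs:
            last_use[name] = i

    free, assignment, next_id = [], {}, 0
    for i, (out, inputs) in enumerate(program):
        # Pick the output buffer first so it never aliases a live input.
        if free:
            assignment[out] = free.pop()
        else:
            assignment[out] = next_id
            next_id += 1
        # Temporaries whose final use is this instruction return to the pool.
        for name in set(inputs):
            if name in assignment and last_use[name] == i:
                free.append(assignment[name])
    return assignment, next_id


chain = [("t1", ("x",)), ("t2", ("t1",)), ("t3", ("t2",)), ("t4", ("t3",))]
assignment, n_buffers = assign_buffers(chain)
print(assignment, n_buffers)
```

In this chain, four temporaries end up sharing two buffers, which is the kind of reduction the slide describes.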
Implementation [Graph optimization] (9/15)
Memory allocation
- One VMAllocation structure is prepared for each Compiled-Composite (compiled with the build function).
- This structure manages the memory pool used by the model.
- The memory pool is not interfered with by other models and is thread-safe.
- cl-waffe2 keeps a list of compiled models and keeps them gc-reachable.

Implementation [Reverse Mode] (10/15)
Automatic differentiation (reverse mode)
- Specification: second-order derivatives are not supported yet.
- Reverse mode is fast when a large matrix is reduced to a small one.
- The :backward of an abstract node expresses the derivative by combining the :forward of other abstract nodes.
- Lambda functions can of course be embedded as well.

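Tape-based reverse-mode differentiation records, during the forward pass, how to push gradients back through each operation, and then replays those records in reverse. Here is a minimal language-neutral Python sketch for scalars; `Var`, `TAPE`, `add`, `mul`, and `backward` are hypothetical names for illustration and do not mirror cl-waffe2's :backward mechanism.

```python
class Var:
    """A scalar value with an accumulated gradient."""

    def __init__(self, value):
        self.value, self.grad = value, 0.0


TAPE = []  # backward steps, recorded in forward execution order


def mul(a, b):
    out = Var(a.value * b.value)

    def step():
        # d(a*b)/da = b and d(a*b)/db = a
        a.grad += b.value * out.grad
        b.grad += a.value * out.grad

    TAPE.append(step)
    return out


def add(a, b):
    out = Var(a.value + b.value)

    def step():
        a.grad += out.grad
        b.grad += out.grad

    TAPE.append(step)
    return out


def backward(out):
    """Seed the output gradient and replay the tape in reverse."""
    out.grad = 1.0
    for step in reversed(TAPE):
        step()


a, x, b = Var(2.0), Var(3.0), Var(1.0)
out = add(mul(a, x), b)  # out = a*x + b
backward(out)
print(a.grad, x.grad, b.grad)
```

One reverse sweep yields the gradient with respect to every input, which is why reverse mode suits the many-inputs, few-outputs shape of training.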
Implementation [Reverse Mode] (11/15)
Branching nodes
- The backward pass is inlined step by step so that in-place mutation can be applied to it.
- When gradients come out permuted, they are made contiguous for the benefit of the optimizer.

Implementation [Reverse Mode] (12/15)
From call-with-view to a lambda function
- solve-loop-order: optimizes the loop so that the external library can use all of the SIMD registers.
- Loop Collapse, Loop Reordering (experimental), Unrolling ...
- To embed the loop in a macro expansion → call-with-view
- To use it at run time → do-compiled-loop

Implementation [Reverse Mode] (13/15)
Symbolic Diff
- Rewrite certain combinations of computations into something more stable and faster.
- Example: log(1+x), which plain chain-rule differentiation does not handle well.
- Device-specific optimization (e.g. Conv2D should go to oneDNN on CPU ...).
- ReLU/GeLU and the like should also become a single instruction to reduce the instruction count (FusionOps).
- Theano's implementation: the compiled computation nodes are traced again, and the traced nodes are searched afterwards.

Implementation [Reverse Mode] (13/15)
Symbolic Diff (main topic)
- In cl-waffe2: simply do not build such a graph in the first place (searching it afterwards is expensive).
- Fuse ahead of time with define-compiler-macro: pattern matching replaces the expression at compile time.
- Running disassemble shows that the fusion has happened.
- This cost is not included in the (graph) compile time.

Implementation [Reverse Mode] (14/15)
Caching
- Functions generated by call-with-view are cached.
- (compile nil body) is not run a second time, so calling (build toplevel) repeatedly is fine.
- About 0.3 sec for a 12-layer GPT-2.
- Ideal case: if everything is already cached at run time, it finishes within roughly 2 ms (about 2 ms for 100 instructions after optimization).

Implementation [Reverse Mode] (15/15)
Tadah~
- (forward model <arguments>) runs the forward pass.
- (backward model) runs reverse mode.
- Users get all of these optimizations with just defnode + define-impl + defclass.

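Benchmark
Conditions:
- MacBook Pro (13-inch, 2017), 8GB, 2.3 GHz Dual-Core Intel Core i5
- CPU only, everything single-threaded
- SIMD extensions up to AVX2
- Training a three-layer MLP on MNIST (activation = ReLU, optimizer = Adam, lr = 1e-3, loss = CrossEntropy)
- cl-waffe2: SLEEF and OpenBLAS backends
- Keras with the TensorFlow backend (no AVX2; building it was too much hassle...)
- The first run includes compile time.
- The experimental setup is rough, so do not rely on these numbers too much.
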
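Summary
Aim for SoTA performance with the following three strategies:
1. Loop optimization + external APIs
- Manipulate offsets to steer execution onto a route that parallelizes and makes proper use of SIMD registers.
- Results are cached, so the second compile is zero cost.
2. Powerful graph optimization
- However the code is written, it becomes 100% in-place.
- Memory-locality optimization (something hand-written C/C++ does not give you).
3. Symbolic Diff (main topic)
- Done before execution with Common Lisp metaprogramming.
- No need to search the compiled nodes over and over.
Goals within one year:
- Keep node construction + (1. 2. 3.) within 20 ms of overhead.
- Make it usable in a PyTorch-like way.
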
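Open problems with the tracing JIT
How should If and Map be expressed (for example RNNs and gating)?
- Solution 1: create IfNode and MapNode
  - No need to worry about optimizing compile time.
  - The usage is not intuitive.
- Solution 2: reduce compile time (the leading candidate)
  - The cost of compilation is proportional to O(n log n + 2N).
  - If the cost of compilation can be kept within 20 ms, computation time >>> compile time.
  - It could then be run in a define-by-run fashion, so this is the direction I plan to take.
- Ideally, a model is written using approach 2 and can be rewritten with approach 1 once more speed is needed.
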
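Conclusion / future work
- The cl-waffe2 paradigm presented here makes practical, large-scale scientific computing possible: it adopts new designs, is extensible, and lets you use Common Lisp.
- The VM implementation (= the baseline) is in good shape.
- Features and community are still weak:
  - Who knows when CUDA, OpenCL, and Metal will be supported ...
  - Support for the fast matrix computation at the precisions that matter for inference is insufficient.
  - Saving models is not implemented yet ...
- I want to work hard so that SoTA deep learning is possible in Common Lisp at low cost.
- Upcoming: oneDNN/GGML backends (the aim is to write wrappers so that GGML and oneDNN can be called through a common API), and compile-time reduction for a define-by-run style.
- Next goals:
  - Grow the number of contributors: explain the design philosophy and improve the documentation.
  - Stabilize GPT-2 inference.
  - Define an API specification on the level of PyTorch.
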
Finally
Thank you for listening. Questions and feedback are very welcome!