cl-waffe2

hikettei
September 25, 2023

Slides for my presentation at Lisp Meetup on 2023/09/28.

Transcript

  1. cl-waffe2: Common Lisp Programmable Deep Learning Framework

    Github: https://github.com/hikettei/cl-waffe2
    Lisp Meetup (2023/09/28), hikettei
    Follow me >< @hikettei @ichndm
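  2. Data science in Common Lisp?

    - An attractive set of libraries ... but not very active compared with these (→)
    - Deep learning = large-scale, low-precision matrix operations
    - Parallel programming, SIMD, CUDA, floating-point computation ... → usually done through a library
    - The matrix data type everyone uses: NumPy in Python, AbstractArrays in Julia; in the CL community → simple-array?
    - There is no foundation for library development in the first place ...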
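  3. Data science in Common Lisp?

    - An attractive set of libraries ... but not very active compared with these (→)
    - Deep learning = large-scale, low-precision matrix operations
    - Parallel programming, SIMD, CUDA, floating-point computation ... → usually done through a library
    - The matrix data type everyone uses: NumPy in Python, AbstractArrays in Julia; in the CL community → simple-array?
    - There is no foundation for library development in the first place ...
    - ANSI Common Lisp arrays hit their limits
    - Supporting GPUs or bfloat16 would mean rewriting from scratch?
    → What kind of library do we need ...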
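  4. Data science in Common Lisp?

    Solution: let's just build a library of abstract tensors and graphs
    - Extensibility: make every feature user-extensible
    - Distribution: want a CUDA/Metal ... backend? → leave it to the community
    - Foundation: let anyone write an extension with a minimal amount of code
    - cl-waffe2 = a framework that computes with lazy evaluation using abstract nodes (AbstractNode) and abstract tensors (AbstractTensor)
    - The assets of Petalisp/Numcl/Numericals/MGL-MAT/LLA/magiCL ... can be carried over
    - Compact: runs on just 20,000 lines of Common Lisp code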
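  5. Overview [cl-waffe2]: a computation-abstraction library and framework for deep learning

    - A personal project I started writing around June of this year
    - Goal: bring a modern deep learning environment to Common Lisp (WIP)
    - Deep learning framework = features like these:
      - Tensor operations (AbstractTensor), graph processing (AbstractNode)
      - JIT compiler, VM, Symbolic Differentiation, fast approximation of probability distributions
      - Standard implementations (models, activation functions, loss functions, optimizers, etc.)
      - Tape Based Reverse Mode Automatic Differentiation
    - Today's main topics: AbstractTensor/Node and their compiler; Tape Based Reverse Mode Automatic Differentiation
    - Similar work: PyTorch, MGL, Petalisp, Aesara (a fork of Theano), etc.
    - Specialized for DAGs (directed acyclic graphs), though ...
    - Can also be used as a NumPy-like matrix operation library
    - Please listen to this as a talk about building a programming language dedicated to mathematics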
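  6. Basic usage (2/4)

    Creating lazily evaluated tensors
    - Create as many as you like: they cost nothing to create
    - Lazy evaluation: memory is allocated the moment you compile or try to access the storage
    - A shape can be given as a Symbol and changed later
    - Tensors can be named; NIL gives a Gensym
    - make-input creates an InputTensor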
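  7. Basic usage (4/4)

    Compiling
    - Loop optimization, graph optimization, inlining of offset computation, scheduling for parallelization, etc.
    - (build <terminal tensor>) runs the compilation
    - (proceed tensor) compiles and executes; further computation nodes can be connected afterwards
    - In-place operations are decided 100% automatically, because the declaration is delegated to the tensors being used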
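  8. Key Concepts

    1. Runtime Code Generation
    - Optimized (loop for ...) code is generated automatically
    - Interoperation with external libraries via CFFI and similar is very easy
    - Once compiled, the result is cached from then on
    2. High Level IR
    - Every operation is lazily evaluated
    - Computation nodes are expressed with the AbstractNode class -> compiled to a one-dimensional IR (Wengert List) and executed
    - Input tensors and temporary tensors are declared explicitly -> every in-place operation is decided automatically
    - Graph-level optimization (memory locality, pruning, etc.)
    3. Elegant User Interface
    - An easy-to-use NumPy-like API x Common Lisp metaprogramming
    - REPL-driven development makes debugging easy
    - Every feature is user-extensible
    - A highly abstract design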
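  9. Motivation (1/4)

    1. I do not want to reinvent the wheel
    - oneDNN, GGML, cuDNN, etc.: plenty of fast, device-specific frameworks already exist
    - So rely heavily on external libraries
    - Q. Then what is cl-waffe2's job?
      - 1. Define AbstractTensor, an abstract data type that depends on neither the CPU nor the GPU
      - 2. Call external libraries via CFFI
    - A. Write extensions with little code, through a readable common API, making the most of the REPL and metaprogramming, and call the APIs fast. Shorten the distance between these.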
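  10. Motivation (2/4)

    2. REPL x lazy evaluation
    - Lazy evaluation: calling an instruction (add, sin, matmul, etc.) does not execute it on the spot (i.e.: cl-waffe2 = DSL)
    - Compile the DSL + optimize the graph → run on a virtual machine with few constraints (IR = Extended Wengert List)
    - Extended Wengert List: a data structure commonly used in the theory of automatic differentiation, extended to return multiple values
      op = a lambda function, which users define freely with the macros described later
      Example: out = ax+b
    - Difference from other libraries: when you want the answer, just wrap the expression in (proceed tensor)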
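  11. Motivation (2/4)

    2. REPL x lazy evaluation
    - Lazy evaluation: calling an instruction (add, sin, matmul, etc.) does not execute it on the spot (i.e.: cl-waffe2 = DSL)
    - Compile the DSL + optimize the graph → run on a virtual machine with few constraints (IR = Extended Wengert List)
    - Extended Wengert List: a data structure commonly used in the theory of automatic differentiation, extended to return multiple values
      op = a lambda function, which users define freely with the macros described later
      Example: proceed(out = ax+b)
    - Difference from other libraries: when you want the answer, just wrap the expression in (proceed tensor)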
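  12. Motivation (2/4)

    2. REPL x lazy evaluation
    - Lazy evaluation: calling an instruction (add, sin, matmul, etc.) does not execute it on the spot (i.e.: cl-waffe2 = DSL)
    - Compile the DSL + optimize the graph → run on a virtual machine with few constraints (IR = Extended Wengert List)
    - Extended Wengert List: a data structure commonly used in the theory of automatic differentiation, extended to return multiple values
      op = a lambda function, which users define freely with the macros described later
      Example: proceed(sum(proceed(ax+b)))
    - Difference from other libraries: when you want the answer, just wrap the expression in (proceed tensor)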
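The lazy-evaluation idea on these slides can be illustrated with a small, self-contained sketch. This is not cl-waffe2's implementation: `lazy-value`, `leaf`, `lazy-add`, `lazy-mul` and `toy-proceed` are hypothetical names invented for the illustration. Operations only record what should be computed, and nothing runs until the value is demanded.

```lisp
;; Hypothetical sketch of lazy evaluation; none of these names are cl-waffe2 API.
(defstruct lazy-value
  op          ; function applied to the forced arguments (NIL for a leaf)
  args        ; list of LAZY-VALUE structures this node depends on
  value       ; cached result, valid only when COMPUTED-P is true
  computed-p)

(defun leaf (x)
  "Wrap an already known value."
  (make-lazy-value :value x :computed-p t))

(defun lazy-apply (op &rest args)
  "Record OP and ARGS without executing anything."
  (make-lazy-value :op op :args args))

(defun lazy-add (a b) (lazy-apply #'+ a b))
(defun lazy-mul (a b) (lazy-apply #'* a b))

(defun toy-proceed (node)
  "Evaluate NODE and, recursively, its arguments; cache each result."
  (unless (lazy-value-computed-p node)
    (setf (lazy-value-value node)
          (apply (lazy-value-op node)
                 (mapcar #'toy-proceed (lazy-value-args node)))
          (lazy-value-computed-p node) t))
  (lazy-value-value node))

;; out = a*x + b is only a recorded graph until TOY-PROCEED is called.
(let ((out (lazy-add (lazy-mul (leaf 2) (leaf 3)) (leaf 1))))
  (toy-proceed out))   ; => 7
```

Because the whole expression exists as data before anything runs, a compiler has the chance to sort, prune and rewrite it first, which is the point the slides make about treating the library as a DSL.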
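  13. Motivation (3/4)

    Why not CFFI only, or Python?
    (Source: The Deep Learning Compiler: A Comprehensive Survey https://arxiv.org/pdf/2002.03794.pdf)
    - CFFI only
      - There would be no point in using Common Lisp
      - It cannot withstand changes in the API specification
      - Debugging becomes a chore
    - Why Common Lisp (SBCL)?
      - Loop-related optimizations are easy to do
      - You do not have to write a compiler; writing macros is enough
      - (compile nil body) ... compiles 20 times faster than gcc
    - If it is done in CL, I want a framework designed around running in CL
    - Philosophy: let macros write all of the tedious common parts
    - I love Common Lisp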
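  14. Implementation [Building the computation nodes] (1/15)

    Steps:
    1. Building the computation nodes
    2. Sort + Pruning
    3. Eliminating unnecessary copies
    4. Memory-locality optimization
    5. Memory allocation
    6. Building backpropagation
    7. Optimizing the adjoints of backpropagation
    8. Compiling the blueprint
    9. Rewriting the graph
    10. Caching
    11. Execution

    The basic unit of a computation node (AbstractNode)
    1. Define the specification shared by every device with defnode
    A one-line :where is equivalent to code that does the following:
    - Assertions for Shape Errors, plus a generator for the error message (expanded at compile time)
    - Tracks which pointers are in-place (used for optimization)
    - Creates the next lazily evaluated tensors (next_inputs in Aesara/Theano)
    - Declares optional broadcasting on the node side
    - Optimizes loops based on the rank of the tensors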
  15. Implementation [Building the computation nodes] (2/15)

    The basic unit of a computation node (AbstractNode)
    2. Use a device (AbstractTensor)
    Inheritance: MyTensor << LispTensor << AbstractTensor
    ↑ Displays the list of backends
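  16. Implementation [Building the computation nodes] (3/15)

    The basic unit of a computation node (AbstractNode)
    3. Write an implementation of the AbstractNode (= the class that creates an op)
    The instruction sequence used by the virtual machine (Extended Wengert List) represents each instruction as a lambda function
    - define-impl: runs (compile nil body) on an S-expression to obtain the lambda function
    - define-impl-op: defines the lambda function directly and hands it to the VM
    ↑ Each device needs its own implementation of the node
    1. Give call-with-view the information it needs and it writes the loop for you
    2. The body has to be written in the same style as defmacro so that things like unrolling are possible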
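To make the "build an S-expression, then (compile nil body)" idea concrete, here is a minimal sketch in plain Common Lisp. It does not use cl-waffe2; `make-unrolled-adder` is a hypothetical helper written for this illustration. It generates a fully unrolled element-wise add for a known vector length and compiles it at runtime, which is why the generator is written in the same backquote style as a macro.

```lisp
;; Hypothetical sketch: runtime code generation with an unrolled loop body.
(defun make-unrolled-adder (n)
  "Return a compiled function that adds vector Y into vector X in place.
Both vectors must be single-float simple arrays of length N."
  (compile nil
           `(lambda (x y)
              (declare (type (simple-array single-float (,n)) x y))
              ;; One INCF form per element: the loop is unrolled at generation time.
              ,@(loop for i from 0 below n
                      collect `(incf (aref x ,i) (aref y ,i)))
              x)))

(let ((x (make-array 3 :element-type 'single-float
                       :initial-contents '(1.0 2.0 3.0)))
      (y (make-array 3 :element-type 'single-float
                       :initial-contents '(10.0 20.0 30.0))))
  (funcall (make-unrolled-adder 3) x y))   ; => #(11.0 22.0 33.0)
```

Since the shape is known when the code is generated, the compiler sees constant indices and a fixed array type, which is the kind of information a hand-written generic loop would not have.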
  17. Implementation [Building the computation nodes] (4/15)

    Building and composing computation nodes
    4. Shape Error
    - Every ShapeError is checked in advance
    - The ShapeError is generated automatically before execution
    - Broadcasting without a declaration is forbidden
    - It must be declared on both the node and the tensor
  18. Implementation [Building the computation nodes] (5/15)

    Building and composing computation nodes
    5. Lazy evaluation
    - Connect the network with forward or call
    - AbstractNode and AbstractTensor are CLOS classes
    - Where an answer is needed, compile and execute with:
      - build (the end point of the nodes)
      - proceed (nodes can still be connected afterwards)
    - Compose multiple nodes with the call-> function
    - Treat a function as a model with asnode
    ↑ This was pulled from the tutorial, so it looks cluttered; normally with-devices alone is enough to switch devices
  19. Implementation [Graph optimization] (6/15)

    Tree structure → one-dimensional list (not much to write here)
    - A topological sort is needed for both Forward and Backward
    - (In Backward) only the directions needed for the gradient computation are computed
    - This removes duplicated routes through the computation nodes
    ↑ From here on, the talk is about the compiler from the graph above to the IR shown in yellow
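The "tree structure → one-dimensional list" step is a standard depth-first topological sort. The sketch below is a generic illustration, not cl-waffe2's code; the `node` structure and `topological-sort` are hypothetical names. A node shared by several consumers is visited only once, so it appears once in the resulting list.

```lisp
;; Hypothetical sketch: flatten a DAG of nodes into execution order.
(defstruct node name inputs)

(defun topological-sort (root)
  "Return the nodes reachable from ROOT, each listed after all of its inputs."
  (let ((seen (make-hash-table :test #'eq))
        (order '()))
    (labels ((visit (n)
               (unless (gethash n seen)
                 (setf (gethash n seen) t)
                 (mapc #'visit (node-inputs n))   ; inputs first
                 (push n order))))
      (visit root))
    (nreverse order)))

;; X feeds both A and B, and OUT consumes A and B (a diamond).
(let* ((x   (make-node :name 'x))
       (a   (make-node :name 'a :inputs (list x)))
       (b   (make-node :name 'b :inputs (list x)))
       (out (make-node :name 'out :inputs (list a b))))
  (mapcar #'node-name (topological-sort out)))   ; => (X A B OUT)
```

Pruning falls out of the same traversal: anything not reachable from the requested output is never visited and so never appears in the list.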
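  20. Implementation [Graph optimization] (7/15)

    In-place mutation: minimizing temporary regions
    Destructive (in-place) operation → an operation that stores its result in one of its own arguments. Extremely important for speed.
    - ExistTensor {training data, gradients, model weights} must never be computed destructively
    - The result tensors produced by computing from an ExistTensor (InputTensor) may be modified destructively
    With the rule below, the programmer does not have to write any special code:
    - Condition for making an operation destructive → walk the IR from the bottom and make only the very last reference destructive; O(n)
    → When composing nodes, it is enough to transcribe the written formula as it is
    → !sin can be treated as a function with (effectively) no side effects
    In deep learning, 99% of the intermediate variables are never observed during training → graph-level optimization matters
    (Note: tensors that are always observed during training ... ExistTensor; every other tensor ... InputTensor)
    (i.e.: there are two kinds of tensor so that this mechanism works 100% of the time, provided functions such as !sin are implemented properly)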
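The "walk the IR from the bottom, only the last reference may be destructive" rule can be sketched as a single backward pass. This is an illustration under assumptions, not cl-waffe2's algorithm: the instruction format `(out op arg...)`, the `protected` list (standing in for ExistTensor) and the function `mark-in-place` are all hypothetical.

```lisp
;; Hypothetical sketch: find which arguments an instruction may overwrite.
(defun mark-in-place (instructions protected)
  "INSTRUCTIONS is a list of (OUT OP ARG...) forms in execution order.
PROTECTED lists variables that must never be overwritten.
Returns a list of (OUT OP OVERWRITABLE-ARGS) in execution order."
  (let ((used-later '())
        (result '()))
    ;; Walk from the last instruction to the first, in one O(n) pass over the IR.
    (dolist (inst (reverse instructions) result)
      (destructuring-bind (out op &rest args) inst
        (let ((last-refs
                (remove-if (lambda (a)
                             (or (member a used-later)   ; still needed below
                                 (member a protected)))  ; never destructive
                           (remove-duplicates args))))
          (push (list out op last-refs) result)
          (setf used-later (union args used-later)))))))

(mark-in-place '((t1 sin x) (t2 exp t1) (out add t1 t2))
               '(x))
;; => ((T1 SIN NIL) (T2 EXP NIL) (OUT ADD (T1 T2)))
```

In the example, `x` is protected, `t1` is still needed by the final `add` when `exp` runs, and only the final instruction is free to reuse the storage of `t1` or `t2` for its result.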
  21. Implementation [Graph optimization] (8/15)

    Memory Locality: reusing temporary regions
    - InputTensor: a region used solely for storing computation results, so the compiler can rewire it without any problem
    - A temporary region allocated here ... is also used over there
    - Without the optimization: 8 Tensors. With the optimization: 5 Tensors.
    - A single Softmax(x) takes 5 Tensors
  22. Implementation [Graph optimization] (9/15)

    Memory allocation
    - Compilation is done by the build function
    - One VMAllocation structure is prepared per Compiled-Composite
    - This structure manages the memory pool used inside the model
    - The memory pool is not interfered with by other models; Thread-Safe
    - cl-waffe2 records the list of compiled models and keeps them gc-reachable
  23. Implementation [Reverse Mode] (10/15)

    Automatic differentiation (Reverse Mode)
    - Specification: second-order derivatives are not supported
    - Reverse Mode (fast when a large matrix is reduced to a small one)
    ↑ The :backward of an abstract node expresses the derivative by combining the :forward of other abstract nodes
    Embedding a lambda function is of course also possible
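For readers unfamiliar with tape-based reverse-mode differentiation, a scalar sketch in plain Common Lisp shows the mechanism. It is only an illustration: `var`, `*tape*`, `v+`, `v*` and `backward` are hypothetical names, and the real library works on tensors and compiled nodes rather than closures over numbers. Each forward operation pushes a closure that knows how to propagate the gradient, and the backward pass replays the tape from the most recent entry.

```lisp
;; Hypothetical sketch of tape-based reverse-mode differentiation on scalars.
(defstruct var value (grad 0))

(defvar *tape* '() "Backward closures, most recent first.")

(defun v+ (a b)
  (let ((out (make-var :value (+ (var-value a) (var-value b)))))
    (push (lambda ()
            (incf (var-grad a) (var-grad out))
            (incf (var-grad b) (var-grad out)))
          *tape*)
    out))

(defun v* (a b)
  (let ((out (make-var :value (* (var-value a) (var-value b)))))
    (push (lambda ()
            (incf (var-grad a) (* (var-grad out) (var-value b)))
            (incf (var-grad b) (* (var-grad out) (var-value a))))
          *tape*)
    out))

(defun backward (out)
  "Seed OUT with gradient 1 and replay the tape in reverse execution order."
  (setf (var-grad out) 1)
  (mapc #'funcall *tape*)
  out)

;; out = a*x + b with a=2, x=3, b=1
(setf *tape* '())
(let* ((a (make-var :value 2))
       (x (make-var :value 3))
       (b (make-var :value 1))
       (out (v+ (v* a x) b)))
  (backward out)
  (list (var-value out) (var-grad a) (var-grad x) (var-grad b)))
;; => (7 3 2 1)
```

One backward pass yields the gradient with respect to every input at once, which is why reverse mode suits the "large matrix in, small value out" case mentioned on the slide.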
  24. Implementation [Reverse Mode] (11/15)

    Branching nodes
    - Inline the backpropagation and run the in-place mutation pass once more
    - When gradients are obtained in permuted form, make them contiguous for the optimizer
  25. Implementation [Reverse Mode] (12/15)

    From call-with-view to a lambda function
    - solve-loop-order: optimizes the loop so that external libraries can use all of the SIMD registers
    - Loop Collapse, Loop Reordering (Experimental), Unrolling ...
    - To embed it in a macro expansion → call-with-view
    - To use it at runtime → do-compiled-loop
  26. Implementation [Reverse Mode] (13/15)

    Symbolic Diff
    - Rewrite a particular combination of computation nodes into a more stable and faster one
    - log(1+x) cannot be differentiated with the chain rule
    - Device-specific optimization (e.g.: Conv2D goes through oneDNN on a CPU ...)
    - Things like ReLU/GeLU should become a single instruction to reduce the instruction count (FusionOps)
    - Theano's implementation: trace the compiled computation nodes again and search them
    Search the traced computation nodes afterwards
  27. Implementation [Reverse Mode] (13/15)

    Symbolic Diff (the real thing)
    - In cl-waffe2's case: just do not build such a graph in the first place (<-> because searching is expensive)
    - Fuse in advance with define-compiler-macro
    - Replace at compile time by pattern matching
    ↑ disassemble shows that the operations have been fused
    - This is not counted in the compile time
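The standard `define-compiler-macro` facility is enough to show the fusion idea in miniature. The sketch below is an illustration only: `scale`, `shift` and `scale-shift` are hypothetical functions, not cl-waffe2 operators. The compiler macro pattern-matches a call of the form `(shift (scale x a) b)` at compile time and rewrites it into one fused call, so the unfused graph is never built.

```lisp
;; Hypothetical sketch: fusing two calls into one with a compiler macro.
(defun scale (x a) (* a x))
(defun shift (x b) (+ x b))
(defun scale-shift (x a b)            ; the fused operation
  (+ (* a x) b))

(define-compiler-macro shift (&whole form x b)
  ;; Pattern match: is the first argument literally a (scale <x> <a>) call?
  (if (and (consp x) (eq (first x) 'scale) (= (length x) 3))
      `(scale-shift ,(second x) ,(third x) ,b)
      form))                          ; otherwise decline and keep the call

;; Inspect the rewrite without relying on the compiler applying it:
(funcall (compiler-macro-function 'shift) '(shift (scale x 2) 1) nil)
;; => (SCALE-SHIFT X 2 1)
```

A conforming compiler is allowed to ignore compiler macros, so the fused and unfused forms must compute the same result; here both evaluate to `(+ (* a x) b)` with the arguments evaluated in the same order.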
  28. Implementation [Reverse Mode] (14/15)

    Caching
    - Functions generated by call-with-view are cached
    - (compile nil body) is not run a second time, so calling (build toplevel) again and again is fine
    - About 0.3 sec for a 12-layer GPT-2
    - Ideal: once everything is cached at runtime, a build mostly finishes within 2 ms
    - About 2 ms when the instruction count after optimization is 100
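The caching idea can be sketched with a hash table keyed on the generated S-expression. This is a simplified illustration, not cl-waffe2's cache: `*compiled-cache*` and `cached-compile` are hypothetical names, and the real key presumably involves more than the source form.

```lisp
;; Hypothetical sketch: compile each distinct lambda form only once.
(defvar *compiled-cache* (make-hash-table :test #'equal))

(defun cached-compile (lambda-form)
  "Compile LAMBDA-FORM once; later calls with an EQUAL form reuse the function."
  (or (gethash lambda-form *compiled-cache*)
      (setf (gethash lambda-form *compiled-cache*)
            (compile nil lambda-form))))

(let ((f (cached-compile '(lambda (x) (* 2 x))))
      ;; A freshly consed but EQUAL form hits the cache.
      (g (cached-compile (list 'lambda '(x) '(* 2 x)))))
  (list (eq f g) (funcall f 21)))   ; => (T 42)
```

After the first build, rebuilding the same graph costs a hash lookup per generated function instead of a call to the compiler, which is what makes repeated builds cheap.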
  29. Implementation [Reverse Mode] (15/15)

    Execution: Tadah~
    - (forward model <arguments>) runs the Forward Mode
    - (backward model) runs the Reverse Mode
    For users: defnode + define-impl + defclass alone give access to all of these optimizations
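  30. Benchmark

    Conditions
    - MacBook Pro (13-inch, 2017), 8GB, 2.3 GHz Dual-Core Intel Core i5
    - CPU, everything single-threaded
    - SIMD extensions: up to AVX2
    - Training MNIST with a three-layer MLP (activation = ReLU, optimizer = Adam, lr=1e-3, loss function = CrossEntropy)
    - cl-waffe2: SLEEF, OpenBLAS Backend
    - Keras: TensorFlow Backend (No AVX2; building it was too much trouble ...)
    - The first run includes the compile time
    - The experimental setup is rough, so do not rely on these numbers too much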
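  31. Summary

    Aiming for SoTA performance with the following three strategies:
    1. Loop optimization + external APIs
    - Even when offsets are created, the loop is corrected to a route that properly uses parallelization and the SIMD registers
    - Results are cached, so the second compilation is Zero Cost
    2. Powerful graph optimization
    - However the code is written, it becomes 100% in-place
    - Memory-locality optimization (<-> not obtainable by writing C/C++ directly)
    3. Symbolic Diff (the real thing)
    - Symbolic Diff before execution, through Common Lisp metaprogramming
    - No need to search the compiled nodes over and over
    Goals within one year:
    - Keep node construction + (1. 2. 3.) within 20 ms of overhead
    - Make it usable in a PyTorch-like way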
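  32. Problems with a Tracing JIT

    How should If and Map be expressed? (e.g. expressing RNNs and gating)
    - Solution 1: create IfNode and MapNode
      - No need to worry about optimizing the compile time
      - The usage is not intuitive
    - Solution 2: reduce the compile time (the promising one)
      - The cost of compilation is proportional to O(nlogn + 2N)
      - If the compile cost can be kept within 20 ms, computation time >>> compile time
      - The model can then be run in a defined-by-run fashion, so this is the direction I plan to take
    - Ideal: describe the model with method 2, and rewrite it with method 1 once more speed is wanted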
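  33. Conclusion / Future work

    - With the cl-waffe2 paradigm introduced here, practical large-scale scientific computing becomes possible
      - It incorporates a new design, is extensible, and lets you use Common Lisp
      - The VM implementation (= the baseline) is OK
    - Features and community are still weak
      - Who knows when CUDA, OpenCL and Metal support will arrive ...
      - Support for low-precision, fast matrix computation, which matters for inference, is insufficient
      - Saving models is not implemented yet either ...
    - From here on
      - oneDNN/GGML backends: the goal is to write wrappers so that GGML and oneDNN can be called through a common API
      - Reducing the compile time, defined-by-run style
      - I want to work hard so that SoTA deep learning can be done in Common Lisp at low cost
    - Next goals
      - Increase the number of contributors: spread the design philosophy, improve the documentation
      - Stabilize GPT2 inference
      - Settle the specification of a PyTorch-level API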