
Transformer and Graph Neural Network

LiberalArts
February 22, 2021


This deck is a simplified explanation of the Transformer, based on the material covered in the book below.

・Transformer and Image Processing
https://lib-arts.booth.pm/items/2741653

The Transformer is used as the base of LLMs, among other applications, so it is well worth mastering given how broad its range of applications is.


Transcript

1. Chapter 2: Fundamentals of the Transformer

Chapter 2 covers the basics of the Transformer as the technical foundation of this book. Section 2.1 gives an overview of RNNs and Attention. Section 2.2 then introduces Message Passing and Graph Neural Networks as the underlying framework, so that Sections 2.3 and 2.4 can interpret the Transformer module, centered on Dot Product Attention. Having covered the Transformer through Section 2.4, Section 2.5 reviews the network structure of BERT, which is also used by the Vision Transformer (ViT) treated in Section 3.2.

2.1 RNN and Attention

Section 2.1 covers RNNs and Attention.
https://lib-arts.booth.pm/items/1834866
RNNs and Attention are covered in detail in Chapters 1 and 2 of the book above, so here we summarize only a brief overview and its interpretation.
2. Chapter 2: Fundamentals of the Transformer

▲ Figure 2.1: RNN-based sequence-to-sequence modeling (Seq2seq paper, Figure 1)
https://arxiv.org/abs/1409.3215
▲ Figure 2.2: Introducing Attention into sequence-to-sequence modeling (from the Attention paper)
https://arxiv.org/abs/1409.0473

Figures 2.1 and 2.2 show, respectively, Seq2seq built on RNNs and the work that introduced Attention. Both are studies from the same period that tackled machine translation.
3. 2.1 RNN and Attention

Whereas an RNN applies a neural network recursively along the sequence, Attention computes a weighted sum of the hidden states. Understanding these computation flows matters, of course, but what matters more is what the structure of each operation means. RNNs are a simple approach to modeling sequences, but their weakness is that handling information becomes harder as the sequence grows longer. Many refinements such as LSTM and GRU were devised to handle long sequences, but in the end information is still lost geometrically across the module as a whole, so long sequences remained difficult. For example, if we assume that only half of the information survives each propagation step of an RNN, then 0.5^10 = 0.000976..., so with a sequence of length just 10, less than 0.1% of the first word's information remains (a short numeric sketch follows below).

By contrast, Attention is computed as a weighted sum, so distant information can be handled without much strain*1. Simply because the computation is not geometric, handling distant information becomes dramatically easier. Incidentally, the number of layers in a CNN behaves geometrically in the same sense, but adopting the ResNet form introduced in Section 1.2 can avoid this to some extent.

Also, to compute Attention one must understand how the weights are computed. There are various methods for this, but self-attention, which can use the input sequence itself to compute the weights, is the dominant one, and the Transformer can be viewed as built on the self-attention structure. The self-attention structures used in the Transformer are covered under Dot Product Attention in Section 2.3 and the Transformer module in Section 2.4.

*1 LSTM and GRU can likewise be understood as improvements for handling long sequences, but the common assessment is that they do not match the performance of the Attention operation, which is expressed explicitly in the form of a sum.
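To make the geometric-decay contrast concrete, here is a minimal numeric sketch; the 0.5 retention ratio is the same toy assumption the text uses, and the equal-relevance scores are likewise made up:

```python
import numpy as np

# Toy numbers only: assume an RNN retains a fixed fraction of a token's
# information at each propagation step (0.5, as in the text).
retention = 0.5
seq_len = 10
print(retention ** seq_len)    # 0.0009765625 -> under 0.1% after 10 steps

# Attention instead mixes all positions in a single weighted sum, so the
# first token's contribution is one softmax weight, not an exponential decay.
scores = np.ones(seq_len)                        # toy: equal relevance everywhere
weights = np.exp(scores) / np.exp(scores).sum()
print(weights[0])                                # 0.1 -> no decay with distance
```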
4. Chapter 2: Fundamentals of the Transformer

2.2 Message Passing and GNNs

Section 2.2 covers Message Passing and Graph Neural Networks.
https://www.amazon.co.jp/dp/B08JGM3JNP
These are treated in detail in Chapters 3 and 4 of the book above, but the key point is that the Transformer's Dot Product Attention (a kind of self-attention) can be understood as a Graph Neural Network built on the message-passing framework (Message Passing Paradigm).

The Dot Product Attention procedure itself is covered in Section 2.3, but grasping it merely as a flow of operations makes it hard to form an intuition. So, as a preliminary step toward understanding its meaning in Section 2.3, Section 2.2 reviews the outline of Graph Neural Networks based on the Message Passing Paradigm*2.

▲ Figure 2.3: Example of a graph (from Wikipedia, graph theory)
https://ja.wikipedia.org/wiki/グラフ理論

*2 The common view is that "the Message Passing Paradigm is one line of research within graph neural networks," but it is also possible to interpret it as "graph neural networks can, by and large, be expressed in the Message Passing Paradigm."
5. 2.2 Message Passing and GNNs

First, a graph, as in Figure 2.3, is best understood as points (nodes) and lines (edges) expressing relationships. A concrete, easy example is a rail map. Given a rail map, the Message Passing idea is a convenient way to model tasks such as predicting how crowded each station is.

▲ Figure 2.4: The Message Passing equations (from the MPNN paper)
https://arxiv.org/abs/1704.01212

The equations can be written as in Figure 2.4; to grasp them roughly, think of Eq. (1) as passengers boarding a train and Eq. (2) as passengers getting off*3 (a small sketch follows below).

▲ Figure 2.5: Image of Message Passing (summing the hidden states of the neighborhood)

*3 This is only an image, so it is best not to take it too literally.
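A minimal sketch of the two steps under the station analogy; the graph, the hidden values, and the choice of sum/addition for the message and update functions are all illustrative assumptions, not the MPNN paper's specific choices:

```python
# Toy rail map: station -> neighboring stations (nodes and edges).
adjacency = {
    "v1": ["v2", "v3"],
    "v2": ["v1"],
    "v3": ["v1"],
}
h = {"v1": 1.0, "v2": 2.0, "v3": 3.0}   # hidden state per station (node)

# Step (1), "boarding": each node gathers messages from its neighbors.
messages = {v: sum(h[w] for w in adjacency[v]) for v in adjacency}

# Step (2), "getting off": each node updates its state using the message.
h_next = {v: h[v] + messages[v] for v in adjacency}
print(h_next)   # {'v1': 6.0, 'v2': 3.0, 'v3': 4.0}
```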
6. Chapter 2: Fundamentals of the Transformer

Figure 2.5 can be seen as this Message Passing viewed from one particular station v1; introducing learnable parameters into it gives graph convolution.

Eq. 2.1: Graph convolution (parameters and convolution)

h_i^{(l+1)} = \sigma\left( b^{(l)} + \sum_{j \in N(i)} \frac{1}{c_{ij}} h_j^{(l)} W^{(l)} \right)

Graph convolution can be written as Eq. 2.1, where W^{(l)} should be read as the parameters (a numpy sketch follows below). To see why introducing parameters is worthwhile, consider the station-congestion example: because of factors such as how well passenger flow is channeled, the number of passengers who get off at a station is not reflected directly in how crowded it becomes*4. By introducing parameters to handle this, we can do supervised learning with the number of boarding passengers at each station and the rail map as input and the congestion of each station as output, which makes congestion prediction possible.

A neural network built with graph convolutions in this way is called a Graph Neural Network (GNN), and with this idea the network can be trained while taking adjacency into account. In the Section 2.2 example we treated stations as nodes and lines as edges; the Transformer is what you get by treating words as nodes and the similarity between words as edges and performing graph convolution (Attention computation + feed-forward parameter computation).

https://www.amazon.co.jp/dp/B08B4SBQL7

It leads to the same picture if you think of the BoW-based document classification and network analysis covered in Chapters 5 and 7 of the book above as applied per word rather than per document.

*4 For example, a station with many quick-to-respond staff and one without will presumably differ in congestion.
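A small numpy sketch of Eq. 2.1; the graph, dimensions, and random weights are assumptions, and c_ij is set to the common sqrt(|N(i)||N(j)|) normalization (one possible choice):

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, d_in, d_out = 4, 8, 8
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)   # adjacency (stations and lines)
H = rng.normal(size=(n_nodes, d_in))        # h_j^(l): node hidden states
W = rng.normal(size=(d_in, d_out))          # W^(l): learnable parameters
b = np.zeros(d_out)                         # b^(l): learnable bias

deg = A.sum(axis=1)
C = np.sqrt(np.outer(deg, deg))             # c_ij normalization (one choice)
H_next = np.maximum(0.0, b + (A / C) @ H @ W)   # sigma = ReLU here
print(H_next.shape)   # (4, 8): new hidden state per node
```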
7. 2.3 Dot Product Attention

In the same way, it is possible to construct a graph like Figure 2.6 from words or sentences*5.

▲ Figure 2.6: Network analysis and graphs

Being able to compute graphs from words and texts like this is extremely useful for understanding Dot Product Attention in Section 2.3 and the Transformer module in Section 2.4.

2.3 Dot Product Attention

Section 2.3 reviews Dot Product Attention and Multi-Head Attention.

*5 BoW is a matrix of documents D by words W; similarity can be computed over D or over W, in both cases with cosine similarity or the like, so the two can be treated as the same technique (see the sketch below).
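A quick sketch of footnote *5, with a made-up 2-document, 3-word BoW matrix:

```python
import numpy as np

# BoW: rows are documents D, columns are words W (toy counts).
bow = np.array([[2, 0, 1],
                [0, 3, 1]], dtype=float)

def cos_sim(M):
    # Pairwise cosine similarity between the rows of M.
    norm = M / np.linalg.norm(M, axis=1, keepdims=True)
    return norm @ norm.T

print(cos_sim(bow))      # document-document similarity (document graph)
print(cos_sim(bow.T))    # word-word similarity (word graph, as in Fig. 2.6)
```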
8. Chapter 2: Fundamentals of the Transformer

▲ Figure 2.7: Overview of Dot Product Attention and Multi-Head Attention (Transformer paper, Figure 2)
https://arxiv.org/abs/1706.03762

Figure 2.7 gives the overview of Dot Product Attention and Multi-Head Attention. We check the equations as well.

▲ Figure 2.8: Dot Product Attention (from the Transformer paper)
9. 2.3 Dot Product Attention

▲ Figure 2.9: Multi-Head Attention (from the Transformer paper)

Figures 2.8 and 2.9 correspond to Dot Product Attention and Multi-Head Attention, respectively. The paper's figures and equations alone are a little hard to follow, so the processing is outlined below. It is easiest to think in terms of a concrete task, so assume a machine translation task, as with seq2seq. To input Japanese for machine translation, we feed in a morphologically segmented sequence such as "私 / は / 今日 / 東京 / に / 行く" ("I am going to Tokyo today") and apply Word2vec-like processing to each word to obtain its distributed representation, i.e. the word expressed by parameters*6.

The Attention step computes the dot products between the words and normalizes them into probabilities (applies the softmax function to the product of the Q and K matrices), and then computes the weighted sum based on them (the Attention operation). In short, interpret it as a process in which words with similar distributed representations interact with each other as much as possible. It is also worth remembering that the name Dot Product Attention comes from the dot products computed along the way (a sketch follows below).

That is the Dot Product Attention operation; computing with it in an ensemble-like fashion gives Multi-Head Attention.

*6 In the Transformer, a word is basically represented by 512-dimensional parameters. The process of obtaining this distributed representation is also called Embedding.
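A minimal numpy sketch of the computation just described, softmax(QK^T / sqrt(d_k)) V; the sequence length and random inputs/projections are assumptions, and the 512-dimensional representation follows footnote *6:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 6, 512                  # six tokens, e.g. "私 は 今日 東京 に 行く"
X = rng.normal(size=(seq_len, d_model))    # distributed representations

# In self-attention, Q, K, V are all projections of the same input sequence.
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

scores = Q @ K.T / np.sqrt(d_model)            # word-word dot products
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
out = weights @ V                               # weighted sum (the Attention step)
print(out.shape)   # (6, 512)
```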
10. Chapter 2: Fundamentals of the Transformer

Using multiple heads can be thought of as providing ensemble-like robustness and enabling parallel processing, among other benefits*7 (a rough sketch follows after this slide).

2.4 Interpreting the Transformer Module

Section 2.4 interprets the Transformer module.

▲ Figure 2.10: Overall view of the Transformer module (Transformer paper, Figure 1)

*7 Treating Multi-Head Attention in full detail gets somewhat difficult, and the Transformer's high performance can be understood well enough from Dot Product Attention alone, so this book omits it.
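A rough sketch of the Multi-Head idea as an ensemble of smaller dot-product attentions; the 8 heads of 64 dimensions match the Transformer paper's defaults, but the random weights here are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 6, 512, 8
d_head = d_model // n_heads                 # 64 dims per head

def dot_product_attention(Q, K, V):
    s = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

X = rng.normal(size=(seq_len, d_model))
heads = []
for _ in range(n_heads):                    # each head: its own projections
    Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
    heads.append(dot_product_attention(X @ Wq, X @ Wk, X @ Wv))
Wo = rng.normal(size=(d_model, d_model))
out = np.concatenate(heads, axis=-1) @ Wo   # concat heads, final projection
print(out.shape)   # (6, 512)
```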
11. 2.4 Interpreting the Transformer Module

https://arxiv.org/abs/1706.03762

First, Figure 2.10 gives the overview of the Transformer module. In terms of the translation example from Section 2.3, think of the encoder on the left as turning Japanese into an internal representation and the decoder on the right as generating English.

The Dot Product Attention and Multi-Head Attention covered in Section 2.3 are shown in orange in Figure 2.10. This can be viewed as corresponding to the Message Passing part of the graph convolution covered in Section 2.2. Likewise, the Feed Forward blocks shown in light blue learn parameters for each word individually, which corresponds to computing, for each node, the product of the hidden-state vector and the parameter matrix in the graph convolution of Section 2.2 (a sketch of this correspondence follows below).

▲ Figure 2.11: Dot Product Attention and GNN
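A simplified sketch of this correspondence: self-attention mixes information across words (message passing over similarity-defined edges), and the Feed Forward step applies the same parameters to each word (node) independently. Layer normalization and the Q/K/V projections are omitted, so this illustrates the structure rather than the actual encoder block:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_ff = 6, 512, 2048
X = rng.normal(size=(seq_len, d_model))

def self_attention(X):
    s = X @ X.T / np.sqrt(d_model)          # similarity-defined "edges"
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ X                            # message passing across nodes

W1 = rng.normal(size=(d_model, d_ff))
W2 = rng.normal(size=(d_ff, d_model))

H = X + self_attention(X)                   # mix across words (edges)
H = H + np.maximum(0.0, H @ W1) @ W2        # per-word FFN (per-node parameters)
print(H.shape)   # (6, 512)
```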
12. Chapter 2: Fundamentals of the Transformer

https://www.hello-statisticians.com/ml/deeplearning/transformer1.html

The correspondence between the Transformer and graph neural networks can be interpreted as in Figure 2.11. Roughly speaking, the Transformer is a GNN whose graph is generated automatically from the similarity of the words' distributed representations or hidden states.

▲ Figure 2.12: The Reformer paper
https://arxiv.org/abs/2001.04451

The interpretation of the Transformer module given so far can be pushed further by bringing in the Reformer, which we introduce briefly.
13. 2.4 Interpreting the Transformer Module

▲ Figure 2.13: LSH Attention (Reformer paper, Figure 2)

With the LSH (Locality Sensitive Hashing) Attention shown in Figure 2.13, the Reformer improves computational efficiency by restricting the Attention computation to particular words: whereas Transformer-derived work had mostly handled sequences of around 1,000 words, the Reformer made it possible to handle tens of times more. The ideas and reasoning used in the Reformer take the handling of graphs strongly into account, so studying it alongside graph neural networks deepens understanding.

Going one step further, the Transformer can be seen as a soft structure that performs Attention between all pairs of words, whereas the Reformer is a hard structure that computes only between specific, highly related words (a toy sketch follows below). Understanding the Transformer as a graph neural network thus lets us understand even Transformer derivatives that look complex at first glance as graph neural networks with words as nodes, enabling intuitive understanding and analysis.
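A toy sketch of this soft-versus-hard contrast; real LSH Attention uses a more careful hashing, sorting, and chunking scheme, so the sign-based bucketing here is only a stand-in for the idea of restricting attention to similar words:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 8, 16
X = rng.normal(size=(seq_len, d))

# Hash each word by the sign pattern of a few random projections (LSH-like):
# similar vectors tend to land in the same bucket.
R = rng.normal(size=(d, 2))
buckets = (np.sign(X @ R) > 0) @ np.array([1, 2])   # bucket id per word

out = np.zeros_like(X)
for b in np.unique(buckets):
    idx = np.where(buckets == b)[0]        # only words in the same bucket
    s = X[idx] @ X[idx].T / np.sqrt(d)
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    out[idx] = w @ X[idx]                  # attention restricted to the bucket
print(out.shape)   # (8, 16)
```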
14. Chapter 2: Fundamentals of the Transformer

2.5 The Network Structure of BERT

Section 2.5 introduces BERT, a pretrained language modeling approach built on the Transformer. In particular, the Vision Transformer (ViT) introduced in Section 3.2 uses BERT's network structure as-is, so it is best to get a grasp of BERT first. BERT is a Transformer-based model that uses Masked Language Modeling and Next Sentence Prediction for pretraining.

▲ Figure 2.14: BERT BASE and LARGE (from the BERT paper)
https://arxiv.org/abs/1810.04805

BERT's BASE and LARGE have the sizes shown above, and it is good to keep a rough sense of their scale. For BERT BASE in particular, remember these numbers: 12 layers, hidden size (the dimension of the word's distributed representation) of 768, and about 110 million total parameters (a back-of-the-envelope check follows below).

If you understand the Transformer, a rough understanding of BERT is sufficient, so this book does not treat it in detail. Readers who want a thorough treatment can find one at the link below.

https://lib-arts.booth.pm/items/1834866
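A back-of-the-envelope check of the BASE numbers (an approximation that ignores biases, layer norms, and position/segment embeddings; the 30,522-token vocabulary is the WordPiece vocabulary from the BERT paper):

```python
# BERT-BASE: L = 12 layers, H = 768 hidden, ~110M parameters total.
L, H, vocab = 12, 768, 30522
embeddings = vocab * H                     # token embedding table
per_layer = 4 * H * H + 2 * (H * 4 * H)    # attention (Q, K, V, O) + FFN (H -> 4H -> H)
total = embeddings + L * per_layer
print(f"{total / 1e6:.0f}M parameters")    # roughly 108M, i.e. ~1.1e8
```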