
Transformer and Graph Neural Network

LiberalArts
February 22, 2021


This deck is a simplified explanation of the Transformer as covered in the following:

- Transformer and Image Processing
https://lib-arts.booth.pm/items/2741653

The Transformer has a broad range of applications, including serving as the base of LLMs, so it is worth mastering.


Transcript

1. Chapter 2: Basic Understanding of the Transformer

Chapter 2 covers the basics of the Transformer as the technical foundation for this book. Section 2.1 gives an overview of RNNs and Attention; then, so that Sections 2.3 and 2.4 can interpret the Transformer module centered on Dot Product Attention, Section 2.2 introduces the underlying ideas of Message Passing and graph neural networks (Graph Neural Network). After the Transformer has been covered through Section 2.4, Section 2.5 reviews the network architecture of BERT, which is also used by the Vision Transformer (ViT) treated in Section 3.2.

2.1 RNN and Attention

Section 2.1 covers RNNs and Attention.
https://lib-arts.booth.pm/items/1834866
RNNs and Attention were treated in detail in Chapters 1 and 2 of the book above, so here we summarize only a brief overview and its interpretation.
2. ▲ Figure 2.1: RNN-based sequence-to-sequence modeling (Seq2seq paper, Figure 1)
https://arxiv.org/abs/1409.3215
▲ Figure 2.2: Introducing Attention into sequence-to-sequence modeling (from the Attention paper)
https://arxiv.org/abs/1409.0473

Figures 2.1 and 2.2 show Seq2seq, which uses RNNs, and the work that introduced Attention, respectively. Both are contemporaneous studies that tackled machine translation (Machine Translation).
3. Whereas an RNN computes a neural network recursively, Attention computes a weighted sum of hidden states. Grasping these computational flows is of course important, but more important is what the structure of each operation means. RNNs are a simple approach to modeling sequences, but their weakness is that information becomes harder to handle as the sequence grows longer. Many devices such as LSTM and GRU were introduced to cope with long sequences, yet across the module as a whole information is still lost geometrically, so long sequences remained difficult. For example, if we assume that only half the information survives each step of RNN propagation, then 0.5^10 ≈ 0.000977, so at a sequence length of just 10 less than 0.1% of the first word's information remains.

Attention, by contrast, is computed as a weighted sum, so distant information can be handled without much strain*1. Simply because the computation is no longer geometric, handling distant information becomes dramatically easier. Incidentally, the number of layers in a CNN is geometric in the same sense, but adopting the ResNet form introduced in Section 1.2 can avoid this to a fair degree.

One thing that must be understood when computing Attention is the computation of the weights. There are various methods for this, but self-attention, which can use the input sequence as-is to compute the weights, is the dominant one, and the Transformer can also be regarded as built on the self-attention structure. The self-attention structures used in the Transformer are covered as Dot Product Attention in Section 2.3 and the Transformer module in Section 2.4.

*1 LSTM and GRU can likewise be understood as improvements for handling long sequences, but the common assessment is that they do not reach the performance of the Attention operation, which is expressed explicitly in the form of a sum.
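The geometric-decay argument above is easy to check numerically. The following is a minimal sketch (plain Python with NumPy, not from the original book) contrasting the 0.5^k retention of a recurrent chain with an attention weight, which depends on similarity rather than distance:

```python
import numpy as np

# RNN-style propagation: if each step retains only half the signal, the
# first token's contribution after k steps is 0.5**k.
for k in (1, 5, 10):
    print(f"after {k} steps: {0.5 ** k:.6f}")   # k=10 -> 0.000977

# Attention-style access: a weighted sum over all hidden states, with
# weights from a softmax over similarities -- distance plays no role.
rng = np.random.default_rng(0)
H = rng.normal(size=(10, 4))                    # 10 hidden states, dim 4
q = H[0]                                        # query resembling token 0
scores = H @ q
weights = np.exp(scores - scores.max())
weights /= weights.sum()                        # softmax over all positions
context = weights @ H                           # token 0 contributes directly
print(weights.round(3))
```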
4. 2.2 Message Passing and GNN

Section 2.2 covers Message Passing and Graph Neural Networks.
https://www.amazon.co.jp/dp/B08JGM3JNP
These are treated in detail in Chapters 3 and 4 of the book above, but in short, the Transformer's Dot Product Attention (a kind of self-attention) can be understood as a Graph Neural Network that uses the message-passing framework (Message Passing Paradigm).

The Dot Product Attention procedure itself is covered in Section 2.3, but if you grasp Dot Product Attention only as a flow of operations, it is hard to form an intuition. So, as a preliminary step toward understanding its meaning in Section 2.3, Section 2.2 reviews the outline of graph neural networks (Graph Neural Network) based on the Message Passing Paradigm*2.

▲ Figure 2.3: Example of a graph (from Wikipedia, graph theory)
https://ja.wikipedia.org/wiki/グラフ理論

*2 The common view is that "the Message Passing Paradigm is one line of research within graph neural networks," but it is also possible to interpret it as "graph neural networks can, broadly, be expressed by the Message Passing Paradigm."
5. First, a graph, as in Figure 2.3, is best understood as a representation of relationships using points (nodes) and lines (edges). A concrete, easy-to-picture example is a railway route map. And when you have a route map and want to model something like predicting how crowded each station is, the idea of Message Passing is convenient.

▲ Figure 2.4: The Message Passing equations (from the MPNN paper)
https://arxiv.org/abs/1704.01212

The equations can be written as in Figure 2.4, but to grasp them roughly, it works well to picture equation (1) of Figure 2.4 as passengers boarding the train and equation (2) as passengers getting off*3.

▲ Figure 2.5: Image of Message Passing (computing the sum of neighboring hidden states)

*3 This is only an image, so there is no need to take it too literally.
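As a reading aid, here is a minimal sketch (not from the original book) of the two Message Passing equations referenced from Figure 2.4, with the message function M and update function U simplified to toy choices:

```python
import numpy as np

# Sketch of the MPNN scheme:
#   (1) m_v = sum over neighbors w of M(h_v, h_w)   -- "boarding the train"
#   (2) h_v' = U(h_v, m_v)                          -- "getting off"

adjacency = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}   # a toy route map
h = {v: np.ones(3) * v for v in adjacency}           # hidden state per station

def message(h_v, h_w):
    return h_w                 # toy M: simply pass along the neighbor's state

def update(h_v, m_v):
    return h_v + m_v           # toy U: accumulate the incoming messages

m = {v: sum(message(h[v], h[w]) for w in adjacency[v]) for v in adjacency}
h = {v: update(h[v], m[v]) for v in adjacency}       # one round of passing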
6. Figure 2.5 can be seen as this Message Passing viewed from one particular station v1; introducing learnable parameters here gives the graph convolution.

Equation 2.1: graph convolution (parameter processing and convolution)

h_i^{(l+1)} = \sigma \Big( b^{(l)} + \sum_{j \in N(i)} \frac{1}{c_{ij}} \, h_j^{(l)} W^{(l)} \Big)

The graph convolution can be written as Equation 2.1, where W^{(l)} should be read as the parameters. To see why introducing parameters matters, consider the station-congestion example: because of things like how passenger flows are routed, the number of passengers who get off at a station is not reflected in the congestion as-is*4. By introducing parameters to handle this, we can do supervised learning that takes the number of boarding passengers at each station and the route map as input and the congestion level of each station as output, which makes congestion prediction possible.

A neural network built with graph convolutions as above is called a graph neural network (GNN; Graph Neural Network), and with this idea a neural network can be trained with reference to adjacency relations. In the example of Section 2.2 we regarded stations as nodes and lines as edges; the Transformer is what you get by regarding words as nodes and the similarity between words as edges, and performing the graph convolution (Attention computation + feed-forward parameter computation).

https://www.amazon.co.jp/dp/B08B4SBQL7
The same view applies if you think of it as taking the ideas of BoW-based document classification and network analysis, covered in Chapters 5 and 7 of the book above, and applying them at the level of words rather than documents.

*4 For example, one would expect a difference in congestion between a station with many quick-to-respond staff and one without.
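A minimal NumPy sketch of Equation 2.1 may help. One assumption not stated in the deck: c_ij is taken here as the common symmetric normalization sqrt(|N(i)| · |N(j)|).

```python
import numpy as np

# Sketch of Eq. 2.1: h_i^(l+1) = σ(b + Σ_{j∈N(i)} (1/c_ij) h_j W).
def graph_conv(A, H, W, b):
    deg = A.sum(axis=1)                         # |N(i)| for each node
    C = np.sqrt(np.outer(deg, deg))             # assumed c_ij normalization
    agg = (A / C) @ H @ W                       # normalized neighbor sum
    return np.maximum(agg + b, 0.0)             # σ = ReLU here

A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)       # 4 stations, adjacency
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))                     # hidden state per station
W = rng.normal(size=(8, 8)) * 0.1               # learnable parameters W^(l)
b = np.zeros(8)                                 # bias b^(l)
H_next = graph_conv(A, H, W, b)                 # one graph-convolution layer
```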
7. Applying those ideas in the same way, a graph like the one in Figure 2.6 can be constructed*5.

▲ Figure 2.6: Network analysis and graphs

A graph can thus be computed from words, sentences, and so on, and this fact is extremely useful for understanding Dot Product Attention in Section 2.3 and the Transformer module in Section 2.4.

2.3 Dot Product Attention

Section 2.3 reviews Dot Product Attention and Multi-Head Attention.

*5 BoW is a matrix of documents D by words W; similarity can be computed over D or over W, in either case with cosine similarity or the like, so as techniques the two can be regarded the same way.
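Footnote *5 is easy to make concrete. A small sketch (toy matrix, not from the original book): from a document-word BoW matrix, cosine similarity over the rows gives a document graph, and over the columns a word graph, by the identical procedure.

```python
import numpy as np

X = np.array([[2, 1, 0, 0],
              [1, 2, 1, 0],
              [0, 0, 1, 2]], dtype=float)        # 3 documents x 4 words (BoW)

def cosine_graph(M):
    U = M / np.linalg.norm(M, axis=1, keepdims=True)
    return U @ U.T                               # pairwise cosine similarity

doc_graph = cosine_graph(X)                      # edges between documents
word_graph = cosine_graph(X.T)                   # edges between words
```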
8. ▲ Figure 2.7: Overview of Dot Product Attention and Multi-Head Attention (Transformer paper, Figure 2)
https://arxiv.org/abs/1706.03762

Figure 2.7 is the overview diagram of Dot Product Attention and Multi-Head Attention. We check the equations alongside it.

▲ Figure 2.8: Dot Product Attention (from the Transformer paper)
9. ▲ Figure 2.9: Multi-Head Attention (from the Transformer paper)

Figures 2.8 and 2.9 correspond to Dot Product Attention and Multi-Head Attention, respectively. The paper's figures and equations alone are a little hard to follow, so we review the outline of the processing below. It is easier to think in terms of some concrete task, so, as with seq2seq, we assume a machine translation (Machine Translation) task.

To input Japanese for the translation task, we feed a morphologically analyzed sequence such as 「私 は 今日 東京 に 行く」 ("I am going to Tokyo today"), and apply a Word2vec-like procedure to each word to obtain a distributed representation (Distributed representation), a representation of the word in terms of parameters*6.

In the Attention step, the dot products between the words are computed and normalized into probabilities (a softmax function is applied to the product of the Q and K matrices), and a weighted sum is computed based on them (the Attention step). In short, you can interpret it as realizing a process in which words with similar distributed representations interact with each other as much as possible. It is also worth remembering that the name Dot Product Attention comes from the fact that a dot product (Dot Product) is computed along the way.

This much is the Dot Product Attention procedure; computing in an ensemble-like way on top of Dot Product Attention gives Multi-Head Attention.

*6 The Transformer basically represents each word with 512-dimensional parameters. The procedure of obtaining this distributed representation is also called Embedding.
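The formula behind Figure 2.8 is Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V from the Transformer paper. A minimal self-attention sketch (toy dimension 8 instead of 512; not the book's code):

```python
import numpy as np

def dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # word-to-word dot products
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted sum of values

# Self-attention: Q, K, V all come from the same input sequence,
# e.g. the 6 tokens of 「私 は 今日 東京 に 行く」.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))                          # 6 tokens, toy dim 8
out = dot_product_attention(X, X, X)
```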
10. Using multiple heads can be thought of as providing ensemble-like robustness, parallel processing, and so on*7.

2.4 Interpreting the Transformer Module

Section 2.4 gives an interpretation of the Transformer module.

▲ Figure 2.10: Overall picture of the Transformer module (Transformer paper, Figure 1)

*7 Treating Multi-Head Attention in detail is somewhat difficult, and the Transformer's high performance can be understood well enough from Dot Product Attention alone, so this book omits it.
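Although the book omits the details, for reference here is a minimal sketch of the split-attend-concatenate scheme of Figure 2.9, in one common equivalent formulation (one large projection split into heads; toy dimensions, not the book's code):

```python
import numpy as np

def attention(Q, K, V):
    w = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(w - w.max(-1, keepdims=True))
    return (w / w.sum(-1, keepdims=True)) @ V

def multi_head(X, W_q, W_k, W_v, W_o, h=2):
    # Project Q, K, V, split into h heads, attend per head,
    # then concatenate and apply the output projection W_o.
    heads = [attention(X @ Wq, X @ Wk, X @ Wv)
             for Wq, Wk, Wv in zip(np.split(W_q, h, axis=1),
                                   np.split(W_k, h, axis=1),
                                   np.split(W_v, h, axis=1))]
    return np.concatenate(heads, axis=-1) @ W_o

rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(6, d))                      # 6 tokens, toy dim 8
W_q, W_k, W_v, W_o = (rng.normal(size=(d, d)) * 0.1 for _ in range(4))
out = multi_head(X, W_q, W_k, W_v, W_o)
```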
11. https://arxiv.org/abs/1706.03762

First, Figure 2.10 is the overview of the Transformer module. Thinking in terms of the translation example of Section 2.3, the encoder on the left turns Japanese into an internal representation, and the decoder on the right generates English.

The Dot Product Attention and Multi-Head Attention covered in Section 2.3 are shown in orange in Figure 2.10. This can be regarded as corresponding to Message Passing in the graph convolution of Section 2.2. The Feed Forward blocks shown in light blue perform parameter learning word by word, which likewise corresponds, in the graph convolution of Section 2.2, to computing the product of each node's hidden-layer vector and the parameter matrix.

▲ Figure 2.11: Dot Product Attention and GNN
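The correspondence can be written out directly. A sketch of one encoder block in the Section 2.2 reading (residual connections and LayerNorm omitted for brevity; an illustrative reduction, not the paper's full block):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def encoder_block(X, W1, W2):
    # Message Passing step: every word aggregates the other words, with
    # edge weights derived automatically from dot-product similarity.
    A = softmax(X @ X.T / np.sqrt(X.shape[-1]))
    H = A @ X
    # Per-node step: the same feed-forward parameters applied to each word,
    # like the per-node product with W^(l) in the graph convolution.
    return np.maximum(H @ W1, 0.0) @ W2

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))                      # 6 words, toy dim 8
W1 = rng.normal(size=(8, 16)) * 0.1              # FFN expands ...
W2 = rng.normal(size=(16, 8)) * 0.1              # ... and projects back
out = encoder_block(X, W1, W2)
```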
12. https://www.hello-statisticians.com/ml/deeplearning/transformer1.html

The correspondence between the Transformer and graph neural networks can be interpreted as in Figure 2.11. Roughly speaking, the Transformer is a GNN whose graph is generated automatically from the similarity of the words' distributed representations or hidden layers.

▲ Figure 2.12: The Reformer paper
https://arxiv.org/abs/2001.04451

The interpretation of the Transformer module so far can be pushed further by bringing in the Reformer, so we introduce it briefly.
13. ▲ Figure 2.13: LSH Attention (Reformer paper, Figure 2)

In the Reformer, the LSH (Locality Sensitive Hashing) Attention shown in Figure 2.13 improves computational efficiency by restricting the Attention computation to specific words only; whereas Transformer-derived research had mostly handled on the order of 1,000 words, this work made it possible to handle tens of times more. The ideas used in the Reformer and their line of reasoning take the handling of graphs heavily into account, so grasping them together with graph neural networks deepens understanding.

Pushing the reflection a little further, the Transformer can be seen as a soft structure that applies the Attention computation between all pairs of words, while the Reformer is a hard structure that computes only between particular, highly related words. By understanding the Transformer as a graph neural network, such seemingly complex Transformer-derived methods can likewise be understood as graph neural networks with words as nodes, which makes intuitive understanding and reflection possible.
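A much-simplified sketch of that "hard" structure (my illustration, not Reformer's actual algorithm, which adds shared query-key projections, sorting, chunking, and multiple hash rounds): hash each word vector into a bucket via a random projection, the core idea of LSH, and run attention only among words sharing a bucket.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 8))                 # 16 tokens, dim 8
R = rng.normal(size=(8, 4))                  # random projection for hashing
buckets = np.argmax(X @ R, axis=-1)          # bucket id per token

out = np.zeros_like(X)
for b in np.unique(buckets):
    idx = np.where(buckets == b)[0]          # tokens landing in this bucket
    Xb = X[idx]
    A = softmax(Xb @ Xb.T / np.sqrt(X.shape[-1]))
    out[idx] = A @ Xb                        # attention restricted to bucket
```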
14. 2.5 The Network Structure of BERT

Section 2.5 introduces BERT, a pretrained language modeling approach for natural language processing based on the Transformer. In particular, the Vision Transformer (ViT) introduced in Section 3.2 uses BERT's network structure as-is, so it is preferable to grasp BERT first. BERT is a modeling approach based on the Transformer, and uses Masked Language Modeling and Next Sentence Prediction for pretraining.

▲ Figure 2.14: BERT BASE and LARGE (from the BERT paper)
https://arxiv.org/abs/1810.04805

BERT BASE and LARGE have the sizes shown above, and it is good to keep a rough sense of their scale. For BERT BASE in particular, remember that the number of layers is 12, the hidden size (the dimension of the word's distributed representation) is 768, and the total number of parameters is about 110 million.

For BERT, a rough understanding is sufficient as long as the Transformer is understood, so this book does not treat it in detail. Those who want to check more closely can consult the following, where it is treated in detail:
https://lib-arts.booth.pm/items/1834866
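The ~110M figure can be roughly reproduced from L=12 and H=768. A small arithmetic sketch; the values not given in the deck (WordPiece vocabulary ≈ 30,522, max position 512, feed-forward size 4H = 3072) are the standard BERT-Base configuration, and small terms like LayerNorm and pooler weights are omitted:

```python
# Rough reconstruction of BERT-BASE's parameter count.
V, P, L, H, F = 30522, 512, 12, 768, 3072

embeddings = (V + P + 2) * H                 # token + position + segment
attention = 4 * (H * H + H)                  # Q, K, V, output projections
ffn = (H * F + F) + (F * H + H)              # two dense layers
per_layer = attention + ffn
total = embeddings + L * per_layer
print(f"{total / 1e6:.0f}M parameters")      # ≈ 109M, i.e. about 110 million
```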