
Tutorial: Model Merging

Rei Fujiyoshi
June 03, 2025


(1) Model merging by simple averaging of weights
(2) Model merging using the same pretrained model
   Model soups [M. Wortsman+, ICML’22]
   Task Arithmetic [G. Ilharco+, ICLR’23]
(3) Model merging after aligning the weights
   OT Fusion [S. Singh+, NeurIPS’20]

GitHub: https://github.com/machine-perception-robotics-group/Model-Fusion-via-Optimal-Transport-with-DepGraph-for-Residual-Networks/tree/main


Transcript

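  1. Model Merging
    • A technique for integrating and consolidating multiple models into a single model
      - Lightweighting and efficiency: Model Soups [M. Wortsman+, ICML’22]
        Ensemble models 1, …, k → merged model: a single model equivalent to the ensemble
      - Knowledge fusion: Task Arithmetic [G. Ilharco+, ICLR’23]
        LLM specialized in math + LLM specialized in Japanese → merged LLM that combines math and Japanese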
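  2. Methods of Model Merging
    • Linear combination of weights: a weighted average of two or more models
      (1) Model merging by simple averaging of weights
      (2) Model merging using the same pretrained model
          - Model Soups [M. Wortsman+, ICML’22]
          - Task Arithmetic [G. Ilharco+, ICLR’23]
      (3) Model merging after aligning the weights
          - OT Fusion [S. Singh+, NeurIPS’20]
    • Partial model merging
      (4) Merge only the parts fine-tuned with LoRA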
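  3. (1) Model Merging by Simple Averaging of Weights
    • Use the average of the models' weights θ as the weights of the merged model:
      $$\theta_{\text{merge}} = \frac{1}{k}\sum_{n=1}^{k}\theta_n$$
      where θ_1, …, θ_k are the weights of the k models to be merged
    • Averaging weights that encode different features sharply degrades the merged model's performance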
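The averaging formula for θ_merge can be sketched as follows. This is a minimal sketch that assumes each model's weights are stored as a dict mapping parameter names to NumPy arrays, a simplified and hypothetical stand-in for a framework state dict. It only makes sense when all k models share one architecture, and nothing in it prevents the performance drop described above.

```python
import numpy as np

def simple_average(state_dicts):
    """theta_merge = (1/k) * sum_n theta_n, computed parameter by parameter.

    Assumes every dict has the same keys and the same array shapes.
    """
    k = len(state_dicts)
    return {name: sum(sd[name] for sd in state_dicts) / k for name in state_dicts[0]}

# Toy example with two hypothetical single-layer "models".
theta_1 = {"fc.weight": np.array([[1.0, 2.0], [3.0, 4.0]])}
theta_2 = {"fc.weight": np.array([[3.0, 2.0], [1.0, 0.0]])}
merged = simple_average([theta_1, theta_2])
print(merged["fc.weight"])  # element-wise mean: every entry is 2.0
```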
  4. Methods of Model Merging (same agenda as slide 2)
  5. Merging Recipe: Greedy Soup
    • Starting from the most accurate model, decide for each model whether it should be merged
      - Exclude any model that lowers the accuracy of the merged model
    • Sort the k candidate models (Model D, Model B, Model L, Model C, …) by accuracy to get θ_1, θ_2, θ_3, …, θ_k
      $$\theta_{\text{greedy}} = \frac{1}{|S|}\sum_{i=1}^{|S|}\theta_i$$
      where S is the set of models to merge and |S| is the number of adopted models. Starting from S_1 = {θ_1}, for t = 2, …, k:
      $$S_t = \begin{cases} S_{t-1} \cup \{\theta_t\} & \text{if } \mathrm{ACC}(\theta^{t}_{\text{greedy}}) \ge \mathrm{ACC}(\theta^{t-1}_{\text{greedy}}) \\ S_{t-1} & \text{otherwise} \end{cases}$$
      where θ^t_greedy is the average over S_{t−1} ∪ {θ_t}, θ^{t−1}_greedy is the average over S_{t−1}, and ACC is validation accuracy
  6. Greedy Soup (step): trying S = {θ_1, θ_2} lowers the accuracy, so θ_2 is rejected (❌)
  7. Greedy Soup (step): adding θ_3 does not lower the accuracy, so S = {θ_1, θ_3}
  8. Greedy Soup (final): repeating this through θ_k gives S = {θ_1, θ_3, …, θ_k}
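The greedy-soup procedure above can be sketched as follows. This is a minimal sketch that assumes models are NumPy parameter vectors of equal shape. `accuracy` is a hypothetical callable standing in for held-out validation accuracy; the toy version used here simply prefers parameters close to zero.

```python
import numpy as np

def greedy_soup(models, accuracy):
    """Visit models in descending individual accuracy and keep a model only if
    adding it does not lower the accuracy of the averaged soup.

    Returns the averaged parameters and the indices of the adopted models.
    """
    order = sorted(range(len(models)), key=lambda i: accuracy(models[i]), reverse=True)
    adopted = [order[0]]                      # S_1 = {theta_1}: the best single model
    soup = models[order[0]].copy()
    best = accuracy(soup)
    for i in order[1:]:
        candidate = np.mean([models[j] for j in adopted + [i]], axis=0)
        score = accuracy(candidate)
        if score >= best:                     # keep theta_t only if ACC does not drop
            adopted.append(i)
            soup, best = candidate, score
    return soup, adopted

# Hypothetical stand-in for validation accuracy: closer to 0 is better.
toy_accuracy = lambda p: -float(np.abs(p).sum())
models = [np.array([0.2]), np.array([-0.1]), np.array([5.0])]
soup, adopted = greedy_soup(models, toy_accuracy)
print(soup, adopted)  # [0.05] [1, 0]: the outlier model 2 is rejected
```

Sorting first means the soup always starts from the strongest single model. In this toy run, model 2 is tried last and rejected, just as θ_2 is rejected on slide 6.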
  9. Methods of Model Merging (same agenda as slide 2)
  10. Task Arithmetic [G. Ilharco+, ICLR’23]
    • Knowledge fusion: editing models with task vectors τ
      - τ: a vector that specializes a model to a particular task
        $$\tau = \theta_{\text{ft}} - \theta_{\text{pre}}$$
        (θ_ft: weights after fine-tuning; θ_pre: weights of the pretrained model)
      - Ways to use τ:
        1. Forgetting knowledge (a task): $\tau_{\text{new}} = -\tau$
        2. Adding knowledge (tasks): $\tau_{\text{new}} = \tau_A + \tau_B$
        3. Task analogies: $\tau_{\text{new}} = \tau_C + (\tau_A - \tau_B)$
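The three task-vector operations can be sketched as follows. This sketch assumes weights are stored as dicts of NumPy arrays and that edits are applied as θ_new = θ_pre + λτ_new with one scalar λ (`lam`). All weight values are made up, and λ is fixed by hand here rather than tuned on held-out data.

```python
import numpy as np

def task_vector(theta_ft, theta_pre):
    """tau = theta_ft - theta_pre, computed per parameter."""
    return {k: theta_ft[k] - theta_pre[k] for k in theta_pre}

def add_vectors(*taus):
    """Element-wise sum of task vectors, e.g. tau_A + tau_B."""
    return {k: sum(t[k] for t in taus) for k in taus[0]}

def scale(tau, c):
    """Multiply a task vector by a scalar (c = -1 negates it)."""
    return {k: c * v for k, v in tau.items()}

def apply_task_vector(theta_pre, tau, lam):
    """theta_new = theta_pre + lambda * tau."""
    return {k: theta_pre[k] + lam * tau[k] for k in theta_pre}

# Hypothetical pretrained weights and three fine-tuned variants.
pre = {"w": np.array([1.0, 1.0])}
ft_a, ft_b, ft_c = {"w": np.array([2.0, 1.0])}, {"w": np.array([1.0, 3.0])}, {"w": np.array([0.0, 1.0])}
tau_a, tau_b, tau_c = (task_vector(ft, pre) for ft in (ft_a, ft_b, ft_c))

forget = apply_task_vector(pre, scale(tau_a, -1.0), lam=1.0)         # 1. tau_new = -tau_A  -> [0, 1]
both = apply_task_vector(pre, add_vectors(tau_a, tau_b), lam=0.5)    # 2. tau_new = tau_A + tau_B  -> [1.5, 2]
analogy = apply_task_vector(                                         # 3. tau_C + (tau_A - tau_B)  -> [1, -1]
    pre, add_vectors(tau_c, tau_a, scale(tau_b, -1.0)), lam=1.0)
```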
  11. Task Arithmetic (1): Forgetting Knowledge (Tasks) [G. Ilharco+, ICLR’23]
    • Only an unwanted task can be forgotten
    • Use the negation of the task vector τ: $\tau_{\text{new}} = -\tau$
    • Apply it to a model θ that holds both wanted and unwanted knowledge:
      $$\theta_{\text{new}} = \theta + \lambda\tau_{\text{new}}$$
      The edited model θ_new keeps only the wanted knowledge
    [Figure: Fig. 1 of the Task Arithmetic paper, illustrating task vectors and the arithmetic operations on them]
  12. Task Arithmetic (2): Adding Knowledge (Tasks) [G. Ilharco+, ICLR’23]
    • Makes the model handle multiple tasks
    • Example: add the task vector for Japanese to the task vector for math
      - θ: an LLM trained on English; θ_math handles math; θ_jpn handles Japanese
      - Add the task vectors: $\tau_{\text{new}} = \tau_{\text{math}} + \tau_{\text{jpn}}$
      - The edited model $\theta_{\text{new}} = \theta + \lambda\tau_{\text{new}}$ handles both math and Japanese
      - λ: a coefficient that adjusts the overall norm of the summed vectors
    [Figure: Fig. 1 of the Task Arithmetic paper, illustrating task vectors and the arithmetic operations on them]
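  13. Task Arithmetic (3): Task Analogies [G. Ilharco+, ICLR’23]
    • Can express relationships between tasks
    • Uses the relationship among three task vectors:
      $$\tau_{\text{sketch\_lion}} = \tau_{\text{lion}} + (\tau_{\text{sketch\_dog}} - \tau_{\text{dog}})$$
      where τ_sketch_dog − τ_dog captures the relationship from "dog" to "sketch of a dog"
    → Obtains a task vector for lions drawn as sketches without any training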
  14. Methods of Model Merging (same agenda as slide 2)
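  15. Why Is Alignment Necessary?
    • Apply simple averaging to two models trained from different weight initializations
      - Dataset: CIFAR
    [Table: accuracy of Model A, Model B, and their simple average for VGG and ResNet]
    • Even if Model A and Model B learn similar weights, their different initializations mean the weights are not in the same order (arrangement)
    → The weights must be brought into the same order before merging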
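A toy illustration of this ordering problem follows. It uses a two-layer NumPy MLP with hypothetical weights, where Model B computes exactly the same function as Model A but stores its hidden units in swapped order. Naive averaging then mixes unrelated units, while undoing the permutation before averaging preserves the function.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def mlp(W1, W2, x):
    """Two-layer MLP: input -> hidden (ReLU) -> output."""
    return W2 @ relu(W1 @ x)

# Model A (hypothetical weights): 2 inputs, 2 hidden units, 1 output.
W1_a = np.array([[1.0, 0.0], [0.0, 1.0]])
W2_a = np.array([[1.0, -1.0]])

# Model B: the same function with its hidden units in swapped order.
perm = np.array([1, 0])
W1_b, W2_b = W1_a[perm, :], W2_a[:, perm]

x = np.array([2.0, 1.0])
out_a = mlp(W1_a, W2_a, x)
out_b = mlp(W1_b, W2_b, x)                       # identical to out_a

# Naive averaging pairs hidden unit 0 of A with hidden unit 1 of B.
out_naive = mlp((W1_a + W1_b) / 2, (W2_a + W2_b) / 2, x)

# Undo B's permutation first, then average: the function is preserved.
inv = np.argsort(perm)
out_aligned = mlp((W1_a + W1_b[inv, :]) / 2, (W2_a + W2_b[:, inv]) / 2, x)
print(out_a, out_b, out_naive, out_aligned)      # [1.] [1.] [0.] [1.]
```

In a real network the permutation is unknown and only approximate. Finding it is the job of the optimal-transport step that follows.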
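  16. OT Fusion [S. Singh+, NeurIPS’20]
    • A model merging method based on optimal transport
      - Optimal transport finds and matches similar weights in the same layer of each model
      - The averages of the matched weights become the weights of a single model
    [Figure: input models A and B → the weights are aligned by optimal transport using Model B as the reference → aligned model → fusion → output model]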
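  17. Optimal Transport between Model Weights
    • Used to find similar weights in each layer
    • Formulation:
      $$\min_{P \in \mathbb{R}^{n\times m}} \sum_{i=1}^{n}\sum_{j=1}^{m} C(x_i, y_j)\,P_{ij} \quad \text{s.t.} \quad P_{ij} \ge 0\ \ \forall i,j, \qquad \sum_{j=1}^{m} P_{ij} = \frac{1}{n}\ \ \forall i, \qquad \sum_{i=1}^{n} P_{ij} = \frac{1}{m}\ \ \forall j$$
      n, m: numbers of elements of the two probability distributions; P: transport matrix between the distributions; C: cost matrix between the distributions
    [Figure: weights w_{1,1}, w_{2,1}, … of layer l in Model A and Model B]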
  18. Optimal Transport between Model Weights (mapping)
    • Using the same formulation as slide 17, the weights w_1, w_2 of layer l in Model A are mapped onto those of Model B
  19. Optimal Transport between Model Weights (non-optimal transport)
    • Matching (a_1, b_1), (a_2, b_2), (a_3, b_3) in layer l is not optimal:
      the total transport cost $C(a_1, b_1)P_{11} + C(a_2, b_2)P_{22} + C(a_3, b_3)P_{33}$ is high
  20. Optimal Transport between Model Weights (optimal transport)
    • Matching (a_1, b_2), (a_2, b_3), (a_3, b_1) in layer l is optimal:
      the total transport cost $C(a_1, b_2)P_{12} + C(a_2, b_3)P_{23} + C(a_3, b_1)P_{31}$ is low
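The per-layer matching can be sketched for the special case n = m with uniform marginals. In that case some optimal P is a permutation matrix scaled by 1/n, so this sketch brute-forces all permutations instead of calling a real OT solver (practical code would use an exact linear-programming solver or Sinkhorn iterations). Two further assumptions are worth flagging. The squared Euclidean distance between neurons' incoming weight vectors is an assumed ground cost. The sketch also handles one layer in isolation, whereas the full method also carries each layer's alignment over to the next layer's input dimension.

```python
import itertools
import numpy as np

def cost_matrix(A, B):
    """C[i, j]: squared Euclidean distance between neuron i of A and neuron j of B.
    Each row holds one neuron's incoming weights in layer l."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)

def exact_ot_uniform(C):
    """Minimise sum_ij C_ij P_ij with uniform marginals 1/n and n == m by
    brute force over permutations (feasible only for tiny n)."""
    n = C.shape[0]
    best = min(itertools.permutations(range(n)),
               key=lambda p: sum(C[i, p[i]] for i in range(n)))
    P = np.zeros((n, n))
    P[np.arange(n), np.array(best)] = 1.0 / n
    return P

def align_and_fuse(A, B):
    """Reorder A's neurons into B's order via n * P^T A, then average with B."""
    n = A.shape[0]
    P = exact_ot_uniform(cost_matrix(A, B))
    A_aligned = n * P.T @ A
    return (A_aligned + B) / 2, P

# Hypothetical layer l: Model B holds Model A's three neurons in another order, plus noise.
A = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
B = np.array([[2.1, 2.0], [1.0, 0.1], [0.0, 0.9]])
fused, P = align_and_fuse(A, B)
print(np.argmax(P, axis=1))  # [1 2 0]: a1->b2, a2->b3, a3->b1
```

The recovered matching is the low-cost pattern (a_1, b_2), (a_2, b_3), (a_3, b_1) from slide 20.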
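  21. Accuracy Change with OT Fusion
    • Merge Model A and Model B, trained from different weight initializations
      - Dataset: CIFAR
    [Table: accuracy of Model A, Model B, simple averaging, and OT Fusion for VGG and ResNet, with changes in pt]
    • For ResNet, OT Fusion causes a severe drop in performance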
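  22. Optimal Transport for Architectures with Skip Connections
    • Skip connections when no alignment is done
    [Figure: block input X passes through Layer and Layer′, is added to the skip connection, and gives block output Y; before alignment]
    → The channel arrangement of the block outputs matches, so the models can be merged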
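  23. Optimal Transport for Architectures with Skip Connections (after alignment)
    • Skip connections when the weights are aligned by optimal transport
    [Figure: the block before alignment, alignment by optimal transport, and the block after alignment]
    • After alignment, the channel arrangement of the block output changes, so the correspondence between the channels being added breaks
    → Corresponding layers must be grouped so that optimal transport and alignment are applied to them jointly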
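A toy NumPy check of why the channel permutation must be shared across a residual block follows. The weights are hypothetical, and matrices stand in for convolutions in a single block. Applying one permutation to every layer that touches the block channels keeps the output unchanged. Permuting the residual branch without the skip path changes it.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(W0, W1, W2, W3, inp):
    x = W0 @ inp                  # block input X, also carried by the skip connection
    y = x + W2 @ relu(W1 @ x)     # block output Y: skip path + residual branch, channel-wise
    return W3 @ y                 # the next layer reads the block channels

# Hypothetical toy weights with 2 block channels.
W0 = np.array([[1.0, 0.0], [0.0, 2.0]])
W1 = np.eye(2)
W2 = np.eye(2)
W3 = np.array([[1.0, 0.0]])
inp = np.array([1.0, 1.0])
perm = np.array([1, 0])           # an alignment found for the block channels

reference = forward(W0, W1, W2, W3, inp)
# Grouped: the same permutation on every layer that touches the block channels.
grouped = forward(W0[perm], W1[:, perm], W2[perm], W3[:, perm], inp)
# Not grouped: the residual branch output is permuted, but the skip path (W0) is not.
broken = forward(W0, W1, W2[perm], W3[:, perm], inp)
print(reference, grouped, broken)  # [2.] [2.] [3.]
```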
  24. Optimal Transport for Architectures with Skip Connections (grouping)
    • Introduce grouping via DepGraph
      - Which layers must be grouped depends on the model architecture
      - Deciding this by hand for every architecture is costly
  25. DepGraph [G. Fang+, CVPR’23]
    • Structured pruning groups layers by their inter-layer dependencies and removes parameters group by group
    • Problem: dependencies differ greatly across architectures, so grouping them by hand is difficult
    → DepGraph is proposed to determine inter-layer dependencies automatically and group the parameters accordingly
    [Figure: various architectures, (a) CNNs, (b) Transformers, (c) RNNs, (d) GNNs, with dependencies determined automatically from the model structure, and propagation on the dependency graph]
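  26. Dependencies in Neural Networks
    • Dependencies in a three-layer fully connected network
      - Weights of the three-layer MLP: w_l, w_{l+1}, w_{l+2}
      - Dependencies:
        ‣ A dependency (black line) exists between w_l and w_{l+1} → they must be pruned together
        ‣ To remove the k-th neuron (blue circle) connecting w_l and w_{l+1}, remove w_l[k, :] and w_{l+1}[:, k] at the same time
    [Figure: (a) basic dependency, (b) residual dependency (residual connection), (c) concatenation dependency]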
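Removing the k-th neuron as described above can be sketched in NumPy. The three-layer MLP weights here are random and hypothetical. Row k of w_l and column k of w_{l+1} are deleted together. The pruned network then gives the same output as the unpruned one with neuron k zeroed out.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def prune_neuron(w_l, w_next, k):
    """Remove neuron k between two fully connected layers.

    w_l (hidden x in) produces neuron k in row k, and w_next (out x hidden)
    consumes it in column k, so w_l[k, :] and w_next[:, k] are removed together.
    """
    return np.delete(w_l, k, axis=0), np.delete(w_next, k, axis=1)

rng = np.random.default_rng(0)
w_l = rng.normal(size=(4, 3))      # hypothetical 3-layer MLP weights
w_l1 = rng.normal(size=(5, 4))
w_l2 = rng.normal(size=(2, 5))
x = rng.normal(size=3)

w_l_p, w_l1_p = prune_neuron(w_l, w_l1, k=1)
y_pruned = w_l2 @ relu(w_l1_p @ relu(w_l_p @ x))   # shapes still chain: 3 -> 3 -> 5 -> 2

# Sanity check: same result as zeroing neuron k in the unpruned network.
mask = np.ones(4)
mask[1] = 0.0
y_masked = w_l2 @ relu(w_l1 @ (mask * relu(w_l @ x)))
```

If only w_l[k, :] were removed, w_{l+1} would receive a 3-dimensional input while still expecting 4, and the matrix product would fail.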
  27. Dependencies in Skip Connections
    • ResNet structure: Conv (f_1) → BN (f_2) → ReLU (f_3) → Conv (f_4) → BN (f_5) → ReLU (f_6) → Add (f_7)
      ‣ Final output: the sum of the output of f_6 and the output of f_3 carried by the skip connection
      ‣ Pruning: remove some of the channels of Conv (f_4)
      ‣ Problem: f_6 and f_3 then have different numbers of channels, so the sum cannot be computed (the correspondence between the channels being added breaks)
    → Group Conv (f_1) and Conv (f_4) and prune them together*
      *Strictly, f_1 through f_7 are defined as one group
    [Figure: the CNN block from DepGraph with layers f_1 to f_7 and propagation on the dependency graph]
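  28. Building the Dependency Graph (DepGraph)
    • DepGraph is built in two main steps
    1. Decompose a network with L layers
      - Pruning removes input and output dimensions separately, so each layer is represented by its input f_i^- and its output f_i^+
      - This split representation accounts for both inter-layer and intra-layer dependencies
      - f_i is either a layer with parameters, such as a convolution, or a parameter-free layer, such as ReLU
      $$\mathcal{F} = \{f_1^-, f_1^+, \dots, f_L^-, f_L^+\}, \qquad (f_1^-, f_1^+) \leftrightarrow (f_2^-, f_2^+) \leftrightarrow \cdots \leftrightarrow (f_L^-, f_L^+)$$
      (↔ between pairs: inter-layer dependencies; within each pair: intra-layer dependencies)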
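  29. Building the Dependency Graph (DepGraph), continued
    2. Determine inter-layer and intra-layer dependencies with rules
      - Inter-layer dependency: for a connection f_i^- ↔ f_j^+, the dependency f_i^- ⇔ f_j^+ always holds
        · f_i^- and f_j^+ of connected layers correspond to the same intermediate feature → the dependency always holds
      - Intra-layer dependency: f_i^- ⇔ f_i^+ holds when f_i^- and f_i^+ share a pruning scheme, sch(f_i^-) = sch(f_i^+)
        · If pruning removes the input and output together, the input and output share a pruning scheme
      - Final rule (𝟙: indicator function, ∨: OR, ∧: AND):
        $$D(f_i^-, f_j^+) = \mathbb{1}[f_i^- \leftrightarrow f_j^+] \lor \mathbb{1}[i = j \land \mathrm{sch}(f_i^-) = \mathrm{sch}(f_i^+)]$$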
  30. Grouping Parameters with DepGraph
    • Group parameters based on DepGraph
      - Starting from a layer f_i, collect the preceding and following layers until no further dependency exists
      - Put the collected layers into one group
    • Example starting from f_4:
      - The output f_3^+, which is added through the skip connection, depends on f_4^-
      - A convolution weight w[out_ch, in_ch, filter_h, filter_w] is pruned on the input side as w[:, k, :, :] and on the output side as w[k, :, :, :], which differ; so for a convolution sch(f_i^-) ≠ sch(f_i^+) and there is no intra-layer dependency (the link is cut)
      - Final group: f_1^+, f_2^-, f_2^+, f_3^-, f_3^+, f_4^-, f_4^+, f_5^-, f_5^+, f_6^-, f_6^+, f_7^-, f_7^+
    [Figure: the CNN block from DepGraph with propagation on the dependency graph]
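The rule D and the grouping step can be sketched on the toy block as follows. The sketch is a hypothetical, hand-written encoding of the block. It assumes the topology that the final group implies: a chain f_1 → … → f_7 plus a skip from f_3 into the Add f_7. Grouping is implemented as a plain graph search over D. This sketch is not the DepGraph library's API.

```python
# Hypothetical encoding of the toy block: layer index -> layer type.
layers = {1: "conv", 2: "bn", 3: "relu", 4: "conv", 5: "bn", 6: "relu", 7: "add"}
# Inter-layer connections (j, i): output f_j^+ feeds input f_i^-; (3, 7) is the skip.
connections = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (3, 7)}

def sch_shared(kind):
    """Whether sch(f_i^-) == sch(f_i^+). A convolution prunes its input with
    w[:, k, :, :] and its output with w[k, :, :, :], so it does not share one."""
    return kind != "conv"

def D(i, j):
    """D(f_i^-, f_j^+) = 1[f_i^- <-> f_j^+] OR 1[i == j AND sch(f_i^-) == sch(f_i^+)]."""
    return (j, i) in connections or (i == j and sch_shared(layers[i]))

def group_of(start):
    """Collect every node reachable from `start` through D, i.e. until no new dependency."""
    nodes = [(i, side) for i in layers for side in "-+"]
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for other in nodes:
            if other in seen or other[1] == node[1]:
                continue                  # D only links an input node to an output node
            inp, out = (node, other) if node[1] == "-" else (other, node)
            if D(inp[0], out[0]):
                seen.add(other)
                stack.append(other)
    return seen

group = group_of((4, "-"))
print(sorted(group))  # every node except (1, '-'): the 13 nodes of the final group
```

Only f_1^- is left out, because the first convolution cuts the link between its input and its output. The result matches the final group listed for f_4.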
  31. Optimal Transport over DepGraph Groups
    • Grouping in ResNet
      - The skip connection and the residual connection fall into the same group
    • Optimal transport between the models is computed per group
      - This avoids getting different transport matrices for the residual connection and the skip connection
    [Figure: ResNet computation graph (input 1×3×32×32, output 1×10) with each layer annotated by its DepGraph group]
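  32. Batch Normalization Statistics
    • Processing in a batch normalization layer:
      $$y = \frac{x - \mathrm{E}(x)}{\sqrt{\mathrm{Var}(x) + \epsilon}} \cdot \gamma + \beta$$
      - x: input; y: output; γ, β: learnable parameters
      - E(x): exponential moving average of the mean of each mini-batch
      - Var(x): exponential moving average of the variance of each mini-batch
      - During training, each model keeps its own statistics
    • Merging batch normalization layers
      - If every parameter is simply averaged, the averaged statistics are inconsistent with the merged model
      - When the weights differ, the inputs to the batch normalization layer change as well
      → After merging, recompute only the statistics to obtain statistics appropriate for the merged model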
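Statistics recomputation can be sketched for a single linear layer followed by BN, with hypothetical weights and calibration data. The sketch computes exact statistics over the whole calibration set. This is a simplification of the exponential moving average kept during training.

```python
import numpy as np

def bn_stats(weight, calib_inputs):
    """E(x) and Var(x) of the inputs to a BN layer that follows a linear layer,
    computed exactly over the calibration set."""
    acts = calib_inputs @ weight.T          # (N, channels): what the BN layer sees
    return acts.mean(axis=0), acts.var(axis=0)

def batch_norm(x, mean, var, gamma, beta, eps=1e-5):
    """y = (x - E(x)) / sqrt(Var(x) + eps) * gamma + beta"""
    return (x - mean) / np.sqrt(var + eps) * gamma + beta

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                        # hypothetical calibration data
W_a = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])    # hypothetical layer of model A
W_b = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 1.0]])    # hypothetical layer of model B
W_merged = (W_a + W_b) / 2

# Naive: average the statistics each model kept for itself.
_, var_a = bn_stats(W_a, X)
_, var_b = bn_stats(W_b, X)
naive_var = (var_a + var_b) / 2                       # about [1.0, 1.5] in expectation

# Recompute the statistics for the activations the merged layer actually produces.
mean_m, var_m = bn_stats(W_merged, X)                 # var about [0.5, 1.25] in expectation
y = batch_norm(X @ W_merged.T, mean_m, var_m, gamma=1.0, beta=0.0)
```

In PyTorch the usual equivalent is to call `reset_running_stats()` on each BatchNorm module. You then set `momentum=None` to get a cumulative average and run calibration batches through the merged model in train mode without gradients. This is a common recipe, not necessarily the one used in the linked repository.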
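  33. Accuracy Change with the Improved OT Fusion
    • Merge Model A and Model B, trained from different weight initializations
      - Dataset: CIFAR
    • Apply DepGraph and recompute the statistics of the batch normalization layers
    [Table: ResNet accuracy of Model A, Model B, and OT Fusion with no improvement, with DepGraph, and with BN recomputation]
    → The two improvements suppress the performance drop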
  34. Methods of Model Merging (same agenda as slide 2)
  35. (4) Merging Only the Parts Fine-Tuned with LoRA
    • LoRA [E. J. Hu+, ICLR’22]: fine-tune only low-rank matrices added to a pretrained model
      - Fewer trainable parameters than fine-tuning the whole pretrained model
    • Combine low-rank matrices trained on earlier tasks to handle a new task
      - LoraHub [C. Huang+, COLM’24]: a weighted sum of low-rank matrices, with weights set by performance on the new task
    [Figure from LoraHub: COMPOSE, where LoRA modules tuned on upstream tasks (commonsense reasoning, question answering, natural language inference, question generation, …) are combined with weights; ADAPT, where LoraHub learns the combination for an unseen task such as Boolean Expressions]
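A LoraHub-style composition can be sketched in NumPy. Following the LoraHub paper, the sketch sums the A matrices and the B matrices separately with the same weights and then forms ΔW = B·A. The per-module weights are fixed by hand here. LoraHub instead searches them on a few examples of the new task with a gradient-free optimizer. LoRA's α/r scaling is omitted, and all matrices are random and hypothetical.

```python
import numpy as np

def lora_delta(A, B):
    """LoRA weight update Delta W = B @ A, with A: (r, d_in) and B: (d_out, r)."""
    return B @ A

def compose_lora(modules, weights):
    """Weight-sum the A matrices and the B matrices separately, then
    Delta W = (sum_i w_i B_i) @ (sum_i w_i A_i)."""
    A = sum(w * a for w, (a, _) in zip(weights, modules))
    B = sum(w * b for w, (_, b) in zip(weights, modules))
    return lora_delta(A, B)

rng = np.random.default_rng(0)
d_out, d_in, r = 4, 6, 2
W0 = rng.normal(size=(d_out, d_in))                  # frozen pretrained weight
# Three hypothetical LoRA modules (A_i, B_i) from three upstream tasks.
modules = [(rng.normal(size=(r, d_in)), rng.normal(size=(d_out, r))) for _ in range(3)]
weights = [0.5, 0.3, 0.2]                            # hypothetical composition weights
W_new = W0 + compose_lora(modules, weights)          # rank of the update stays <= r
```

Composing A and B separately keeps the merged update at rank r. Summing full ΔW_i matrices instead could raise the rank up to 3r.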
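  36. Summary: Model Merging
    • A technique for consolidating multiple models into a single model
    • Approaches to consolidation
      - Model merging using the same pretrained model
        · Lightweighting and efficiency: Model Soups [M. Wortsman+, ICML’22] → uses the weights of models trained with different hyperparameters
        · Knowledge fusion: Task Arithmetic [G. Ilharco+, ICLR’23] → edits models with task vectors
      - Model merging with weight alignment
        · OT Fusion [S. Singh+, NeurIPS’20] → aligns weights with optimal transport
        · Depending on the architecture, the optimal transport procedure and batch normalization layers need special care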
  37. References
    • [M. Wortsman+, ICML’22] Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
    • [G. Ilharco+, ICLR’23] Editing Models with Task Arithmetic
    • [S. Singh+, NeurIPS’20] Model Fusion via Optimal Transport
    • [G. Fang+, CVPR’23] DepGraph: Towards Any Structural Pruning
    • [E. J. Hu+, ICLR’22] LoRA: Low-Rank Adaptation of Large Language Models
    • [C. Huang+, COLM’24] LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition