catla
March 10, 2020

[Reading group material] Federated Learning for Vision-and-Language Grounding Problems

Seminar material.
An introduction to "Federated Learning for Vision-and-Language Grounding Problems" (AAAI'20).


Transcript

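  1. Paper information

    [Authors] Fenglin Liu (Peking Univ), Xian Wu (Tencent), Shen Ge (Tencent), Wei Fan (Tencent), Yuexian Zou (Peking Univ & Peng Cheng Laboratory)
    [Venue] AAAI 2020
    [Why I chose this paper] I found it interesting that the FL framework was ported to a different task.
    [Aside] The first author, F. Liu, had first-author papers accepted at the following venues in a single year:
    • NeurIPS: accepted
    • IJCAI: accepted
    • ICDM: accepted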
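  2. Terminology

    Vision-and-Language Grounding Problem: a problem that spans natural language processing and computer vision and has to handle both at once. Examples: image captioning, visual question answering (VQA), image caption retrieval, etc.
    Horizontal Federated Learning (HFL): also known as sample-based federated learning. An FL problem in which the data share the same feature space but come from different sample (user) sets.
    Vertical Federated Learning (VFL): also known as feature-based federated learning. An FL problem in which the data share the same sample (user) set but have different feature spaces.
    Federated Transfer Learning (FTL): an FL problem in which both the feature space and the sample (user) set differ.
    Details on HFL, VFL, and FTL are given in the paper "Federated Machine Learning: Concept and Applications".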
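
The three settings on slide 2 are distinguished by two yes/no properties: whether the parties share a feature space and whether they share a sample (user) set. The following is a minimal sketch of that decision; the helper name `federated_setting` and the handling of the "both shared" case are assumptions made for illustration and do not come from the paper.

```python
from typing import Optional


def federated_setting(same_feature_space: bool, same_sample_set: bool) -> Optional[str]:
    """Name the FL setting implied by the two properties described on slide 2."""
    if same_feature_space and not same_sample_set:
        return "HFL"  # horizontal: same features, different samples (users)
    if same_sample_set and not same_feature_space:
        return "VFL"  # vertical: same samples (users), different features
    if not same_feature_space and not same_sample_set:
        return "FTL"  # transfer: neither features nor samples are shared
    # Both shared: outside the three settings on the slide (hypothetical handling).
    return None


for features, samples in [(True, False), (False, True), (False, False)]:
    print(features, samples, federated_setting(features, samples))
```
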
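  3. [Nguyen & Okatani, 2019]

    This work relaxes the requirement that the datasets be unified across tasks.
    However, tasks such as VQA require text as part of the input.
    Therefore, when the input is an image only, as in image captioning, the results are not very good (data leakage: input text).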
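  4. Proposal

    To share features across tasks while preventing data leakage, the paper proposes the Aligning, Integrating and Mapping Network (aimNet), which builds on the FL framework.
    Casting different conditions into the HFL, VFL, and FTL settings lets the method adapt flexibly.
    The contributions of the paper are:
    (1) Using the FL framework to exploit different tasks and datasets while preventing data leakage, it obtains better features than training on a single task.
    (2) It proposes aimNet, which extracts visual and textual features from the input image, as the centralized model in FL.
    (3) It demonstrates effectiveness under the Horizontal Federated Learning, Vertical Federated Learning, and Federated Transfer Learning settings.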
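  5. Visual and Textual Feature

    [Overview] The input image is passed to a visual feature extractor (Faster R-CNN, provided by Anderson et al.) and to a textual feature extractor (Multiple Instance Learning followed by an embedding layer, provided by Fang et al.):
    $\vec{I} = \{\vec{i}_1, \vec{i}_2, \ldots, \vec{i}_N\} \in \mathbb{R}^{N \times d}$
    $\vec{T} = \{\vec{w}_1, \vec{w}_2, \ldots, \vec{w}_M\} \in \mathbb{R}^{M \times d}$
    The words extracted this way are called semantic concepts; they include objects (dog, frisbee), states (off, electric), and relationships (holding, flying).
    The embedding layer is shared with the caption/question.
    Experiment Setting: the word embedding is shared with the caption/question. $d$ is set to the decoder input dimension of the Baseline (a model from prior work) introduced later. $N = M = 36$.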
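  6. Aligning, Integrating and Mapping Network : aimNet

    aimNet consists of the following three modules. Together they correspond to the image encoder of each task.
    • Aligning module: relates the visual features to the textual features.
    • Integrating module: relates features within the visual features, and within the textual features.
    • Mapping module: converts the features to the input size of each task.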
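  7. Aligning, Integrating and Mapping Network : aimNet

    The Aligning Module and the Integrating Module are both built on the following Basic Module.
    [Basic Module]
    To capture the relationships among features, it applies Multi-Head Attention (MHA) followed by a Feed-Forward Network (FFN). This measures the strength of relationships between features many-to-many rather than one-to-one.
    Applying this idea:
    • capturing relationships between visual features and textual features ⟹ Aligning Module
    • capturing relationships among visual features, or among textual features ⟹ Integrating Module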
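
As a rough illustration of the Basic Module on slide 7, the sketch below chains multi-head attention and a feed-forward network in NumPy. Several details are assumptions that the slide does not state: Transformer-style scaled dot-product attention, a ReLU feed-forward layer, random untrained weights, and no residual connections or layer normalization. The names `init_params` and `basic_module` and the sizes `d = 8`, `d_ff = 16`, and two heads are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(q, k, v, params, num_heads):
    """q: (Lq, d); k and v: (Lk, d). Returns an array of shape (Lq, d)."""
    d = q.shape[-1]
    dh = d // num_heads  # per-head dimension (d must be divisible by num_heads)

    def split(x, w):
        # Linear projection, then reshape to (heads, length, dh).
        return (x @ w).reshape(x.shape[0], num_heads, dh).transpose(1, 0, 2)

    qh = split(q, params["wq"])
    kh = split(k, params["wk"])
    vh = split(v, params["wv"])
    scores = qh @ kh.transpose(0, 2, 1) / np.sqrt(dh)  # (heads, Lq, Lk)
    heads = softmax(scores, axis=-1) @ vh               # (heads, Lq, dh)
    concat = heads.transpose(1, 0, 2).reshape(q.shape[0], d)
    return concat @ params["wo"]


def feed_forward(x, params):
    hidden = np.maximum(0.0, x @ params["w1"] + params["b1"])  # ReLU (assumed)
    return hidden @ params["w2"] + params["b2"]


def basic_module(q, k, v, params, num_heads):
    """FFN(MHA(q, k, v)), the composition used by both attention modules."""
    return feed_forward(multi_head_attention(q, k, v, params, num_heads), params)


def init_params(d, d_ff, rng):
    """Random, untrained weights (hypothetical initialization)."""
    def w(*shape):
        return rng.normal(0.0, 0.1, size=shape)

    return {
        "wq": w(d, d), "wk": w(d, d), "wv": w(d, d), "wo": w(d, d),
        "w1": w(d, d_ff), "b1": np.zeros(d_ff),
        "w2": w(d_ff, d), "b2": np.zeros(d),
    }


rng = np.random.default_rng(0)
d, d_ff, num_heads = 8, 16, 2
params = init_params(d, d_ff, rng)
I = rng.normal(size=(36, d))  # stand-in for 36 visual features
T = rng.normal(size=(36, d))  # stand-in for 36 textual features
out = basic_module(I, T, T, params, num_heads)
print(out.shape)  # (36, 8)
```

Every query row attends over all key rows, which is the many-to-many comparison the slide describes; the output always has as many rows as the query.
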
  8. Aligning, Integrating and Mapping Network : aimNet

    • Aligning module: relates the visual features to the textual features.
    • Integrating module: relates features within the visual features, and within the textual features.
    • Mapping module: converts the features to the input size of each task.
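  9. Aligning, Integrating and Mapping Network : aimNet

    [Aligning Module]
    What it does is the same as source-target attention:
    $\vec{I}^{a} = \mathrm{FFN}(\mathrm{MHA}(\vec{I}, \vec{T}, \vec{T}))$
    $\vec{T}^{a} = \mathrm{FFN}(\mathrm{MHA}(\vec{T}, \vec{I}, \vec{I}))$
    This has been shown to be effective for VQA (Su et al.).
    Aligning the text with the image suppresses ambiguous expressions, that is, words with the same spelling but several meanings (e.g., "mouse": the animal vs. the computer device).
    Experiment Setting: the number of attention heads and the dimensionality of the feed-forward network are fixed hyperparameters.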
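
To make the argument order in the two Aligning Module formulas concrete, here is a deliberately simplified sketch. It uses a single attention head with no learned projections and omits the FFN, so it is not the MHA of the paper; it only shows which input plays the role of query and which supplies the keys and values. The function name `attention` and the small sizes are hypothetical.

```python
import numpy as np


def attention(query, key, value):
    """Single-head scaled dot-product attention without learned weights."""
    scores = query @ key.T / np.sqrt(query.shape[-1])     # (Lq, Lk)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ value, weights


rng = np.random.default_rng(0)
N, M, d = 5, 3, 4  # small hypothetical sizes; the slides use N = M = 36
I = rng.normal(size=(N, d))  # visual features
T = rng.normal(size=(M, d))  # textual features

# MHA(I, T, T): each visual feature queries the textual features.
I_a, w_it = attention(I, T, T)
# MHA(T, I, I): each textual feature queries the visual features.
T_a, w_ti = attention(T, I, I)

print(I_a.shape, w_it.shape)  # (5, 4) (5, 3)
print(T_a.shape, w_ti.shape)  # (3, 4) (3, 5)
```

Each row of `I_a` is a weighted average of the rows of `T`, so the aligned visual features are expressed in terms of the textual features, and vice versa for `T_a`.
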
  10. Aligning, Integrating and Mapping Network : aimNet

    • Aligning module: relates the visual features to the textual features.
    • Integrating module: relates features within the visual features, and within the textual features.
    • Mapping module: converts the features to the input size of each task.
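  11. Aligning, Integrating and Mapping Network : aimNet

    [Integrating Module]
    What it does is the same as self-attention:
    $\vec{I}^{i} = \mathrm{FFN}(\mathrm{MHA}(\vec{I}^{a}, \vec{I}^{a}, \vec{I}^{a}))$
    $\vec{T}^{i} = \mathrm{FFN}(\mathrm{MHA}(\vec{T}^{a}, \vec{T}^{a}, \vec{T}^{a}))$
    This has been shown to be effective for image captioning (Yao et al.).
    Experiment Setting: the number of attention heads and the dimensionality of the feed-forward network are fixed hyperparameters.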
  12. Aligning, Integrating and Mapping Network : aimNet

    • Aligning module: relates the visual features to the textual features.
    • Integrating module: relates features within the visual features, and within the textual features.
    • Mapping module: converts the features to the input size of each task.
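  13. Aligning, Integrating and Mapping Network : aimNet

    [Mapping Module]
    The Mapping function is defined as a two-layer neural network with tanh activation:
    $\mathrm{Mapping}(x) = \tanh(x W_m + b_m) W_{mm} + b_{mm}$
    To fit the features obtained from the previous two modules to the input space of each task's decoder, the Mapping Module applies the following transformation:
    $\mathrm{LayerNorm}(\mathrm{Mapping}(\vec{I}^{t}) + \mathrm{Mapping}(\vec{T}^{t}))$
    This kind of module is commonly used in many existing methods. In the Baseline, the feature map from a CNN (e.g., Faster R-CNN) is fed to this module.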
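
The two Mapping Module formulas can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the visual and textual inputs share one set of Mapping weights, LayerNorm has no learned gain or bias, the weights are random, and the sizes `d`, `d_hidden`, and `d_out` are hypothetical. The element-wise sum requires both inputs to have the same number of rows, which holds for N = M = 36 from slide 5.

```python
import numpy as np


def mapping(x, w_m, b_m, w_mm, b_mm):
    """Mapping(x) = tanh(x W_m + b_m) W_mm + b_mm."""
    return np.tanh(x @ w_m + b_m) @ w_mm + b_mm


def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance (no learned gain/bias)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def map_to_decoder(i_feat, t_feat, weights):
    """LayerNorm(Mapping(I) + Mapping(T)) with shared weights (assumption)."""
    return layer_norm(mapping(i_feat, *weights) + mapping(t_feat, *weights))


rng = np.random.default_rng(0)
d, d_hidden, d_out = 8, 16, 12  # hypothetical sizes
weights = (
    rng.normal(size=(d, d_hidden)),      # W_m
    np.zeros(d_hidden),                  # b_m
    rng.normal(size=(d_hidden, d_out)),  # W_mm
    np.zeros(d_out),                     # b_mm
)
I_in = rng.normal(size=(36, d))  # stand-in for the integrated visual features
T_in = rng.normal(size=(36, d))  # stand-in for the integrated textual features
decoder_input = map_to_decoder(I_in, T_in, weights)
print(decoder_input.shape)  # (36, 12)
```

The output width `d_out` is what would be matched to the input dimension of each task-specific decoder.
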
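  14. Implementation

    Training is performed jointly under various combinations of tasks and datasets. The three FL settings are given an MTL-style interpretation: a client in FL is regarded as a task, and a feature space as a dataset.
    Evaluation metrics:
    • Image Captioning: SPICE, CIDEr, METEOR, BLEU
    • VQA: test-standard set

    FL setting   Task                                  Dataset
    HFL          same (Image Captioning)               different (MSCOCO & Flickr30k)
    VFL          different (Image Captioning & VQA)    same (MSCOCO & VQA v2.0)
    FTL          different (Image Captioning & VQA)    different (Flickr30k & VQA v2.0)

    Note: the input images of VQA v2.0 are drawn from MSCOCO.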
  15. Implementation : Vertical Federated Learning

    Experiment Setting
    Baseline (Image Captioning Decoder)
    • Spatial (Lu et al. 2017)
    • NBT (Lu et al. 2018)
    Baseline (Visual Question Answering)
    • BUTD (Anderson et al. 2018)
    • BAN (Kim, Jun and Zhang 2018)
  16. Implementation : Federated Transfer Learning

    Experiment Setting
    Baseline (Image Captioning Decoder)
    • Spatial (Lu et al. 2017)
    • NBT (Lu et al. 2018)
    Baseline (Visual Question Answering)
    • BUTD (Anderson et al. 2018)
    • BAN (Kim, Jun and Zhang 2018)
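  17. Summary

    • By applying the FL framework to MTL for vision-and-language grounding problems, which suffered from data leakage, the method both solves that problem and outperforms training on a single task or dataset.
    • By interpreting the FL settings (HFL, VFL, FTL) in terms of tasks and datasets, the proposed method can adapt flexibly.
    • The results show that the method is more effective on small datasets than on large ones.

    [Impressions]
    • Data leakage in MTL arises only in somewhat special tasks such as vision-and-language grounding problems, so this is not a very general-purpose method; in a simple setting it is probably better not to split the tasks.
    • I found that I am more interested in pure FL research than in applying FL algorithms to other tasks.
    • It is odd for "Federated Learning" to be the subject of the title.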