20220112_AI勉強会 (AI Study Group)

M.Inomata
January 11, 2022

Transcript

  1. Self-introduction

    Mitsuhiro Inomata, representative director of both tech vein Inc. and DeepRad Inc., and developer. twitter: @ino2222
  2. Top10 Recent

    1. Plenoxels: Radiance Fields without Neural Networks
    2. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
    3. Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks
    4. Masked Feature Prediction for Self-Supervised Visual Pre-Training
    5. Self-attention Does Not Need $O(n^2)$ Memory
    6. Mega-NeRF: Scalable Construction of Large-Scale NeRFs for Virtual Fly-Throughs
    7. Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework
    8. Improving language models by retrieving from trillions of tokens
    9. BEVT: BERT Pretraining of Video Transformers
    10. SLIP: Self-supervision meets Language-Image Pre-training
  3. Top10 Hype

    1. Critical Sentence Identification in Legal Cases Using Multi-Class Classification
    2. Seeking Sinhala Sentiment: Predicting Facebook Reactions of Sinhala Posts
    3. Plenoxels: Radiance Fields without Neural Networks
    4. Show Your Work: Scratchpads for Intermediate Computation with Language Models
    5. BANMo: Building Animatable 3D Neural Models from Many Casual Videos
    6. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
    7. Efficient Geometry-aware 3D Generative Adversarial Networks
    8. Player of Games
    9. NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
    10. FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization
    PickUp
  4. Top10 Recent (by theme)

    1. Plenoxels: Radiance Fields without Neural Networks
    2. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
    3. Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks
    4. Masked Feature Prediction for Self-Supervised Visual Pre-Training
    5. Self-attention Does Not Need $O(n^2)$ Memory
    6. Mega-NeRF: Scalable Construction of Large-Scale NeRFs for Virtual Fly-Throughs
    7. Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework
    8. Improving language models by retrieving from trillions of tokens
    9. BEVT: BERT Pretraining of Video Transformers
    10. SLIP: Self-supervision meets Language-Image Pre-training
    Theme tags on the slide: CLIP, CLIP, NeRF, NeRF, Video, Video, NLP, SSL, SSL, SSL, Transformer, Transformer, Transformer, Transformer
  5. Top10 Hype (by theme)

    1. Critical Sentence Identification in Legal Cases Using Multi-Class Classification
    2. Seeking Sinhala Sentiment: Predicting Facebook Reactions of Sinhala Posts
    3. Plenoxels: Radiance Fields without Neural Networks
    4. Show Your Work: Scratchpads for Intermediate Computation with Language Models
    5. BANMo: Building Animatable 3D Neural Models from Many Casual Videos
    6. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
    7. Efficient Geometry-aware 3D Generative Adversarial Networks
    8. Player of Games
    9. NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
    10. FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization
    Theme tags on the slide: CLIP, NeRF, Video, NLP, NLP, NLP, NLP, Transformer, GAN, GAN, NeRF, CLIP, NeRF
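  6. Top Hype 7. Efficient Geometry-aware 3D Generative Adversarial Networks

    Unsupervised generation of high-quality, multi-view-consistent images and 3D shapes from single-view 2D photographs has been a long-standing challenge. Existing 3D GANs are either compute-intensive or rely on approximations that are not 3D-consistent; the former limits the quality and resolution of generated images, and the latter harms multi-view consistency and shape quality. This work improves the computational efficiency and image quality of 3D GANs without relying too heavily on such approximations. To that end, the authors introduce an expressive hybrid explicit-implicit network architecture that, together with other design choices, not only synthesizes high-resolution multi-view-consistent images in real time but also produces high-quality 3D geometry. By decoupling feature generation from neural rendering, the framework can use state-of-the-art 2D CNN generators such as StyleGAN2 and inherit their efficiency and expressiveness. Experiments on FFHQ and AFHQ Cats, among others, demonstrate state-of-the-art 3D-aware synthesis.

    - Goal: improve GANs that generate 3D-aware images from 2D photos
    - Result: a state-of-the-art GAN that generates images of human and animal faces by processing them in 3D
    - Method: StyleGAN2 + volume renderer
    - Name: EG3D (Efficient Geometry-aware 3D Generative Adversarial Networks)
    - Author affiliations: Stanford University, NVIDIA
    http://arxiv.org/abs/2112.07945v1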
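  7. 1. Plenoxels: Radiance Fields without Neural Networks

    The paper introduces Plenoxels (plenoptic voxels), a system for photorealistic view synthesis. Plenoxels represent a scene as a sparse 3D grid with spherical harmonics. This representation can be optimized from calibrated images with gradient methods and regularization, without any neural components. On standard benchmark tasks, Plenoxels are optimized more than two orders of magnitude faster than Neural Radiance Fields with no loss in visual quality.

    - Goal: improve view synthesis systems
    - Result: a model that matches NeRF's accuracy while training about two orders of magnitude faster
    - Method: reimplemented the NeRF algorithm without a neural network (TV, SH: total variation regularization and spherical harmonics)
    - Name: Plenoxels (plenoptic voxels)
    - Author affiliations: UC Berkeley
    http://arxiv.org/abs/2112.05131v1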
  8. d

  9. 2. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

    Diffusion models have recently been shown to generate high-quality synthetic images, especially when combined with a guidance technique that trades diversity for fidelity. This work studies diffusion models for text-conditional image synthesis and compares two guidance strategies: CLIP guidance and classifier-free guidance. Human evaluators prefer the latter for both photorealism and caption similarity, and it often produces photorealistic samples. Samples from a 3.5-billion-parameter text-conditional diffusion model using classifier-free guidance are preferred by human evaluators over samples from DALL-E. The model can also be fine-tuned for image inpainting, which enables powerful text-driven image editing. A smaller model trained on a filtered dataset is released with code and weights at https://github.com/openai/glide-text2im.

    - Goal: improve text-to-image models
    - Result: a high-performance model that can generate and partially edit images from text. Because the full-size model is too capable, only a small model trained on a dataset filtered to prevent misuse is released.
    - Method: CLIP + diffusion model
    - Name: GLIDE (Guided Language to Image Diffusion for Generation and Editing)
    - Author affiliations: OpenAI
    http://arxiv.org/abs/2112.10741v2
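
    The classifier-free guidance strategy named in the abstract can be sketched in a few lines. This is a minimal illustration, assuming the model gives one noise prediction with the text condition and one without it, both as plain lists of floats. The function name `classifier_free_guidance` and the parameter `scale` are hypothetical and are not taken from the glide-text2im code.

```python
def classifier_free_guidance(eps_cond, eps_uncond, scale):
    """Extrapolate from the unconditional prediction toward the conditional one.

    scale = 1.0 returns the conditional prediction unchanged; larger values
    push further in the direction suggested by the text condition.
    """
    if len(eps_cond) != len(eps_uncond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


# Toy values standing in for the two noise predictions of a real model.
guided = classifier_free_guidance([1.0, 2.0], [0.0, 1.0], scale=3.0)
```

    The single `scale` knob is what lets this kind of guidance trade sample diversity for fidelity to the caption.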
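  10. 3. Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks

    Biological intelligence systems in animals perceive the world by integrating information from different modalities and processing it simultaneously for many tasks. Current machine learning research, in contrast, follows a task-specific paradigm, which leads to inefficient collaboration between tasks and a high marginal cost of developing perception models for new tasks. This paper presents a generic perception architecture named Uni-Perceiver, which handles a variety of modalities and tasks with unified modeling and shared parameters. Specifically, Uni-Perceiver encodes task inputs and targets from arbitrary modalities into a unified representation space using a modality-agnostic Transformer encoder and lightweight modality-specific tokenizers. Different perception tasks are modeled with the same formulation: finding the maximum-likelihood target for each input through the similarity of their representations. The model is pre-trained on several uni-modal and multi-modal tasks and evaluated on a variety of downstream tasks, including novel tasks that did not appear during pre-training. The pre-trained model reaches reasonable performance even on novel tasks without any tuning. Prompt tuning on 1% of the downstream task data brings performance close to state-of-the-art methods, and fine-tuning on the full data gives results on par with or better than the state of the art. Code will be released.

    - Goal: improve multi-task training and use of Transformers
    - Result: "Uni-Perceiver", a Transformer that supports diverse input and output formats
    - Method: Transformer + modality-specific tokenizers
    - Name: Uni-Perceiver
    - Author affiliations: SenseTime Research, Xi'an Jiaotong University, The Chinese University of Hong Kong
    http://arxiv.org/abs/2112.01522v1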
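  11. 4. Masked Feature Prediction for Self-Supervised Visual Pre-Training

    The paper presents Masked Feature Prediction (MaskFeat) for self-supervised pre-training of video models. The approach first randomly masks part of the input sequence and then predicts the features of the masked regions. Of five feature types studied, Histograms of Oriented Gradients (HOG), a hand-crafted feature descriptor, works particularly well in both performance and efficiency. The local contrast normalization in HOG is essential for good results, which agrees with earlier work that used HOG for visual recognition. The approach learns abundant visual knowledge and can drive large-scale Transformer-based models. Without extra model weights or supervision, MaskFeat pre-trained on unlabeled videos achieves unprecedented results: 86.7% with MViT-L on Kinetics-400, 88.3% on Kinetics-600, 80.4% on Kinetics-700, 38.8 mAP on AVA, and 75.0% on SSv2. MaskFeat also generalizes to image input, which can be interpreted as a video with a single frame, and obtains competitive results on ImageNet.

    - Goal: learn masked-content completion on video
    - Result: MaskFeat, a self-supervised video learning method
    - Method: pre-train by directly regressing the features (HOG) of the masked content
    - Name: MaskFeat (Masked Feature Prediction)
    - Author affiliations: Facebook AI Research, Johns Hopkins University
    http://arxiv.org/abs/2112.09133v1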
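  12. 5. Self-attention Does Not Need $O(n^2)$ Memory

    The paper presents a very simple attention algorithm that requires $O(1)$ memory with respect to sequence length, and an extension to self-attention that requires $O(\log n)$ memory. This contrasts with the frequently stated belief that self-attention requires $O(n^2)$ memory. The time complexity is still $O(n^2)$, but on modern accelerators device memory, rather than compute capability, is often the limiting factor. Reducing the memory needed for attention therefore makes it possible to process longer sequences than would otherwise be feasible. The authors provide a practical implementation for accelerators that requires $O(\sqrt{n})$ memory, is numerically stable, and runs within a few percent of the runtime of the standard attention implementation. They also show how to differentiate the function while remaining memory-efficient. For sequence length 16384, the memory overhead of self-attention is reduced by 59x for inference and by 32x for differentiation.

    - Goal: make self-attention models less resource-hungry
    - Result: reduced the Transformer memory requirement from $O(n^2)$ to $O(\sqrt{n})$
    - Method: partially changed the memory-expensive part of the Transformer implementation
    - Name: none
    - Author affiliations: Google Research
    http://arxiv.org/abs/2112.05682v2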
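
    The constant-memory idea for a single query can be sketched as follows. This is an illustration in plain Python, not the authors' accelerator implementation: the function names are hypothetical, the usual $1/\sqrt{d}$ score scaling is omitted, and the inputs are assumed to be non-empty lists of floats. The loop keeps only a running maximum, a running normalizer, and a running weighted sum, so the extra memory does not grow with the number of keys.

```python
import math


def attention_single_query(q, keys, values):
    """Attention output for one query, streaming over (key, value) pairs."""
    running_max = -math.inf
    normalizer = 0.0
    weighted_sum = [0.0] * len(values[0])
    for k, v in zip(keys, values):
        score = sum(qi * ki for qi, ki in zip(q, k))
        new_max = max(running_max, score)
        # Rescale what has been accumulated so far to the new maximum,
        # which keeps every exponent at or below zero.
        rescale = math.exp(running_max - new_max)
        weight = math.exp(score - new_max)
        normalizer = normalizer * rescale + weight
        weighted_sum = [a * rescale + weight * vi
                        for a, vi in zip(weighted_sum, v)]
        running_max = new_max
    return [a / normalizer for a in weighted_sum]


def attention_reference(q, keys, values):
    """Standard softmax attention that materializes every score first."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]
```

    Running this loop for every query gives full self-attention without ever storing the complete score matrix; the compute is still quadratic, as the abstract notes.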
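  13. 6. Mega-NeRF: Scalable Construction of Large-Scale NeRFs for Virtual Fly-Throughs

    This work explores how to use neural radiance fields (NeRFs) to build interactive 3D environments from large-scale visual captures that span buildings or city blocks, collected primarily by drone. In contrast to the single-object scenes on which NeRFs have traditionally been evaluated, this setting poses several challenges: (1) thousands of images with varying lighting conditions must be incorporated, each capturing only a small subset of the scene; (2) the model capacity and ray sampling requirements are prohibitively high, beyond what can be naively trained on a single GPU; and (3) the number of possible viewpoints is arbitrarily large, so precomputing all relevant information in advance (as real-time NeRF renderers typically do) is infeasible. To address these challenges, the authors first analyze visibility statistics for large-scale scenes, which motivates a sparse network structure whose parameters are specialized to different regions of the scene. They then introduce a simple geometric clustering algorithm that partitions the training images (or rather, pixels) into different NeRF submodules that can be trained in parallel. Evaluated on scenes from the Quad 6k and UrbanScene3D datasets and on the authors' own drone footage, the approach gives a 3x training speedup while improving PSNR by more than 11% on average. The authors then empirically evaluate recent fast NeRF renderers on top of Mega-NeRF and introduce a new method that exploits temporal coherence. It achieves a 40x speedup over conventional NeRF rendering while staying within 0.5 dB in PSNR quality, exceeding the fidelity of existing fast renderers.

    - Goal: build interactive large-scale NeRF 3D environments
    - Result: a NeRF for large-scale spaces that matches NeRF quality and trains 3x faster
    - Method: spatially partition the scene (drone footage datasets and similar) and train NeRFs in parallel
    - Name: Mega-NeRF
    - Author affiliations: Carnegie Mellon University, Argo AI (a self-driving startup)
    http://arxiv.org/abs/2112.10703v1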
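  14. 7. Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework

    Self-supervised learning has shown great potential for extracting powerful visual representations without human annotation. Many works approach it from different perspectives: (1) contrastive learning methods (e.g. MoCo, SimCLR) use both positive and negative samples to guide the training direction; (2) asymmetric network methods (e.g. BYOL, SimSiam) remove negative samples by introducing a predictor network and the stop-gradient operation; (3) feature decorrelation methods (e.g. Barlow Twins, VICReg) aim to reduce redundancy between feature dimensions. Because of their different motivations, the loss functions of these methods look quite different. Final accuracy numbers also vary, since different works use different networks and tricks. This work shows that these methods can be unified into the same form. Rather than comparing loss functions, the authors derive a unified formula through gradient analysis, and they run fair and detailed experiments to compare the methods' performance. There turns out to be little gap between the methods, and the use of a momentum encoder is the key factor in boosting performance. From this unified framework the authors propose UniGrad, a simple but effective gradient form for self-supervised learning. UniGrad needs neither a memory bank nor a predictor network, yet reaches state-of-the-art performance and can easily adopt other training strategies. Extensive experiments on linear evaluation and many downstream tasks also show its effectiveness. Code will be released.

    - Goal: unify self-supervised learning algorithms
    - Method: analyze each self-supervised learning method
    - Result: derived a common formula for self-supervised learning, with accuracy roughly on par with existing methods
    - Name: UniGrad
    - Author affiliations: Tsinghua University, SenseTime Research, Zhejiang University, Beijing Academy of Artificial Intelligence
    http://arxiv.org/abs/2112.05141v1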
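  15. 8. Improving language models by retrieving from trillions of tokens

    The paper enhances auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with the preceding tokens. With a 2-trillion-token database, the Retrieval-Enhanced Transformer (RETRO) obtains performance comparable to GPT-3 and Jurassic-1 on the Pile dataset while using 25x fewer parameters. After fine-tuning, RETRO's performance carries over to knowledge-intensive downstream tasks such as question answering. RETRO combines a frozen Bert retriever, a differentiable encoder, and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than is typically consumed during training. RETRO is usually trained from scratch, but pre-trained transformers can also be rapidly RETROfitted with retrieval and still achieve good performance. The work opens new avenues for improving language models through explicit memory at unprecedented scale.

    - Goal: improve auto-regressive language models such as GPT-3 and Jurassic-1
    - Result: implemented RETRO, which matches existing state-of-the-art language models while being 25x lighter
    - Method: frozen Bert retriever + differentiable encoder + chunked cross-attention
    - Name: RETRO (Retrieval-Enhanced Transformer)
    - Author affiliations: DeepMind
    http://arxiv.org/abs/2112.04426v1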
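
    The retrieval step, fetching the database chunks most similar to the current chunk of text, can be illustrated with a toy example. This sketch swaps in bag-of-words cosine similarity for the frozen Bert retriever that the paper uses, and every function name here is hypothetical. It shows only nearest-chunk lookup, not the encoder or the chunked cross-attention.

```python
import math
from collections import Counter


def split_into_chunks(tokens, chunk_size):
    """Cut a token list into consecutive chunks (the last may be shorter)."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_neighbors(query_chunk, database_chunks, k=1):
    """Return the k database chunks most similar to the query chunk."""
    query_vec = Counter(query_chunk)
    ranked = sorted(
        database_chunks,
        key=lambda chunk: cosine_similarity(query_vec, Counter(chunk)),
        reverse=True,
    )
    return ranked[:k]


# Toy "database" of 4-token chunks and a short query.
database = split_into_chunks(
    "the cat sat on the mat while the dog slept in the sun".split(), 4
)
neighbors = retrieve_neighbors("the dog slept".split(), database, k=1)
```

    In a retrieval-enhanced model the returned chunks would be fed to the language model as extra conditioning; at the scale described in the abstract, an exhaustive sort like this one would have to be replaced by an approximate nearest-neighbor index.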
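  16. 9. BEVT: BERT Pretraining of Video Transformers

    This paper studies BERT pretraining of video transformers. It is a straightforward extension, but one worth studying given the recent success of BERT pretraining for image transformers. The paper introduces BEVT, which decouples video representation learning into spatial representation learning and temporal dynamics learning. Specifically, BEVT first performs masked image modeling on image data, and then performs masked image modeling jointly with masked video modeling on video data. The design is motivated by two observations: 1) transformers trained on image data provide decent spatial priors that ease the training of video transformers, which are often computationally expensive when trained from scratch; 2) the discriminative clues needed for correct predictions, i.e. spatial and temporal information, vary across videos because of large intra-class and inter-class variation. Extensive experiments on three challenging video benchmarks show very promising results. On Kinetics 400, where recognition relies mostly on discriminative spatial representations, BEVT achieves results comparable to strong supervised baselines. On Something-Something-V2 and Diving 48, whose videos rely on temporal dynamics, BEVT clearly outperforms all other baselines and achieves state-of-the-art Top-1 accuracy of 70.6% and 86.7% respectively.

    - Goal: study BERT pretraining for video transformers
    - Result: top accuracy on Something-Something-V2 and Diving 48
    - Method: jointly train a masked-image BERT model and a masked-video BERT model
    - Name: BEVT
    - Author affiliations: Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University; Microsoft Cloud + AI
    http://arxiv.org/abs/2112.01529v1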
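  17. 10. SLIP: Self-supervision meets Language-Image Pre-training

    Recent work has shown that self-supervised pre-training outperforms supervised learning on challenging visual recognition tasks. CLIP, a new approach to learning with language supervision, shows promising performance on a wide range of benchmarks. This work explores whether self-supervised learning can help the use of language supervision for visual representation learning. It introduces SLIP, a multi-task learning framework that combines self-supervised learning with CLIP pre-training. After pre-training with Vision Transformers, the authors thoroughly evaluate representation quality and compare performance against both CLIP and self-supervised learning under three settings: zero-shot transfer, linear classification, and end-to-end fine-tuning. On ImageNet and other datasets, SLIP improves accuracy by a large margin. The results are further validated with experiments across model sizes, training schedules, and pre-training datasets. SLIP gets the best of both worlds: better performance than self-supervision (+8.1% linear accuracy) and better than language supervision (+5.2% zero-shot accuracy).

    - Goal: investigate whether self-supervised learning on images also benefits language-supervised learning with CLIP
    - Method: CLIP + self-supervised learning (SimCLR)
    - Result: improved zero-shot accuracy
    - Name: SLIP
    - Author affiliations: UC Berkeley, Facebook AI Research (FAIR)
    http://arxiv.org/abs/2112.12750v1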
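  18. 1. Critical Sentence Identification in Legal Cases Using Multi-Class Classification

    The legal domain contains a vast amount of data in text form, so meeting its analytical needs requires Natural Language Processing (NLP). Advances in NLP are spreading across many domains, including law, as both practical applications and academic research. For legal professionals, identifying the critical sentences, facts, and arguments in a case is tedious work. This research explores the use of sentence embeddings for multi-class classification to identify the critical sentences in a legal case from the perspective of the main parties in the case. It also defines a task-specific loss function to improve on the accuracy obtained with plain categorical cross-entropy loss.

    - Goal: support legal professionals in analyzing case law
    - Result: implemented a model that classifies critical sentences in legal cases into multiple classes
    - Method: BERT + a custom loss function
    - Name: Semantic Similarity Score (STS)
    - Author affiliations: University of Moratuwa
    http://arxiv.org/abs/2111.05721v2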
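  19. 2. Seeking Sinhala Sentiment: Predicting Facebook Reactions of Sinhala Posts

    The Facebook network lets users record their reactions to text through a typology of emotions. Taken at scale, this network is therefore a prime dataset of annotated sentiment data. This paper uses millions of such reactions, drawn from a decade of Facebook post data centred on a Sri Lankan context, to model an "eye of the beholder" approach to sentiment detection for online Sinhala text. Three sentiment analysis models are built: one using a limited subset of reactions, one using all reactions, and one that derives a positive/negative star rating value. The paper then computes and discusses how well these models capture the reactions of observers. The analysis shows that, for Sinhala content, binary classification of reactions is significantly more accurate than the other approaches. In addition, including the Like reaction hinders accurate prediction of the other reactions.

    - Goal: an experimental attempt to close the resource gap for Sinhala, a low-resource regional language
    - Result: implemented sentiment prediction models for Sinhala
    - Method: use a raw corpus of Sinhala Facebook posts as training data
    - Name: none
    - Author affiliations: University of Moratuwa, LIRNEasia
    http://arxiv.org/abs/2112.00468v1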
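  20. 4. Show Your Work: Scratchpads for Intermediate Computation with Language Models

    Large pre-trained language models perform remarkably well on tasks that can be done "in one pass", such as generating realistic text or synthesizing computer programs. They struggle, however, with tasks that require unbounded multi-step computation, such as adding integers or executing programs. Surprisingly, the same models can perform complex multi-step computations, even in the few-shot regime, when asked to work "step by step" and show the results of intermediate computations. In particular, the authors train Transformers to perform multi-step computations by asking them to emit the intermediate computation steps into a "scratchpad". On a series of increasingly complex tasks, from long addition to the execution of arbitrary programs, scratchpads dramatically improve the ability of language models to perform multi-step computation.

    - Goal: improve learning of computational tasks such as executing programming-language code
    - Method: train a Transformer to write intermediate results to a scratchpad one step at a time
    - Result: source code that the model fails on when it tries to guess the result in one shot can be executed correctly step by step
    - Name: none
    - Author affiliations: MIT, Google Research (Brain Team, Blueshift Team)
    http://arxiv.org/abs/2112.00114v1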
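
    To make the scratchpad idea concrete, the sketch below builds a training target for long addition that writes out every digit step before the answer. The text layout and the function name `addition_scratchpad` are hypothetical and are not the format used in the paper. It assumes non-negative integers.

```python
def addition_scratchpad(a, b):
    """Build a text target that writes out each digit step of a + b."""
    # Work from the least significant digit, as in pencil-and-paper addition.
    digits_a = [int(ch) for ch in str(a)][::-1]
    digits_b = [int(ch) for ch in str(b)][::-1]
    lines = [f"Input: {a} + {b}", "Scratchpad:"]
    carry = 0
    result_digits = []
    for i in range(max(len(digits_a), len(digits_b))):
        da = digits_a[i] if i < len(digits_a) else 0
        db = digits_b[i] if i < len(digits_b) else 0
        total = da + db + carry
        lines.append(f"{da} + {db} + carry {carry} = {total}")
        result_digits.append(total % 10)
        carry = total // 10
    if carry:
        result_digits.append(carry)
    answer = "".join(str(d) for d in reversed(result_digits))
    lines.append(f"Answer: {answer}")
    return "\n".join(lines)


example = addition_scratchpad(29, 57)
```

    A model trained on targets like `example` has to produce the intermediate lines before the final `Answer:` line, which is the "show your work" behaviour the abstract describes.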
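  21. 5. BANMo: Building Animatable 3D Neural Models from Many Casual Videos

    Prior work on articulated 3D shape reconstruction often relies on specialized sensors (e.g. synchronized multi-camera systems) or pre-built 3D deformable models (e.g. SMAL or SMPL). Such methods cannot scale to the diverse sets of objects found in the wild. BANMo is a method that requires neither a specialized sensor nor a pre-defined template shape. Using a differentiable rendering framework, BANMo builds high-fidelity articulated 3D models (including shape and animatable skinning weights) from many casual monocular videos. Using many videos gives more coverage of camera views and object articulations, but makes it much harder to establish correspondence across scenes with different backgrounds, lighting conditions, and so on. The key insight is to merge three schools of thought: (1) classic deformable shape models that use articulated bones and blend skinning, (2) volumetric neural radiance fields (NeRFs), which are amenable to gradient-based optimization, and (3) canonical embeddings that generate correspondences between pixels and an articulated model. The paper introduces neural blend skinning models that allow differentiable and invertible articulated deformations. Combined with canonical embeddings, such models make it possible to establish dense correspondences across videos, which can be self-supervised with cycle consistency. On real and synthetic data, BANMo produces higher-fidelity 3D reconstructions of humans and animals than prior work and can render realistic images from novel viewpoints and poses. Project webpage: banmo-www.github.io

    - Goal: create 3D image data from ordinary videos
    - Result: a model that creates 3D image data of humans and quadrupeds (such as cats)
    - Method: DensePose CSE learning from multiple videos of the same person or animal
    - Name: BANMo
    - Author affiliations: Meta AI, Carnegie Mellon University, Meta Reality Labs
    http://arxiv.org/abs/2112.12761v2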
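  22. 8. Player of Games

    Games have long served as a benchmark for progress in artificial intelligence. Recently, approaches using search and learning have shown strong performance across a set of perfect-information games, and approaches using game-theoretic reasoning and learning have shown strong performance in specific imperfect-information poker variants. The paper introduces Player of Games, a general-purpose algorithm that unifies these previous approaches by combining guided search, self-play learning, and game-theoretic reasoning. Player of Games is the first algorithm to achieve strong empirical performance in large perfect- and imperfect-information games, an important step toward truly general algorithms for arbitrary environments. Player of Games reaches strong performance in chess and Go, beats the strongest openly available agent in heads-up no-limit Texas hold'em poker (Slumbot), and defeats the state-of-the-art agent in Scotland Yard, an imperfect-information game that illustrates the value of guided search, learning, and game-theoretic reasoning.

    - Goal: research a general game-playing algorithm applicable to both perfect- and imperfect-information games
    - Result: "PoG", a unified algorithm combining search, learning, and game-theoretic reasoning
    - Method: GT-CFR (growing-tree counterfactual regret minimization) + sound self-play
    - Name: PoG (Player of Games)
    - Author affiliations: DeepMind
    http://arxiv.org/abs/2112.03178v1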
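  23. 9. NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

    Data augmentation is an important component both in evaluating the robustness of natural language processing (NLP) models and in increasing the diversity of their training data. This paper presents NL-Augmenter, a participatory, Python-based natural language augmentation framework. The framework supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). The paper describes the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. It demonstrates the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, data cards, and robustness analysis results are publicly available in the NL-Augmenter repository: https://github.com/GEM-benchmark/NL-Augmenter

    - Goal: evaluate robustness in NLP and increase data diversity
    - Result: released a participatory augmentation framework for NLP
    - Method: provide a set of task-specific transformations and a set of filters that split data by feature
    - Name: NL-Augmenter
    - Author affiliations: Google Brain, Google Research, and many others (including the University of Tokyo)
    http://arxiv.org/abs/2112.02721v1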
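
    The distinction between a transformation and a filter can be shown with two toy functions. These are hypothetical stand-ins written for illustration; they do not use NL-Augmenter's own classes or interfaces, and the case-swapping perturbation is not claimed to be one of its 117 transformations.

```python
def swap_case_transformation(sentence):
    """Transformation: return a modified copy of the input sentence."""
    return sentence.swapcase()


def min_length_filter(sentence, min_tokens=5):
    """Filter: keep only sentences with at least min_tokens whitespace tokens."""
    return len(sentence.split()) >= min_tokens


sentences = ["A short one.", "This sentence has more than five tokens in it."]
# The filter selects a subset of the data; the transformation perturbs it.
kept = [s for s in sentences if min_length_filter(s)]
perturbed = [swap_case_transformation(s) for s in kept]
```

    Comparing a model's predictions on `kept` and on `perturbed` is one simple way to probe robustness on a chosen slice of the data.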
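  24. 10. FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization

    Generating images from natural language instructions is an intriguing yet highly challenging task. The authors approach text-to-image generation by combining the power of the pretrained CLIP representation with an off-the-shelf image generator (GAN): they optimize in the GAN's latent space to find images that achieve the maximum CLIP score for the given input text. Compared with traditional methods that train a text-to-image generative model from scratch, the CLIP+GAN approach is training-free, zero-shot, and easily customized with different generators. However, optimizing the CLIP score in GAN space is a very hard optimization problem, and off-the-shelf optimizers such as Adam fail to give satisfying results. This work proposes the FuseDream pipeline, which improves the CLIP+GAN approach with three key techniques: 1) an AugCLIP score, which makes the CLIP objective more robust by applying random augmentation to the image; 2) a novel initialization and over-parameterization strategy for optimization, which makes it possible to navigate the non-convex landscape of GAN space efficiently; 3) a composed generation technique that uses a novel bi-level optimization formulation to compose multiple images, extending the GAN space and overcoming data bias. When prompted with different input texts, FuseDream generates high-quality images with varied objects, backgrounds, and artistic styles, and even novel counterfactual concepts that do not appear in the training data of the GAN used. Quantitatively, images generated by FuseDream obtain top-level Inception score and FID score on the MS COCO dataset without additional architecture design or training. Code is available at https://github.com/gnobitab/FuseDream

    - Goal: improve CLIP + GAN models for image generation from natural language
    - Result: top-level Inception and FID scores on the MS COCO dataset
    - Method: add the AugCLIP score, over-parameterization, and bi-level optimization
    - Name: FuseDream
    - Author affiliations: The University of Texas at Austin, University of California San Diego
    http://arxiv.org/abs/2112.01573v1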