GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
3. Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks
4. Masked Feature Prediction for Self-Supervised Visual Pre-Training
5. Self-attention Does Not Need $O(n^2)$ Memory
6. Mega-NeRF: Scalable Construction of Large-Scale NeRFs for Virtual Fly-Throughs
7. Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework
8. Improving language models by retrieving from trillions of tokens
9. BEVT: BERT Pretraining of Video Transformers
10. SLIP: Self-supervision meets Language-Image Pre-training
Cases Using Multi-Class Classification
2. Seeking Sinhala Sentiment: Predicting Facebook Reactions of Sinhala Posts
3. Plenoxels: Radiance Fields without Neural Networks
4. Show Your Work: Scratchpads for Intermediate Computation with Language Models
5. BANMo: Building Animatable 3D Neural Models from Many Casual Videos
6. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
7. Efficient Geometry-aware 3D Generative Adversarial Networks
8. Player of Games
9. NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
10. FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization
PickUp
2. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
3. Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks
4. Masked Feature Prediction for Self-Supervised Visual Pre-Training
5. Self-attention Does Not Need $O(n^2)$ Memory
6. Mega-NeRF: Scalable Construction of Large-Scale NeRFs for Virtual Fly-Throughs
7. Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework
8. Improving language models by retrieving from trillions of tokens
9. BEVT: BERT Pretraining of Video Transformers
10. SLIP: Self-supervision meets Language-Image Pre-training
CLIP CLIP NeRF NeRF Video Video NLP SSL SSL SSL Transformer Transformer Transformer Transformer
Legal Cases Using Multi-Class Classification
2. Seeking Sinhala Sentiment: Predicting Facebook Reactions of Sinhala Posts
3. Plenoxels: Radiance Fields without Neural Networks
4. Show Your Work: Scratchpads for Intermediate Computation with Language Models
5. BANMo: Building Animatable 3D Neural Models from Many Casual Videos
6. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
7. Efficient Geometry-aware 3D Generative Adversarial Networks
8. Player of Games
9. NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
10. FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization
CLIP NeRF Video NLP NLP NLP NLP Transformer GAN GAN NeRF CLIP NeRF
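3D Generative Adversarial Networks)
Unsupervised generation of high-quality, multi-view-consistent images and 3D shapes from single-view 2D photographs has been a long-standing challenge. Existing 3D GANs are either compute-intensive or rely on approximations that are not 3D-consistent; the former limits the quality and resolution of generated images, and the latter degrades multi-view consistency and shape quality. In this work, we improve the computational efficiency and image quality of 3D GANs without relying too heavily on these approximations. To that end, we introduce an expressive hybrid explicit-implicit network architecture that, together with other design choices, synthesizes high-resolution, multi-view-consistent images in real time and also produces high-quality 3D geometry. By decoupling feature generation from neural rendering, our framework can leverage state-of-the-art 2D CNN generators such as StyleGAN2 and inherit their efficiency and expressiveness. Experiments on FFHQ and AFHQ Cats, among others, demonstrate state-of-the-art 3D-aware synthesis.
• Purpose: Improve GANs that generate 3D-aware 2D photos from 2D photos
• Result: Developed a SoTA GAN that processes human and animal face photos in 3D to generate images
• Method: StyleGAN2 + Volume Renderer
• Name: EG3D (Efficient Geometry-aware 3D Generative Adversarial Networks)
• Author affiliations: Stanford University, NVIDIA
http://arxiv.org/abs/2112.07945v1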
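(plenoptic voxels) are introduced. Plenoxels represent a scene as a sparse 3D grid with spherical harmonics. This representation can be optimized from calibrated images via gradient methods and regularization, without any neural components. On standard benchmark tasks, Plenoxels are optimized more than two orders of magnitude faster than Neural Radiance Fields with no loss in visual quality.
• Purpose: Improve view synthesis systems
• Result: A model that matches NeRF accuracy while training about two orders of magnitude faster
• Method: Re-implemented the NeRF algorithm without a neural network (TV + SH)
• Name: Plenoxels (plenoptic voxels)
• Author affiliations: UC Berkeley
http://arxiv.org/abs/2112.05131v1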
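To make the "sparse 3D grid" in the Plenoxels summary concrete, here is a minimal sketch of looking up a value at a continuous position in a sparse voxel grid. Trilinear interpolation, the dictionary storage, and the name `trilinear` are assumptions chosen for illustration; the slide does not say how the grid is sampled, and spherical harmonics and regularization are left out.

```python
import math
from itertools import product


def trilinear(grid, x, y, z):
    """Interpolate a scalar stored on the corners of a sparse voxel grid.

    `grid` maps integer (i, j, k) corners to floats. Corners that are not
    stored count as 0.0 (empty space), which is what keeps the grid sparse.
    """
    i, j, k = math.floor(x), math.floor(y), math.floor(z)
    fx, fy, fz = x - i, y - j, z - k
    value = 0.0
    for di, dj, dk in product((0, 1), repeat=3):
        # Weight of each of the 8 surrounding corners.
        weight = ((fx if di else 1.0 - fx)
                  * (fy if dj else 1.0 - fy)
                  * (fz if dk else 1.0 - fz))
        value += weight * grid.get((i + di, j + dj, k + dk), 0.0)
    return value
```

Because the lookup is a weighted sum of stored values, it is differentiable with respect to those values, which is the property a gradient-based optimizer needs.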
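Generic Perception for Zero-shot and Few-shot Tasks)
The biological intelligence systems of animals perceive the world by integrating information from different modalities and processing it simultaneously for various tasks. Current machine learning research, in contrast, follows a task-specific paradigm, which leads to inefficient collaboration between tasks and high marginal costs when developing perception models for new tasks. This paper introduces a generic perception architecture named Uni-Perceiver, which processes a variety of modalities and tasks with unified modeling and shared parameters. Specifically, Uni-Perceiver encodes task inputs and targets from arbitrary modalities into a unified representation space using a modality-agnostic Transformer encoder and lightweight modality-specific tokenizers. Different perception tasks are modeled with the same formulation: finding the maximum-likelihood target for each input through the similarity of their representations. The model is pre-trained on several uni-modal and multi-modal tasks and evaluated on a variety of downstream tasks, including novel tasks that did not appear in pre-training. The results show that the pre-trained model, without any tuning, achieves reasonable performance even on novel tasks. Prompt tuning on 1% of the downstream task data raises performance to a level close to state-of-the-art methods, and full-data fine-tuning delivers results on par with or better than the state of the art. Code will be released.
• Purpose: Improve multi-task training and use of Transformers
• Result: Developed "Uni-Perceiver", a Transformer that handles diverse input and output formats
• Method: Transformer + modality-specific tokenizers
• Name: Uni-Perceiver
• Author affiliations: SenseTime Research, Xi'an Jiaotong University, The Chinese University of Hong Kong
http://arxiv.org/abs/2112.01522v1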
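We present Masked Feature Prediction (MaskFeat) for self-supervised pre-training of video models. The approach first randomly masks out a portion of the input sequence and then predicts the features of the masked regions. We study five types of features and find that Histograms of Oriented Gradients (HOG), a hand-crafted feature descriptor, works particularly well in terms of both performance and efficiency. We also find that the local contrast normalization in HOG is essential for good results, in line with earlier work that used HOG for visual recognition. Our approach can learn abundant visual knowledge and drive large-scale Transformer-based models. Without extra model weights or supervision, MaskFeat pre-trained on unlabeled videos achieves unprecedented results: 86.7% with MViT-L on Kinetics-400, 88.3% on Kinetics-600, 80.4% on Kinetics-700, 38.8 mAP on AVA, and 75.0% on SSv2. MaskFeat further generalizes to image input, which can be interpreted as a video with a single frame, and obtains competitive results on ImageNet.
• Purpose: Learn masked visual completion for video
• Result: Developed MaskFeat, a self-supervised video learning method
• Method: Pre-train by regressing the features (HOG) of the masked content
• Name: MaskFeat (Masked Feature Prediction)
• Author affiliations: Facebook AI Research, Johns Hopkins University
http://arxiv.org/abs/2112.09133v1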
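The two steps in the MaskFeat summary above (mask part of the input at random, then regress features of the masked part) can be sketched as follows. This is only an illustration under assumptions: the function names, the mean-squared-error loss, and the use of plain lists in place of HOG descriptors and a Transformer are all hypothetical and not taken from the paper.

```python
import random


def choose_mask(num_tokens, mask_ratio, seed):
    """Pick which token positions to hide; seeded so the choice is reproducible."""
    rng = random.Random(seed)
    num_masked = round(num_tokens * mask_ratio)
    return sorted(rng.sample(range(num_tokens), num_masked))


def masked_regression_loss(predicted, target, masked_indices):
    """Mean squared error computed only over the masked positions.

    `predicted` and `target` are lists of per-token feature vectors; in
    MaskFeat the targets would be HOG features of the hidden patches.
    """
    errors = [
        (p - t) ** 2
        for i in masked_indices
        for p, t in zip(predicted[i], target[i])
    ]
    return sum(errors) / len(errors)
```

Restricting the loss to masked positions is the key design point: the model gets no credit for copying features it can already see.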
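This paper presents a very simple algorithm for attention that requires $O(1)$ memory with respect to sequence length, and an extension to self-attention that requires $O(\log n)$ memory. This contrasts with the frequently stated belief that self-attention requires $O(n^2)$ memory. The time complexity is still $O(n^2)$, but on recent accelerators device memory, rather than compute capability, is often the limiting factor. Reducing the memory needed for attention therefore makes it possible to process longer sequences than would otherwise be feasible. This work provides a practical implementation for accelerators that requires $O(\sqrt{n})$ memory, is numerically stable, and stays within a few percent of the runtime of the standard attention implementation. It also shows how to differentiate the function while remaining memory-efficient. For sequence length 16384, the memory overhead of self-attention is reduced by 59x for inference and by 32x for differentiation.
• Purpose: Reduce the resource cost of self-attention models
• Result: Reduced the Transformer memory requirement from $O(n^2)$ to $O(\sqrt{n})$
• Method: Partially modified the memory-expensive part of the Transformer implementation
• Name: None
• Author affiliations: Google Research
http://arxiv.org/abs/2112.05682v2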
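The constant-memory idea in the self-attention summary above can be illustrated for a single query: instead of storing all attention scores, keep a running maximum, a running softmax denominator, and a running weighted sum of values. The sketch below is a plain-Python illustration under that assumption; the name `attention_streaming` is hypothetical, and it is not the accelerator implementation (which works on chunks and needs $O(\sqrt{n})$ memory).

```python
import math


def attention_streaming(query, keys, values):
    """Softmax attention for one query, consuming (key, value) pairs one at a time.

    Only three accumulators are kept, so memory does not grow with the
    number of keys. Subtracting the running maximum keeps exp() stable.
    """
    running_max = -math.inf          # largest score seen so far
    denom = 0.0                      # sum of exp(score - running_max)
    numer = [0.0] * len(values[0])   # sum of exp(score - running_max) * value
    for key, value in zip(keys, values):
        score = sum(q * k for q, k in zip(query, key))
        new_max = max(running_max, score)
        scale = math.exp(running_max - new_max)  # rescales the old accumulators
        weight = math.exp(score - new_max)
        numer = [scale * n + weight * v for n, v in zip(numer, value)]
        denom = scale * denom + weight
        running_max = new_max
    return [n / denom for n in numer]
```

The result equals ordinary softmax attention for that query; only the order of the arithmetic changes, which is why the time cost stays the same while the memory cost drops.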
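for Virtual Fly-Throughs)
This work explores how to use neural radiance fields (NeRFs) to build interactive 3D environments from large-scale visual captures that span buildings and city blocks, collected primarily from drone data. In contrast to the single-object scenes on which NeRFs have traditionally been evaluated, this setting poses several challenges: (1) thousands of images with varying lighting conditions must be ingested, each capturing only a small subset of the scene; (2) the required model capacity and ray sampling are prohibitively high, beyond what can be naively trained on a single GPU; and (3) the number of possible viewpoints is arbitrarily large, so it is infeasible to precompute all relevant information in advance (as real-time NeRF renderers typically do). To address these challenges, we first analyze visibility statistics for large-scale scenes, which motivates a sparse network structure whose parameters are specialized to different regions of the scene. We then introduce a simple geometric clustering algorithm that partitions training images (or rather pixels) into different NeRF submodules that can be trained in parallel. Evaluated on the Quad 6k dataset, the UrbanScene3D dataset, and scenes from our own drone footage, the approach speeds up training by 3x while improving PSNR by over 11% on average. We subsequently carry out an empirical evaluation of recent fast NeRF renderers on top of Mega-NeRF and introduce a new method that exploits temporal coherence. The technique achieves a 40x speedup over conventional NeRF rendering while staying within 0.5 dB in PSNR quality, exceeding the fidelity of existing fast renderers.
• Purpose: Build interactive, large-scale NeRF 3D environments
• Result: Developed a NeRF for large-scale spaces that trains 3x faster with performance on par with NeRF
• Method: Partitioned the space of drone-footage datasets and trained NeRFs in parallel
• Name: Mega-NeRF
• Author affiliations: Carnegie Mellon University, Argo AI (an autonomous-driving startup)
http://arxiv.org/abs/2112.10703v1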
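The partitioning step in the Mega-NeRF summary above can be sketched as a nearest-centroid assignment: each training sample goes to the submodule whose region centre is closest, and the groups can then be trained independently. The function name, the use of 3D points as samples, and the squared Euclidean distance are assumptions for illustration; the slide only says that a simple geometric clustering is used.

```python
def assign_to_submodules(points, centroids):
    """Group point indices by the index of the nearest centroid.

    Returns a dict {centroid_index: [point_index, ...]}; every centroid
    gets an entry, even if no point falls into its region.
    """
    groups = {index: [] for index in range(len(centroids))}
    for point_index, point in enumerate(points):
        nearest = min(
            range(len(centroids)),
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        groups[nearest].append(point_index)
    return groups
```

Since each group depends only on its own samples, the corresponding submodules could be trained on separate devices without communicating.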
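Self-Supervised Learning via A Unified Gradient Framework)
Self-supervised learning has shown great potential for extracting powerful visual representations without human annotations. Various works approach it from different perspectives: (1) contrastive learning methods (e.g., MoCo, SimCLR) use both positive and negative samples to guide the training direction; (2) asymmetric network methods (e.g., BYOL, SimSiam) remove negative samples by introducing a predictor network and the stop-gradient operation; (3) feature decorrelation methods (e.g., Barlow Twins, VICReg) aim to reduce the redundancy between feature dimensions. Driven by different motivations, these methods appear to use quite different loss functions. Their final accuracies also vary, and different networks and tricks are used in different works. This work shows that these methods can be unified into the same form. Instead of comparing their loss functions, we derive a unified formula through gradient analysis. We also run fair and detailed experiments to compare their performance. The results show that there is little gap between these methods, and that the use of a momentum encoder is the key factor in boosting performance. From this unified framework we propose UniGrad, a simple but effective gradient form for self-supervised learning. UniGrad requires neither a memory bank nor a predictor network, yet achieves state-of-the-art performance and can easily adopt other training strategies. Extensive experiments on linear evaluation and many downstream tasks also demonstrate its effectiveness. Code will be released.
• Purpose: Unify self-supervised learning algorithms
• Method: Analyze each self-supervised learning method
• Result: Derived a common formula for self-supervised learning, with accuracy roughly equal to existing methods
• Name: UniGrad
• Author affiliations: Tsinghua University, SenseTime Research, Zhejiang University, Beijing Academy of Artificial Intelligence
http://arxiv.org/abs/2112.05141v1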
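of tokens)
We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with the preceding tokens. With a 2 trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains performance comparable to GPT-3 and Jurassic-1 on the Pile dataset, despite using 25x fewer parameters. After fine-tuning, RETRO's performance carries over to downstream knowledge-intensive tasks such as question answering. RETRO combines a frozen Bert retriever, a differentiable encoder, and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than is typically consumed during training. RETRO is usually trained from scratch, but pre-trained transformers can also be rapidly RETROfitted with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale.
• Purpose: Improve auto-regressive language models such as GPT-3 and Jurassic-1
• Result: Implemented RETRO, which matches existing SOTA language models while being 25x lighter
• Method: frozen Bert retriever + differentiable encoder + chunked cross-attention
• Name: RETRO (Retrieval-Enhanced Transformer)
• Author affiliations: DeepMind
http://arxiv.org/abs/2112.04426v1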
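The retrieval step in the RETRO summary above (split text into chunks, then fetch the database chunks most similar to the current one) can be sketched as below. This is a toy under stated assumptions: bag-of-words counts and cosine similarity stand in for the frozen Bert retriever, and the names `chunk`, `cosine`, and `retrieve` are hypothetical.

```python
import math
from collections import Counter


def chunk(tokens, size):
    """Split a token list into consecutive chunks of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query_chunk, database_chunks, k=1):
    """Return the k database chunks most similar to the query chunk."""
    query_vec = Counter(query_chunk)
    ranked = sorted(
        database_chunks,
        key=lambda c: cosine(query_vec, Counter(c)),
        reverse=True,
    )
    return ranked[:k]
```

In the described system the retrieved chunks would then be fed to the model through cross-attention; that part is omitted here.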
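Pretraining of Video Transformers)
This paper studies BERT pretraining of video transformers. It is a straightforward extension, but one worth studying given the recent success of BERT pretraining for image transformers. We introduce BEVT, which decouples video representation learning into spatial representation learning and temporal dynamics learning. Specifically, BEVT first performs masked image modeling on image data, and then performs masked image modeling jointly with masked video modeling on video data. The design is motivated by two observations: 1) transformers learned on image data provide decent spatial priors that ease the learning of video transformers, which are often computationally intensive when trained from scratch; 2) the discriminative clues needed to make correct predictions, i.e., spatial and temporal information, vary among videos because of large intra-class and inter-class variations. We conduct extensive experiments on three challenging video benchmarks, where BEVT achieves very promising results. On Kinetics 400, where recognition relies mostly on discriminative spatial representations, BEVT achieves results comparable to strong supervised baselines. On Something-Something-V2 and Diving 48, which contain videos that rely on temporal dynamics, BEVT clearly outperforms all other baselines and achieves state-of-the-art Top-1 accuracy of 70.6% and 86.7%, respectively.
• Purpose: Study BERT pretraining for video transformers
• Result: Achieved state-of-the-art Top-1 accuracy on Something-Something-V2 and Diving 48
• Method: Jointly train an image-masking BERT model and a video-masking BERT model
• Name: BEVT
• Author affiliations: Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University; Microsoft Cloud + AI
http://arxiv.org/abs/2112.01529v1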
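Cases Using Multi-Class Classification)
The legal domain contains a vast amount of data in text form, so meeting its analytical needs requires the application of Natural Language Processing (NLP). Advances in NLP are spreading to many domains, including law, in the form of practical applications and academic research. For legal professionals, identifying the critical sentences, facts, and arguments in a legal case is a tedious task. This research explores the use of sentence embeddings for multi-class classification to identify the critical sentences in a legal case from the perspective of the main parties in the case. In addition, a task-specific loss function is defined to improve on the accuracy obtained with plain categorical cross-entropy loss.
• Purpose: Support legal professionals in analyzing legal cases
• Result: Implemented a model that performs multi-class classification of critical sentences in legal cases
• Method: BERT + custom loss function
• Name: Semantic Similarity Score (STS)
• Author affiliations: University of Moratuwa
http://arxiv.org/abs/2111.05721v2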
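Sinhala Posts)
The Facebook network lets users record their reactions to text through a typology of emotions. Taken at scale, this network is therefore a prime source of annotated sentiment data. This paper uses millions of such reactions, drawn from a decade of Facebook post data centred on a Sri Lankan context, to model an "eye of the beholder" approach to sentiment detection for online Sinhala text content. Three sentiment analysis models are built: one that considers a limited subset of reactions, one that considers all reactions, and one that derives a positive/negative rating. How effectively these models capture the reactions of observers is then computed and discussed. The analysis shows that, for Sinhala content, binary classification of reactions is significantly more accurate than the other approaches. Furthermore, including the "Like" reaction hinders the accurate prediction of other reactions.
• Purpose: An experimental attempt to close the resource gap for Sinhala, a low-resource regional language
• Result: Implemented sentiment prediction models for Sinhala
• Method: Use a raw corpus of Sinhala Facebook posts as training data
• Name: None
• Author affiliations: University of Moratuwa, LIRNEasia
http://arxiv.org/abs/2112.00468v1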
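with Language Models)
Large pre-trained language models perform remarkably well on tasks that can be done "in one pass", such as generating realistic text or synthesizing computer programs. However, they struggle with tasks that require unbounded multi-step computation, such as adding integers or executing programs. Surprisingly, these same models can carry out complex multi-step computations, even in the few-shot regime, when asked to perform the operation "step by step" while showing the results of intermediate computations. In particular, we train Transformers to perform multi-step computations by asking them to emit intermediate computation steps into a "scratchpad". On a series of increasingly complex tasks, ranging from long addition to the execution of arbitrary programs, we show that scratchpads dramatically improve the ability of language models to perform multi-step computations.
• Purpose: Improve learning of computational tasks such as executing programs
• Method: Train a Transformer to write intermediate results to a scratchpad one step at a time
• Result: Source code that the model fails on when guessing the result in one shot can be executed correctly step by step
• Name: None
• Author affiliations: MIT, Google Research (Brain Team, Blueshift Team)
http://arxiv.org/abs/2112.00114v1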
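As an illustration of what a scratchpad for long addition might contain, the sketch below writes out one line per digit position, including the carry, before giving the final answer. The line format and the name `addition_scratchpad` are hypothetical; they show the idea of emitting intermediate steps as text and are not the format used in the paper.

```python
def addition_scratchpad(a, b):
    """Return (scratchpad_lines, answer) for a + b with non-negative integers.

    Each line records one column of schoolbook addition, so a model trained
    on such text sees every intermediate result instead of only the answer.
    """
    digits_a = [int(d) for d in str(a)][::-1]  # least significant digit first
    digits_b = [int(d) for d in str(b)][::-1]
    carry = 0
    result_digits = []
    lines = []
    for pos in range(max(len(digits_a), len(digits_b))):
        da = digits_a[pos] if pos < len(digits_a) else 0
        db = digits_b[pos] if pos < len(digits_b) else 0
        total = da + db + carry
        lines.append(
            f"{da} + {db} + carry {carry} = {total} "
            f"-> write {total % 10}, carry {total // 10}"
        )
        result_digits.append(total % 10)
        carry = total // 10
    if carry:
        result_digits.append(carry)
        lines.append(f"final carry {carry}")
    answer = int("".join(str(d) for d in reversed(result_digits)))
    return lines, answer
```

For example, `addition_scratchpad(29, 57)` produces two step lines and the answer 86; the lines joined together would form the target text for one training example.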
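from Many Casual Videos)
Prior work on articulated 3D shape reconstruction often relies on specialized sensors (e.g., synchronized multi-camera systems) or pre-built 3D deformable models (e.g., SMAL or SMPL). Such methods cannot scale to the diverse sets of objects found in the wild. BANMo is a method that requires neither a specialized sensor nor a pre-defined template shape. Within a differentiable rendering framework, BANMo builds high-fidelity, articulated 3D models (including shape and animatable skinning weights) from many monocular casual videos. Using many videos gives more coverage of camera views and object articulations, but it makes establishing correspondence across scenes with different backgrounds, lighting conditions, and so on a significant challenge. Our key insight is to merge three schools of thought: (1) classic deformable shape models that use articulated bones and blend skinning, (2) volumetric neural radiance fields (NeRFs) that are amenable to gradient-based optimization, and (3) canonical embeddings that generate correspondences between pixels and an articulated model. We introduce neural blend skinning models that allow differentiable and invertible articulated deformations. Combined with canonical embeddings, such models make it possible to establish dense correspondences across videos, which can be self-supervised with cycle consistency. On real and synthetic data, BANMo shows higher-fidelity 3D reconstructions of humans and animals than prior work, and it can render realistic images from novel viewpoints and poses. Project webpage: banmo-www.github.io
• Purpose: Create 3D models from ordinary (casual) videos
• Result: Developed a model that builds 3D models of humans and quadrupeds (such as cats)
• Method: DensePose-CSE learning from multiple videos of the same person or animal
• Name: BANMo
• Author affiliations: Meta AI, Carnegie Mellon University, Meta Reality Labs
http://arxiv.org/abs/2112.12761v2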
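Games, a general-purpose algorithm that unifies previous approaches by combining guided search, self-play learning, and game-theoretic reasoning, is introduced. Player of Games is the first algorithm to achieve strong empirical performance in large perfect and imperfect information games, an important step towards truly general algorithms for arbitrary environments. Player of Games reaches strong performance in chess and Go, beats the strongest openly available agent in heads-up no-limit Texas hold'em poker (Slumbot), and defeats the state-of-the-art agent in Scotland Yard, an imperfect information game that illustrates the value of guided search, learning, and game-theoretic reasoning.
• Purpose: Research a general game-playing algorithm applicable to both perfect and imperfect information games
• Result: Developed "PoG", a unified algorithm combining search, learning, and game-theoretic reasoning
• Method: GT-CFR (growing-tree counterfactual regret minimization) + sound self-play
• Name: PoG (Player of Games)
• Author affiliations: DeepMind
http://arxiv.org/abs/2112.03178v1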
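Counterfactual regret minimization, named in the method bullet above, is built on regret matching: play each action in proportion to how much you regret not having played it so far. The sketch below shows only that building block, on rock-paper-scissors against a fixed opponent. It is not GT-CFR; there is no game tree, search, or learned value function, and all names are hypothetical.

```python
# Payoff to the learner: rows and columns are rock, paper, scissors.
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]


def strategy_from_regrets(regrets):
    """Regret matching: probabilities proportional to positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0.0:
        return [1.0 / len(regrets)] * len(regrets)  # no regret yet: play uniformly
    return [p / total for p in positive]


def train_against(opponent, iterations):
    """Accumulate expected regrets against a fixed opponent mixed strategy."""
    regrets = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        strategy = strategy_from_regrets(regrets)
        # Expected payoff of each pure action against the opponent.
        action_values = [
            sum(PAYOFF[a][o] * opponent[o] for o in range(3)) for a in range(3)
        ]
        expected = sum(s * v for s, v in zip(strategy, action_values))
        # Regret of an action: how much better it would have done.
        regrets = [r + v - expected for r, v in zip(regrets, action_values)]
    return strategy_from_regrets(regrets)
```

Against an opponent who always plays rock, the learner ends up always playing paper, which is the best response.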
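Augmentation)
Data augmentation is an important component both in evaluating the robustness of models in natural language processing (NLP) and in increasing the diversity of their training data. This paper presents NL-Augmenter, a participatory, Python-based natural language augmentation framework. The framework supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, datacards, and robustness analysis results are publicly available in the NL-Augmenter repository: https://github.com/GEM-benchmark/NL-Augmenter
• Purpose: Evaluate robustness in natural language processing and increase data diversity
• Result: Released a participatory natural language augmentation framework
• Method: Provide a set of task-specific transformations and a set of filters that split data by specific features
• Name: NL-Augmenter
• Author affiliations: Google Brain, Google Research, and many others (including the University of Tokyo)
http://arxiv.org/abs/2112.02721v1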
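The distinction between a transformation (changes a sample) and a filter (decides whether a sample belongs to a split) can be shown with a small standalone sketch. The functions below are hypothetical illustrations and do not use or mirror the NL-Augmenter API.

```python
# Hypothetical lookup table used by the example transformation.
CONTRACTIONS = {"do not": "don't", "is not": "isn't", "can not": "can't"}


def contract(sentence):
    """Transformation: rewrite some long forms as contractions."""
    for long_form, short_form in CONTRACTIONS.items():
        sentence = sentence.replace(long_form, short_form)
    return sentence


def is_question(sentence):
    """Filter: keep only sentences that end with a question mark."""
    return sentence.strip().endswith("?")


def augment(dataset, transformation, keep):
    """Apply `transformation` to every sample that passes the `keep` filter."""
    return [transformation(sample) for sample in dataset if keep(sample)]
```

Comparing a model's predictions on the original and the transformed samples of a filtered split is one simple way to probe robustness.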