2”によりテキストから⽣成された画像 [Ramesh(OpenAI)+,2022/04/13] vibrant portrait painting of Salvador Dalí with a robotic half face a shiba inu wearing a beret and black turtleneck https://cdn.openai.com/papers/dall-e-2.pdf https://arxiv.org/abs/2204.14198
[Patashnik+, ICCV’21] https://openaccess.thecvf.com/content/ICCV2021/papers/Patashnik_StyleCL IP_Text-Driven_Manipulation_of_StyleGAN_Imagery_ICCV_2021_paper.pdf A female face A surprised female face CLIP空間 Style空間 射影 58
2021/07] Z-vector VQGAN Decoder CLIP 類似度のlossで学習 学習パラメータ an astronaut in the style of van Gogh https://arxiv.org/abs/2204.08583 blue whales swimming through neon city 59 https://twitter.com/ak92501/status/1413360535685435396
WACV’22] Q. How many females are affected by diabetes A. 3.6% Q. What percentage of cases can not be prevented A. 40% (100 – 60) Q. What could lead to blindness or stroke A. diabetes https://rrc.cvc.uab.es/?ch=17
the patient that his shift would be ending in an hour. The “his” refers to … the patient ? the nurse? 指⽰語の性別バイアスの評価 ステレオタイプと異なる 組み合わせだと精度落ちる プロンプトに続く⽣成テキストが有害となる分布 特定宗教に関して有害なテ キストを⽣成しやすい スコア⼤︓有害 https://arxiv.org/abs/2204.02311 83 PaLM [Chowdhery (Google)+, 2022/04/19]
NIPS 2017: 5998-6008 2. Jacob Devlin et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL-HLT (1) 2019: 4171-4186 3. Tom B. Brown et al.: Language Models are Few-Shot Learners. NeurIPS 2020 4. Colin Raffel et al.: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res. 21: 140:1-140:67 (2020) 5. Dzmitry Bahdanau et al.: Neural Machine Translation by Jointly Learning to Align and Translate. ICLR 2015 6. Pranav Rajpurkar et al.: SQuAD: 100, 000+ Questions for Machine Comprehension of Text. EMNLP 2016: 2383- 2392 7. Jared Kaplan et al.: Scaling Laws for Neural Language Models. CoRR abs/2001.08361 (2020) 8. Opher Lieber et al.: Jurassic-1: Technical Details and Evaluation, Tech. Report, AI21 Labs (2021) 9. Aditya Ramesh et al.: Hierarchical Text-Conditional Image Generation with CLIP Latents. CoRR abs/2204.06125 (2022) 10. Jean-Baptiste Alayrac et al.: Flamingo: a Visual Language Model for Few-Shot Learning. CoRR abs/2204.14198 (2022) 11. Shaoqing Ren, Kaiming He, Ross B. Girshick, Jian Sun: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. NIPS 2015: 91-99 12. Liunian Harold Li et al.: VisualBERT: A Simple and Performant Baseline for Vision and Language. CoRR abs/1908.03557 (2019) 13. Pengchuan Zhang et al: VinVL: Revisiting Visual Representations in Vision-Language Models. CVPR 2021: 5579- 5588 14. Alexey Dosovitskiy et al.: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. ICLR 2021 15. Alec Radford et al.: Learning Transferable Visual Models From Natural Language Supervision. ICML 2021: 8748- 8763 参考⽂献 98
Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih: Dense Passage Retrieval for Open-Domain Question Answering. EMNLP (1) 2020: 6769-6781 17. Or Patashnik et al.: StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery. ICCV 2021: 2065-2074 18. Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, Timo Aila: Analyzing and Improving the Image Quality of StyleGAN. CVPR 2020: 8107-8116 19. Katherine Crowson et al: VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance. CoRR abs/2204.08583 (2022) 20. Patrick Esser, Robin Rombach, Björn Ommer: Taming Transformers for High-Resolution Image Synthesis. CVPR 2021: 12873-12883 21. Xiuye Gu et al.: Zero-Shot Detection via Vision and Language Knowledge Distillation. ICLR 2022 22. Yael Vinker et al.: CLIPasso: Semantically-Aware Object Sketching. SIGGRAPH 2022. 23. Guy Tevet et al: MotionCLIP: Exposing Human Motion Generation to CLIP Space. CoRR abs/2203.08063 (2022) 24. Jonathan Ho, Ajay Jain, Pieter Abbeel: Denoising Diffusion Probabilistic Models. NeurIPS 2020 25. Minesh Mathew et al.: DocVQA: A Dataset for VQA on Document Images. WACV 2021: 2199-2208 26. Ryota Tanaka et al: VisualMRC: Machine Reading Comprehension on Document Images. AAAI 2021: 13878-13888 27. Yupan Huang et al: LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking. CoRR abs/2204.08387 (2022) 28. Minesh Mathew et al: InfographicVQA. WACV 2022: 2582-2591 29. ⽥中涼太 et al: テキストと視覚的に表現された情報の融合理解に基づくインフォグラフィック質問応答, NLP 2022 30. Geewook Kim et al.: Donut: Document Understanding Transformer without OCR. CoRR abs/2111.15664 (2021) 参考⽂献 99
Analysis & Insights from Training Gopher. CoRR abs/2112.11446 (2021) 32. Jordan Hoffmann et al. : Training Compute-Optimal Large Language Models. CoRR abs/2203.15556 (2022) 33. Aakanksha Chowdhery et al.: PaLM: Scaling Language Modeling with Pathways. CoRR abs/2204.02311 (2022) 34. William Fedus et al.: Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. CoRR abs/2101.03961 (2021) 35. Ze Liu et al: Swin Transformer V2: Scaling Up Capacity and Resolution. CVPR 2022 36. Romal Thoppilan et al.: LaMDA: Language Models for Dialog Applications. CoRR abs/2201.08239 (2022) 37. Stephen H. Bach et al.: PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts. ACL (demo) 2022: 93-104 38. Long Ouyang et al.: Training language models to follow instructions with human feedback. CoRR abs/2203.02155 (2022) 39. Victor Sanh et al.: Multitask Prompted Training Enables Zero-Shot Task Generalization. ICLR 2022 40. Haokun Liu et al.: Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning. CoRR abs/2205.05638 (2022) 41. Jason Wei et al: Chain of Thought Prompting Elicits Reasoning in Large Language Models. CoRR abs/2201.11903 (2022) 42. Yujie Lu et al.: Imagination-Augmented Natural Language Understanding. NAACL-HLT 2022. 43. Mohit Shridhar et al.: CLIPort: What and Where Pathways for Robotic Manipulation. CoRL 2021: 894-906 44. Andrea Burns et al.: Mobile App Tasks with Iterative Feedback (MoTIF): Addressing Task Feasibility in Interactive Visual Environments. CoRR abs/2104.08560 (2021) 45. Rowan Zellers et al.: MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound. CVPR 2022 46. Scott E. Reed et al.: A Generalist Agent. CoRR abs/2205.06175 (2022) 参考⽂献 100