and Dumitru Erhan. Show and tell: A neural image caption generator. CVPR 2015. [Agrawal+, 2016] Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, and Devi Parikh. VQA: visual question answering. ICCV2015. [Das+, 2018] Abhishek Das, Samyak Datta, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra. Embodied Question Answering. CVPR2018. [Xu+, 2018] Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, Xiaodong He. AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks. CVPR2018. [Bisk+, 2016] Yonatan Bisk, Deniz Yuret, Daniel Marcu. Natural Language Communication with Robots. NAACL2016. P.6 [Okada, 1980] Naoyuki Okada. Conceptual taxonomy of Japanese verbs for understanding natural language and picture patterns. COLING1980. [Hiyoshi+, 1994] Mayumi Hiyoshi and Hideo Shimazu. Drawing pictures with natural language and direct manipulation. COLING1994. 56/54