& Mei, T. X-linear attention networks for image captioning. CVPR2020
[Ziegler+, 2019] Ziegler, D. M., Stiennon, N., Wu, J., Brown, T. B., Radford, A., Amodei, D., Christiano, P., & Irving, G. Fine-Tuning Language Models from Human Preferences. arXiv. http://arxiv.org/abs/1909.08593
[Stiennon+, 2020] Stiennon, N., Ouyang, L., Wu, J., Ziegler, D. M., Lowe, R., Voss, C., Radford, A., Amodei, D., & Christiano, P. Learning to summarize from human feedback. NeurIPS2020.
P.72
[Bengio+, 2015] Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. 2015. Scheduled sampling for sequence prediction with recurrent neural networks. NIPS2015. MIT Press, Cambridge, MA, USA, 1171–1179.
P.78
[Schulman+, 2017] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. Proximal Policy Optimization Algorithms. arXiv2017.
P.79
[Choshen+, ICLR2020] Choshen, L., Fox, L., Aizenbud, Z., & Abend, O. (2019). On the Weaknesses of Reinforcement Learning for Neural Machine Translation. ICLR2020.
P.84
[Benotti+, 2021] Benotti, L., & Blackburn, P. Grounding as a Collaborative Process. EACL2021. 515–531.
P.86
[Nguyen+, 2019] Khanh Nguyen, Hal Daumé III. Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning. EMNLP2019.