Corrado, and J. Dean. Distributed Representations of Words and Phrases and their Compositionality. In NeurIPS, 2013b. • T. Mikolov, E. Grave, P. Bojanowski, C. Puhrsch, and A. Joulin. Advances in Pre-Training Distributed Word Representations. In LREC, pages 52–55, 2018. • J. Pennington, R. Socher, and C. D. Manning. GloVe: Global Vectors for Word Representation. In EMNLP, pages 1532–1543, 2014. • M. E. Peters, M. Neumann, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer. Deep Contextualized Word Representations. In NAACL-HLT, pages 2227– 2237, 2018. • T. Schnabel, I. Labutov, D. Mimno, and T. Joachims. Evaluation Methods for Unsupervised Word Embed- dings. In EMNLP, pages 298–307, 2015. • L. Vilnis and A. McCallum. Word Representations via Gaussian Embedding. In ICLR, 2015. • A. Wang, Y. Pruksachatkun, N. Nangia, A. Singh, J. Michael, F. Hill, O. Levy, and S. R. Bowman. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems. arXiv, 2019a. • A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S. R. Bowman. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Under- standing. In ICLR, 2019b. • J. Wieting, M. Bansal, K. Gimpel, and K. Livescu. To- wards Universal Paraphrastic Sentence Embeddings. In ICLR, 2016. 44