Table 5: Resources of PTMs

Resource | Description | URL

Open-Source Implementations §
word2vec | CBOW, Skip-Gram | https://github.com/tmikolov/word2vec
GloVe | Pre-trained word vectors | https://nlp.stanford.edu/projects/glove
FastText | Pre-trained word vectors | https://github.com/facebookresearch/fastText
Transformers | Framework: PyTorch & TF; PTMs: BERT, GPT-2, RoBERTa, XLNet, etc. | https://github.com/huggingface/transformers
Fairseq | Framework: PyTorch; PTMs: English LM, German LM, RoBERTa, etc. | https://github.com/pytorch/fairseq
Flair | Framework: PyTorch; PTMs: BERT, ELMo, GPT, RoBERTa, XLNet, etc. | https://github.com/flairNLP/flair
AllenNLP [47] | Framework: PyTorch; PTMs: ELMo, BERT, GPT-2, etc. | https://github.com/allenai/allennlp
fastNLP | Framework: PyTorch; PTMs: RoBERTa, GPT, etc. | https://github.com/fastnlp/fastNLP
UniLMs | Framework: PyTorch; PTMs: UniLM v1 & v2, MiniLM, LayoutLM, etc. | https://github.com/microsoft/unilm
Chinese-BERT [29] | Framework: PyTorch & TF; PTMs: BERT, RoBERTa, etc. (for Chinese) | https://github.com/ymcui/Chinese-BERT-wwm
BERT [36] | Framework: TF; PTMs: BERT, BERT-wwm | https://github.com/google-research/bert
RoBERTa [117] | Framework: PyTorch | https://github.com/pytorch/fairseq/tree/master/examples/roberta
XLNet [209] | Framework: TF | https://github.com/zihangdai/xlnet/
ALBERT [93] | Framework: TF | https://github.com/google-research/ALBERT
T5 [144] | Framework: TF | https://github.com/google-research/text-to-text-transfer-transformer
ERNIE (Baidu) [170, 171] | Framework: PaddlePaddle | https://github.com/PaddlePaddle/ERNIE
CTRL [84] | Conditional Transformer Language Model for controllable generation | https://github.com/salesforce/ctrl
BertViz [185] | Visualization tool | https://github.com/jessevig/bertviz
exBERT [65] | Visualization tool | https://github.com/bhoov/exbert
TextBrewer [210] | PyTorch-based toolkit for distillation of NLP models | https://github.com/airaria/TextBrewer
DeepPavlov | Conversational AI library; PTMs for Russian, Polish, Bulgarian, Czech, and informal English | https://github.com/deepmipt/DeepPavlov

Corpora
OpenWebText | Open clone of OpenAI's unreleased WebText dataset | https://github.com/jcpeterson/openwebtext
Common Crawl | A very large collection of text | http://commoncrawl.org/
WikiEn | English Wikipedia dumps | https://dumps.wikimedia.org/enwiki/

Other Resources
Paper List | https://github.com/thunlp/PLMpapers
Paper List | https://github.com/tomohideshibata/BERT-related-papers
Paper List | https://github.com/cedrickchee/awesome-bert-nlp
Bert Lang Street | A collection of BERT models with reported performance on different datasets, tasks, and languages | https://bertlang.unibocconi.it/

§ Most PTM papers release links to their official implementations; here we list some popular third-party and official ones.

However, motivated by the fact that recent progress has dramatically eroded headroom on the GLUE benchmark, a new benchmark called SuperGLUE [189] was presented. Compared to GLUE, SuperGLUE has more challenging tasks and more diverse task formats (e.g., coreference resolution and question answering). State-of-the-art PTMs are listed on the corresponding leaderboards 4) 5).

From easy to hard, QA tasks range from single-round extractive QA (SQuAD) and multi-round generative QA (CoQA) to multi-hop QA (HotpotQA) [208]. BERT creatively transforms the extractive QA task into a span prediction task that predicts the start and end positions of the answer [36]. Since then, using a PTM as the encoder for span prediction has become a competitive baseline.
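As a concrete illustration of this span-prediction formulation, the sketch below uses the Transformers library listed in Table 5. It is a minimal sketch rather than a reference implementation: the checkpoint name is illustrative, and the QA head of a base (non-fine-tuned) checkpoint is randomly initialized, so a checkpoint fine-tuned on SQuAD-style data would be needed for meaningful answers.

```python
# Minimal sketch: extractive QA as span prediction with a BERT-style encoder.
# Assumes the Hugging Face Transformers library from Table 5; the checkpoint
# name is illustrative, and its QA head is untrained unless fine-tuned.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForQuestionAnswering.from_pretrained("bert-base-uncased")

question = "Where was the treaty signed?"
context = "The treaty was signed in Paris in 1951."

# Encode question and passage as one sequence: [CLS] question [SEP] context [SEP]
inputs = tokenizer(question, context, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# The QA head emits one logit per token for the answer start and one for the
# end; the predicted answer is the span between the two argmax positions.
start = torch.argmax(outputs.start_logits)
end = torch.argmax(outputs.end_logits)
answer = tokenizer.decode(inputs["input_ids"][0][start : end + 1])
print(answer)
```

The appeal of this formulation is that the span head is just two linear projections over the encoder's token representations, so a stronger encoder (e.g., the ALBERT encoder used below) can be swapped in without changing the task interface.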
For extractive QA, Zhang et al. [215] proposed a retrospective reader architecture and initialized its encoder with a PTM (e.g., ALBERT). For multi-round generative QA, Ju