from transformers import BertTokenizer
tknzr = BertTokenizer.from_pretrained('bert-base-cased')

import maxMatchTokenizer
mmt = maxMatchTokenizer.MaxMatchTokenizer()
mmt.loadBertTokenizer(tknzr, doNaivePreproc=True)
mmt.tokenize('hello, wordpiece!', p=0.5)
# outputs: ['hello', ',', 'w', '##ord', '##piece', '!']

A BertTokenizer can be loaded directly!
I have been meaning to build this into BertTokenizer and submit a pull request, but half a year has gone by.

2022/12/14 NLP Colloquium 49
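To illustrate what the p argument in the tokenize call above might be doing, here is a minimal sketch of greedy longest-match WordPiece splitting with random rejection of matches. It assumes p is the probability of dropping a matched vocabulary entry so that the search falls back to a shorter piece. The function max_match_dropout, the toy vocabulary, and the rule that single-character pieces are never dropped are all hypothetical choices for this sketch, not the maxMatchTokenizer implementation.

```python
import random


def max_match_dropout(word, vocab, p=0.0, rng=random, unk="[UNK]"):
    """Split one word by greedy longest match, rejecting matches with probability p.

    Hypothetical helper for illustration only; p is ASSUMED to be the
    probability of dropping a matched vocabulary entry.
    """
    tokens = []
    start = 0
    while start < len(word):
        match = None
        # Try the longest candidate first, then shrink it one character at a time.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # WordPiece continuation prefix
            if piece not in vocab:
                continue
            # Sketch-specific choice: never drop a single-character piece,
            # so a word made of known characters can always be covered.
            if end - start > 1 and rng.random() < p:
                continue
            match = (piece, end)
            break
        if match is None:
            return [unk]  # no piece matched at this position
        tokens.append(match[0])
        start = match[1]
    return tokens


# Toy vocabulary (an assumption, not the bert-base-cased vocabulary).
toy_vocab = {"word", "w", "##ord", "##piece",
             "##o", "##r", "##d", "##p", "##i", "##e", "##c"}

print(max_match_dropout("wordpiece", toy_vocab, p=0.0))  # ['word', '##piece']
print(max_match_dropout("wordpiece", toy_vocab, p=0.5, rng=random.Random(0)))
```

With p=0.0 the sketch reduces to ordinary longest-match tokenization. With p > 0 the same word can receive different segmentations from call to call, which would be consistent with 'wordpiece' being split into 'w', '##ord', '##piece' in the slide's example.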