$$p(y_{1:n} \mid x_{1:n}) = \frac{1}{Z(x_{1:n})} \prod_{i=1}^{n} \psi(y_i, y_{i-1}, x_{1:n}) = \frac{1}{Z(x_{1:n})} \prod_{i=1}^{n} \exp\!\big(w^\top f(y_i, y_{i-1}, x_{1:n})\big)$$
$$Z(x_{1:n}) = \sum_{y'} \prod_{i=1}^{n} \exp\!\big(w^\top f(y'_i, y'_{i-1}, x_{1:n})\big)$$
Global normalization: Z(x_{1:n}) sums over all candidate label sequences y'.
x_{1:n}: observed input sequence; y_{1:n}: label sequence to predict.
[Figure: linear-chain graph over labels y_{t-2}, y_{t-1}, y_t, y_{t+1}, y_{t+2}, all conditioned on x_{1:n}]
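To make the two formulas above concrete, here is a minimal PyTorch sketch (illustrative only, not from the slides): `emit` and `trans` stand in for the featurized scores w^T f(...), and the global normalizer Z(x_{1:n}) is computed with the forward algorithm in log space.

```python
import torch

def crf_log_prob(emit, trans, y):
    """Log p(y | x) for a linear-chain CRF (illustrative sketch).

    emit:  (n, K) tensor, emit[i, k] = score of label k at position i
    trans: (K, K) tensor, trans[j, k] = score of moving from label j to label k
    y:     length-n list of gold label ids
    """
    n, K = emit.shape
    # Unnormalized log-score of the given label path (the numerator).
    score = emit[0, y[0]]
    for i in range(1, n):
        score = score + trans[y[i - 1], y[i]] + emit[i, y[i]]
    # Forward algorithm: log Z(x_{1:n}), the global normalizer over all K^n paths.
    alpha = emit[0]                                              # (K,)
    for i in range(1, n):
        alpha = torch.logsumexp(alpha.unsqueeze(1) + trans, dim=0) + emit[i]
    return score - torch.logsumexp(alpha, dim=0)
```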
The trend since [2011?]. This talk surveys models that use RNNs.
Collobert, Ronan, et al. "Natural language processing (almost) from scratch." Journal of Machine Learning Research 12 (2011): 2493-2537.
Neural Architectures for Named Entity Recognition
Ling, Wang, Chu-Cheng Lin, Yulia Tsvetkov, et al. "Not all contexts are created equal: Better word representations with variable attention." Proc. EMNLP (2015).
Pennington, Jeffrey, Richard Socher, and Christopher Manning. "GloVe: Global vectors for word representation." Proc. EMNLP (2014).
The CNN-CRF model itself has become quite complex. Can it be combined with data from other tasks? The two feature types it uses are sketched below:
• Word features (GloVe 100d [Pennington2014])
• Character features (character CNN only)
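A minimal PyTorch-style sketch of combining the two feature types above; the class name `CharCNN` and all dimensions (25-d character embeddings, 30 filters, kernel width 3) are illustrative assumptions, not values from the slides.

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Character-level CNN producing one fixed-size feature vector per word."""
    def __init__(self, num_chars, char_dim=25, num_filters=30, width=3):
        super().__init__()
        self.embed = nn.Embedding(num_chars, char_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_dim, num_filters, kernel_size=width, padding=1)

    def forward(self, char_ids):          # (num_words, max_chars)
        x = self.embed(char_ids)          # (num_words, max_chars, char_dim)
        x = self.conv(x.transpose(1, 2))  # (num_words, num_filters, max_chars)
        return x.max(dim=2).values        # max-pool over character positions

# Each word is represented by [GloVe 100d ; char-CNN features]
# before it enters the sequence tagger.
word_vecs = torch.randn(7, 100)                            # stand-in for GloVe 100d lookups
char_feats = CharCNN(num_chars=80)(torch.randint(1, 80, (7, 12)))
token_repr = torch.cat([word_vecs, char_feats], dim=-1)    # (7, 130)
```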
Peters+, ACL2017: Semi-supervised sequence tagging with bidirectional language models
Chelba, Ciprian, et al. "One billion word benchmark for measuring progress in statistical language modeling." arXiv preprint arXiv:1312.3005 (2013).
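The core idea in Peters+ 2017 fits in a few lines: hidden states from a pretrained, frozen bidirectional LM are concatenated onto the tagger's token representations. The shapes below are placeholders for illustration, not the paper's values.

```python
import torch

# Illustrative shapes only: 7 tokens, 130-d tagger features (word + char-CNN),
# 512-d forward and backward states from a frozen, pretrained bidirectional LM.
token_repr = torch.randn(7, 130)
fwd_lm, bwd_lm = torch.randn(7, 512), torch.randn(7, 512)

# Semi-supervised augmentation: the LM embedding is simply concatenated;
# the pretrained LM stays frozen while the tagger is trained.
augmented = torch.cat([token_repr, fwd_lm, bwd_lm], dim=-1)   # (7, 1154)
```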
Empower Sequence Labeling with Task-Aware Neural Language Model (figure reproduced from the paper)
Srivastava, Rupesh Kumar, Klaus Greff, and Jürgen Schmidhuber. "Highway networks." arXiv preprint arXiv:1505.00387 (2015).
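Since the model above routes representations through highway layers [Srivastava2015], here is a minimal sketch of one such layer: a learned gate T(x) interpolates between a transform H(x) and the identity. The dimensions and the choice of ReLU are assumptions for illustration.

```python
import torch
import torch.nn as nn

class Highway(nn.Module):
    """One highway layer [Srivastava2015]: y = T(x) * H(x) + (1 - T(x)) * x."""
    def __init__(self, dim):
        super().__init__()
        self.H = nn.Linear(dim, dim)   # candidate transform
        self.T = nn.Linear(dim, dim)   # transform gate
        # Bias the gate toward carrying the input through at initialization.
        nn.init.constant_(self.T.bias, -2.0)

    def forward(self, x):
        t = torch.sigmoid(self.T(x))
        return t * torch.relu(self.H(x)) + (1.0 - t) * x
```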