We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
• Self-Attention
  ◦ A mechanism that computes the relevance of each input element to every other element of the input
  ◦ Learns both local and global relationships (a minimal sketch follows below)
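The following is a minimal sketch of scaled dot-product self-attention, the core computation behind the bullets above, written with NumPy. The function name `self_attention`, the projection matrices, and the toy dimensions are illustrative assumptions; the Transformer's actual layers additionally use multiple heads, masking, and learned parameters trained end-to-end.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over one sequence (illustrative sketch).

    x:             (seq_len, d_model) input embeddings
    w_q, w_k, w_v: (d_model, d_k) projection matrices
    """
    q = x @ w_q  # queries
    k = x @ w_k  # keys
    v = x @ w_v  # values
    d_k = q.shape[-1]
    # Every position attends to every other position, so local and
    # long-range relations are captured by the same computation.
    scores = q @ k.T / np.sqrt(d_k)                          # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                                       # (seq_len, d_k)

# Toy usage: 4 tokens with 8-dimensional embeddings (sizes chosen arbitrarily)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8)
```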
Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., & Amodei, D. (2020, January 23). Scaling Laws for Neural Language Models. arXiv. https://arxiv.org/abs/2001.08361
Wei, J., & Tay, Y. (2022). Characterizing Emergent Phenomena in Large Language Models. Google Research Blog. https://research.google/blog/characterizing-emergent-phenomena-in-large-language-models/