[Figure: attention over a five-step input sequence. Each input token x1…x5 is passed through an Embedding layer to give e1…e5, combined with a positional encoding (PE), and projected to a query, key, and value (q_t, k_t, v_t). The raw score matrix [α_{i,j}] is normalized row-wise by a softmax into [α̂_{i,j}], and each output_t is the sum (⊕) of the values v_j weighted (⊗) by α̂_{t,j}. Diagram adapted from "Figure 1: The Transformer - model architecture" in Vaswani et al., whose Section 3.1 ("Encoder and Decoder Stacks") notes that the encoder is composed of a stack of N = 6 identical layers.]

By multiplying the attention weights with the Value features and summing the results, the model captures the relationships between the features at different time steps:

$$
\mathrm{Attention}(Q, K, V) = \hat{\alpha} V, \qquad \hat{\alpha} = \mathrm{softmax}(\alpha), \quad \alpha = \frac{Q K^\top}{\sqrt{d_k}}
$$
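As a concrete illustration of the formula above, here is a minimal NumPy sketch of scaled dot-product attention for a length-5 sequence like the one in the figure. The shapes and the √d_k scaling follow Vaswani et al.; the function and variable names (`softmax`, `attention`, `alpha_hat`) are illustrative, not from the original text.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V.

    Q, K: (seq_len, d_k); V: (seq_len, d_v).
    Returns the outputs (seq_len, d_v) and the normalized weights alpha_hat.
    """
    d_k = K.shape[-1]
    alpha = Q @ K.T / np.sqrt(d_k)        # raw scores alpha_{i,j}
    alpha_hat = softmax(alpha, axis=-1)   # softmax over key positions (the TPGUNBY boxes)
    return alpha_hat @ V, alpha_hat       # weighted sum of values: the (x), (+) steps

# Toy usage: five time steps, as in the figure.
rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 8))
K = rng.normal(size=(5, 8))
V = rng.normal(size=(5, 8))
out, alpha_hat = attention(Q, K, V)
print(out.shape, alpha_hat.shape)  # (5, 8) (5, 5); each row of alpha_hat sums to 1
```

Each row t of `alpha_hat` holds the weights α̂_{t,1}…α̂_{t,5}, so row t of the output is exactly the weighted combination of value vectors shown for output_t in the figure.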