LSTM in the original version; changed to an RNN (GRU) in the revised version (v2)

1. Bidirectional RNN encoding
   1. h^R_i = RNN(h^R_{i-1}, w(p_i)),  h^L_i = RNN(h^L_{i+1}, w(p_i))
   2. p_i = concat(h^R_i, h^L_i)
2. Attention
   1. α_i = softmax_i(q^⊤ W_s p_i)
   2. o = Σ_i α_i p_i
   α: probability distribution (= attention), q: question embedding,
   p_i: contextual embedding of p_i (the i-th word in the passage),
   W_s: weight matrix used for the bilinear term (it flexibly computes a
   similarity between q and p_i), o: output vector
3. Prediction
   a = argmax_{a ∈ p} W_a^⊤ o
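The three steps above can be sketched end to end in numpy. This is a minimal illustration, not the authors' implementation: the dimensions, the random parameter initialization, the single-gate-matrix GRU parameterization, and the candidate-answer matrix `Wa` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy dimensions (assumed): n passage words, word vectors of size dw,
# GRU hidden states of size dh, so contextual embeddings have size 2*dh.
n, dw, dh = 5, 4, 3

def make_gru():
    # One parameter matrix per gate (z, r, candidate), acting on [x; h].
    return {g: rng.standard_normal((dh, dw + dh)) * 0.1 for g in "zrh"}

def gru_step(params, h, x):
    xh = np.concatenate([x, h])
    z = sigmoid(params["z"] @ xh)                              # update gate
    r = sigmoid(params["r"] @ xh)                              # reset gate
    h_tilde = np.tanh(params["h"] @ np.concatenate([x, r * h]))
    return (1 - z) * h + z * h_tilde

words = rng.standard_normal((n, dw))   # w(p_i): word embeddings
fwd, bwd = make_gru(), make_gru()

# 1. Run a GRU in each direction and concatenate the states per word.
hR = [np.zeros(dh)]
for i in range(n):
    hR.append(gru_step(fwd, hR[-1], words[i]))
hL = [np.zeros(dh)]
for i in reversed(range(n)):
    hL.append(gru_step(bwd, hL[-1], words[i]))
hL = hL[1:][::-1]  # reorder so hL[i] is the backward state at word i
P = np.stack([np.concatenate([hR[i + 1], hL[i]]) for i in range(n)])

# 2. Attention: alpha_i = softmax_i(q^T Ws p_i), o = sum_i alpha_i p_i.
q = rng.standard_normal(2 * dh)        # question embedding
Ws = rng.standard_normal((2 * dh, 2 * dh))
alpha = softmax(P @ Ws.T @ q)          # (n,) attention distribution
o = alpha @ P                          # (2*dh,) output vector

# 3. Prediction: argmax over candidate answers of W_a^T o; here Wa
# holds one (assumed) weight vector per candidate answer in the passage.
num_candidates = 3
Wa = rng.standard_normal((num_candidates, 2 * dh))
a = int(np.argmax(Wa @ o))
```

Note that the attention weights form a probability distribution over passage positions, so `o` is a convex combination of the contextual embeddings `P`.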