Flash Attention!
Dao et al., FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
See also: Making Deep Learning Go Brrrr From First Principles, https://horace.io/brrr_intro.html
Attention is computed as MatMul → Softmax → MatMul.
Using the online normalizer calculation for softmax (https://arxiv.org/abs/1805.02867), the partially normalized scores can be applied to 𝑽 block by block.
Throughout the attention computation, keep the data here (in on-chip SRAM)!
FlashAttention handles the backward pass as well.
Softmax formula (with the max subtracted for numerical stability): softmax(x_i) = e^{x_i − m} / Σ_j e^{x_j − m}, where m = max_j x_j.
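As a minimal sketch (not the authors' implementation), the online normalizer trick referenced above can be written in Python as follows; the function and variable names are illustrative:

```python
import math

def online_softmax(xs):
    # Single-pass softmax: track the running maximum m and the
    # running sum d of exp(x - m). When a larger maximum appears,
    # rescale the accumulated sum by exp(m_old - m_new).
    # FlashAttention relies on this rescaling to process attention
    # scores block by block without materializing the full score
    # matrix in slow HBM.
    m = float("-inf")  # running maximum
    d = 0.0            # running normalizer sum
    for x in xs:
        m_new = max(m, x)
        d = d * math.exp(m - m_new) + math.exp(x - m_new)
        m = m_new
    return [math.exp(x - m) / d for x in xs]

# Example: agrees with the standard two-pass softmax.
print(online_softmax([1.0, 2.0, 3.0]))
```

Because the running sum can be corrected after the fact, the normalizer never needs a second pass over the data, which is what lets FlashAttention fuse the MatMul → Softmax → MatMul pipeline into one kernel.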