The mathematics of transformers

Talk in honor of Stéphane Mallat's CNRS gold medal.

Gabriel Peyré

January 27, 2026

Transcript

  1. The mathematics of transformers

    Gabriel Peyré (École Normale Supérieure). Takashi Furuya, Maarten de Hoop, Valérie Castin, Pierre Ablin.
  2. Generative AI: text vs. images

    Text prompt: "Tell the story of a CNRS researcher presenting generative AI to a large audience in a funny way."
    Generated text: "Dr. Martin Lefèvre, the kind of CNRS researcher who refers to debugging as “therapy,” was scheduled to present generative AI to a huge audience: hundreds of people, two drones for some reason, and a guy in the front row eating chips directly out of his backpack. Martin strolled on stage, slightly sweaty, holding a USB stick like it was the One Ring. “Ladies and gentlemen,” he began, “generative AI is revolutionizing research, art, and occasionally my grocery list.” …"
    Image prompt: "Depict a CNRS mathematics researcher presenting generative AI in front of a large audience." (DALL·E 2)
  3. Generative AI: text vs. images

    Images (DALL·E 2). Pre-training: denoising. Generation: dynamic transport.
    Text. Pre-training: next token prediction. Generation: auto-regressive.
  4. LLMs and very long contexts

    Prompt: What is the 100th term of the arithmetic sequence 6, 10, 14, 18, ...?
    Answer: 412
    Pattern: each term + 4. Rule: $a_n = 6 + (n-1) \cdot 4$. Answer: $n = 100$, $6 + 99 \cdot 4 = 402$.
    Training: small texts, next token prediction, reinforcement learning.
    Inference: very long prompts, chain of thoughts, math reasoning.
  5. LLMs and very long contexts

    "Tell the story"
  6. LLMs and very long contexts

    "Tell the story of a CNRS researcher"
  7. LLMs and very long contexts

    "Tell the story of a CNRS researcher […] in a funny way."
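
The pattern, rule and answer steps shown on the "LLMs and very long contexts" slides can be checked with a few lines of Python. This is only an illustrative sketch; the helper name `arithmetic_term` is hypothetical and does not come from the talk. Evaluating the rule gives 402, which matches the step-by-step answer rather than the direct answer 412.

```python
def arithmetic_term(first: int, step: int, n: int) -> int:
    """Return the n-th term (1-indexed) of an arithmetic sequence."""
    return first + (n - 1) * step


# Sequence from the slide: 6, 10, 14, 18, ...
first, step = 6, 4
print([arithmetic_term(first, step, n) for n in range(1, 5)])  # [6, 10, 14, 18]
print(arithmetic_term(first, step, 100))                        # 402
```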
  8. Transformers and attention mechanism

    Tokenize: "Tell the story of a CNRS researcher presenting generative AI to a large audience in a funny way."
    Token encoding + positional encoding.
    Point cloud: $\{x_i\}_{i=1}^n$, with points $x_1, x_2, \dots, x_n$.
  9. Transformers and attention mechanism

    (Unmasked) attention layer: $\tilde{x}_i := \sum_j \frac{e^{\langle Q x_i, K x_j \rangle}}{\sum_\ell e^{\langle Q x_i, K x_\ell \rangle}} V x_j$
    Architecture: $T \times$ (Attention, Norm, MLP), then Classif, giving next token probabilities.
  10. Transformers and attention mechanism

    Arbitrary number of tokens ($n \to +\infty$) and of layers ($T \to +\infty$).
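
The (unmasked) attention layer from the slides above can be sketched in NumPy. This is a minimal illustration, not the speaker's code: the function name `attention`, the random test data and the row-wise token layout (one token per row of `X`) are assumptions made here. The final check illustrates that the layer treats tokens as an unordered point cloud: permuting the input rows permutes the output rows in the same way.

```python
import numpy as np


def attention(X, Q, K, V):
    """Unmasked softmax attention on a token cloud X of shape (n, d).

    Row i of the result is sum_j softmax_j(<Q x_i, K x_j>) V x_j.
    """
    scores = (X @ Q.T) @ (X @ K.T).T                      # scores[i, j] = <Q x_i, K x_j>
    scores = scores - scores.max(axis=1, keepdims=True)   # stabilise the exponentials
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)         # each row sums to 1
    return weights @ (X @ V.T)                            # weighted average of the V x_j


rng = np.random.default_rng(0)
n, d = 5, 3
X = rng.normal(size=(n, d))
Q, K, V = (rng.normal(size=(d, d)) for _ in range(3))

X_tilde = attention(X, Q, K, V)
perm = rng.permutation(n)
# Permutation equivariance: shuffling tokens shuffles the outputs identically.
assert np.allclose(attention(X[perm], Q, K, V), X_tilde[perm])
```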
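  11. Mean-field Attentions

    Tokens: points $X := \{x_i\}_{i=1}^n$. Parameters: $\theta := (Q, K, V)$.
    $\Gamma_\theta[X](x) := \sum_j \frac{e^{\langle Qx, Kx_j \rangle}}{\sum_\ell e^{\langle Qx, Kx_\ell \rangle}} V x_j$
    $\Gamma_\theta[X]$ maps $X$ to $\tilde{X}$.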
  12. Mean-field Attentions

    $\mu = \frac{1}{n} \sum_{i=1}^n \delta_{x_i}$
    $\Gamma_\theta[\mu](x) := \int \frac{e^{\langle Qx, Ky \rangle}}{\int e^{\langle Qx, Ky' \rangle} \, d\mu(y')} V y \, d\mu(y)$
    $\Gamma_\theta[\mu]$ maps $\mu$ to $\xi$.
  13. Mean-field Attentions

    Next layers: $\Gamma_{\tilde\theta}[\tilde X]$ acts on $\tilde X$, and $\Gamma_{\theta'}[\xi]$ acts on $\xi$.
    Transformer $\equiv$ composition of attentions $\Gamma$ and MLPs.
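
The mean-field formula above depends on the tokens only through the measure $\mu$. A small NumPy sketch can make this concrete, under assumptions made here for illustration: the measure is discrete, stored as support points plus weights, and the names `mean_field_attention`, `points` and `weights` are hypothetical. A cloud with a repeated token and the corresponding weighted cloud then give the same output.

```python
import numpy as np


def mean_field_attention(x, points, weights, Q, K, V):
    """Evaluate Gamma_theta[mu](x) for mu = sum_j weights[j] * delta_{points[j]}.

    x has shape (m, d) (evaluation points), points has shape (n, d).
    """
    scores = (x @ Q.T) @ (points @ K.T).T                 # <Q x_i, K y_j>
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    kernel = np.exp(scores) * weights                     # e^{<Qx,Ky_j>} mu({y_j})
    kernel /= kernel.sum(axis=1, keepdims=True)           # divide by the integral in y'
    return kernel @ (points @ V.T)                        # integral of V y against the kernel


a, b = np.array([0.0, 1.0]), np.array([2.0, -1.0])
x = np.array([[0.5, 0.5]])
Q = K = V = np.eye(2)

# Three tokens (a, a, b) with uniform weights ...
repeated = mean_field_attention(x, np.array([a, a, b]), np.full(3, 1 / 3), Q, K, V)
# ... define the same measure as two points with weights 2/3 and 1/3.
weighted = mean_field_attention(x, np.array([a, b]), np.array([2 / 3, 1 / 3]), Q, K, V)
assert np.allclose(repeated, weighted)
```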
  14. Optimal Transport (Wasserstein) Distance

    $W_2(\mu, \nu)^2 := \min_T \sum_{i=1}^n \|x_i - y_{T(i)}\|^2$, where $T$ matches each point $x_i$ to a point $y_j$ at cost $\|x_i - y_j\|^2$.
    Monge, 1784.
  15. Optimal Transport (Wasserstein) Distance

    $W_2(\mu, \nu)^2 = \inf_{T_\sharp \mu = \nu} \int \|x - T(x)\|^2 \, d\mu(x)$, where $T$ transports $\mu$ onto $\nu$.
  16. Optimal Transport (Wasserstein) Distance

    General measures: Kantorovitch relaxation, or approximation by discrete measures.
    Kantorovitch, 1942.
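
For two small point clouds of equal size, the discrete Monge formula above can be evaluated by trying every matching $T$. The sketch below is a brute-force illustration with hypothetical helper names (`squared_distance`, `monge_cost`). It enumerates all $n!$ permutations, so it is only meant for tiny $n$, and it returns the un-normalized sum exactly as written in the discrete formula.

```python
from itertools import permutations


def squared_distance(x, y):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def monge_cost(xs, ys):
    """Return (min over T of sum_i ||x_i - y_T(i)||^2, optimal T) by brute force."""
    n = len(xs)
    assert len(ys) == n, "both clouds must have the same number of points"

    def total(T):
        return sum(squared_distance(xs[i], ys[T[i]]) for i in range(n))

    best = min(permutations(range(n)), key=total)
    return total(best), best


xs = [(0.0, 0.0), (1.0, 0.0)]
ys = [(1.0, 1.0), (0.0, 1.0)]
cost, T = monge_cost(xs, ys)
print(cost, T)  # 2.0 (1, 0): x_0 is matched to y_1 and x_1 to y_0
```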
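  17. Universal Approximation

    Layers: $\Gamma_\theta[\mu](x) := x + \sum_{h=1}^H \int \frac{e^{\langle Q_h x, K_h y \rangle}}{\int e^{\langle Q_h x, K_h y' \rangle} \, d\mu(y')} V_h y \, d\mu(y)$ or $\Gamma_\theta[\mu](x) := \mathrm{MLP}_\theta(x)$.
    Theorem [Furuya, de Hoop, Peyré]: Let $\Gamma^\star : \mathcal{P}(\Omega) \times \Omega \to \mathbb{R}^d$ be $(\mathrm{Wass}_2 \times \ell^2)$-continuous on a compact $\Omega \subset \mathbb{R}^d$.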
  18. Universal Approximation

    For any $\varepsilon$ there exist $N$ and $(\theta_1, \dots, \theta_N)$ such that
    $\forall (\mu, x) \in \mathcal{P}(\Omega) \times \Omega, \quad |\Gamma^\star[\mu](x) - \Gamma_{\theta_N} \diamond \cdots \diamond \Gamma_{\theta_1}[\mu](x)| \le \varepsilon$
    with token dimensions $\le 4d$ and $H \le d$.
  19. Universal Approximation

    Novelties: fixed dimensions, arbitrary # tokens.
    Previous works:
    [Yun, Bhojanapalli, Singh Rawat, Reddi, Kumar, 2019] → $H = 2$, dimension $\sim$ #tokens.
    [Agrachev, Letrouit 2019] → abstract genericity hypothesis (Lie algebra/control).
    [Geshkovski, Rigollet, Ruiz-Balet, 2024] → universal interpolation.
    Discrete tokens: transformers are universal Turing machines, e.g. [Elhage et al 2021].
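
The multi-head residual layer and the composition $\Gamma_{\theta_N} \diamond \cdots \diamond \Gamma_{\theta_1}$ used in the theorem can be sketched for a discrete measure. Two assumptions are made here and are not spelled out on the slides: the composition is read as "each layer moves the evaluation point and pushes the support of the measure forward at the same time", and MLP layers are omitted. The names `attention_layer` and `compose` are hypothetical.

```python
import numpy as np


def attention_layer(x, points, weights, heads):
    """x + sum_h of the mean-field attention of head h = (Q_h, K_h, V_h).

    x: (m, d) evaluation points; points: (n, d) support of mu; weights: (n,).
    """
    out = x.copy()
    for Q, K, V in heads:
        scores = (x @ Q.T) @ (points @ K.T).T
        scores = scores - scores.max(axis=1, keepdims=True)
        kernel = np.exp(scores) * weights
        kernel /= kernel.sum(axis=1, keepdims=True)
        out = out + kernel @ (points @ V.T)      # residual connection: x + ...
    return out


def compose(x, points, weights, layers):
    """Apply layers in order; ASSUMED reading of the composition symbol."""
    for heads in layers:
        new_x = attention_layer(x, points, weights, heads)
        points = attention_layer(points, points, weights, heads)  # push mu forward
        x = new_x
    return x


pts = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
w = np.full(3, 1 / 3)
I = np.eye(2)
layers = [[(I, I, 0.5 * I), (I, -I, 0.1 * I)],   # layer 1 with H = 2 heads
          [(I, I, -0.2 * I)]]                     # layer 2 with H = 1 head
print(compose(np.array([[0.5, 0.5]]), pts, w, layers))
```

When the evaluation point is itself one of the tokens, this composition coincides with stacking the layers on the token cloud and reading off that token's row.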
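  20. Infinite Depth as a Neural PDE

    $\Gamma_\theta[\mu](x) := \int \frac{e^{\langle Qx, Ky \rangle}}{\int e^{\langle Qx, Ky' \rangle} \, d\mu(y')} V y \, d\mu(y)$, with $\theta = (Q, K, V)$.
    $x_i(t+1) = x_i(t) + \frac{1}{T} \Gamma_{\theta(t)}[\mu(t)](x_i(t))$, with $\mu(t) = \frac{1}{n} \sum_{i=1}^n \delta_{x_i(t)}$.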
  21. Infinite Depth as a Neural PDE

    Infinite depth ($T \to +\infty$): coupled ODEs $\frac{dx_i}{dt}(t) = \Gamma_{\theta(t)}[\mu(t)](x_i(t))$.
  22. Infinite Depth as a Neural PDE

    Mean field: Transformer PDE $\frac{d\mu}{dt} + \mathrm{div}(\mu \Gamma_\theta[\mu]) = 0$ [Sander, Ablin, Blondel, Peyré, 2022].
    Michael Sander.
  23. Infinite Depth as a Neural PDE

    [Geshkovski, Letrouit, Polyanskiy, Rigollet 2023]
    → The attention matrix converges to low-rank.
    → Clustering of $\mu$ for un-normalized attention.
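
The residual update $x_i(t+1) = x_i(t) + \frac{1}{T}\Gamma_{\theta(t)}[\mu(t)](x_i(t))$ is an explicit Euler step of size $1/T$ for the coupled ODEs, which a short NumPy sketch can illustrate. Assumptions made here: the parameters are held constant across layers (the slides allow $\theta(t)$ to vary), and the names `attention_field` and `run_layers` are hypothetical. With a single token the field reduces to $Vx$, so the deep limit is $e^{V} x(0)$, which gives a simple sanity check.

```python
import numpy as np


def attention_field(X, Q, K, V):
    """Gamma_theta[mu](x_i) for the empirical measure mu of the rows of X."""
    scores = (X @ Q.T) @ (X @ K.T).T
    scores = scores - scores.max(axis=1, keepdims=True)
    W = np.exp(scores)
    W /= W.sum(axis=1, keepdims=True)
    return W @ (X @ V.T)


def run_layers(X0, Q, K, V, T):
    """Apply T residual attention layers with step 1/T (explicit Euler on [0, 1])."""
    X = X0.copy()
    for _ in range(T):
        X = X + attention_field(X, Q, K, V) / T
    return X


# Single token with V = -Id: the ODE is dx/dt = -x, so x(1) = exp(-1) x(0).
X0 = np.array([[1.0, 2.0]])
I = np.eye(2)
X_deep = run_layers(X0, I, I, -I, T=1000)
assert np.allclose(X_deep, np.exp(-1.0) * X0, atol=1e-3)
```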
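  24. Gaussian Case and Clustering

    $\frac{d\mu}{dt} + \mathrm{div}(\mu \Gamma_\theta[\mu]) = 0$, with $\Gamma_\theta[\mu](x) := \int \frac{e^{\langle Qx, Ky \rangle}}{\int e^{\langle Qx, Ky' \rangle} \, d\mu(y')} V y \, d\mu(y)$ and $\theta(t) = (Q(t), K(t), V(t))$.
    Theorem [Valérie Castin]: If $\mu(0) = \mathcal{N}(m(0), \Sigma(0))$, then $\mu(s) = \mathcal{N}(m(s), \Sigma(s))$, where
    $\dot{m} = V(\mathrm{Id} + \Sigma Q^\top K) m$
    $\dot{\Sigma} = V \Sigma Q^\top K \Sigma + \Sigma K^\top Q \Sigma V^\top$
    Figure: evolution in time $t$ from $\mu(0)$ to $\mu(t)$.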
  25. Gaussian Case and Clustering

    Figure: evolution in time $t$ from $\mu(0)$ to $\mu(\infty)$.
    Theorem [Valérie Castin]: If $V(t) = \mathrm{Id}$ and $K(t)^\top Q(t)$ is symmetric, stationary points of $\Sigma(t)$ have rank less than $d/2$.
    Valérie Castin.
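
The mean and covariance equations of the Gaussian case can be integrated numerically. The sketch below uses a plain explicit Euler scheme and implements the right-hand sides exactly as they are written on the slide; holding $Q$, $K$, $V$ constant in time, the step count, and the names `gaussian_rhs` and `integrate` are assumptions made for illustration. Since $\Sigma K^\top Q \Sigma V^\top$ is the transpose of $V \Sigma Q^\top K \Sigma$ when $\Sigma$ is symmetric, the scheme keeps $\Sigma$ symmetric.

```python
import numpy as np


def gaussian_rhs(m, Sigma, Q, K, V):
    """Right-hand sides for (m, Sigma), copied term by term from the slide."""
    A = Q.T @ K
    m_dot = V @ (np.eye(len(m)) + Sigma @ A) @ m
    # K^T Q = (Q^T K)^T, hence the second term is Sigma A^T Sigma V^T.
    Sigma_dot = V @ Sigma @ A @ Sigma + Sigma @ A.T @ Sigma @ V.T
    return m_dot, Sigma_dot


def integrate(m0, Sigma0, Q, K, V, t_end=1.0, steps=1000):
    """Explicit Euler integration with constant parameters (an assumption)."""
    m, Sigma = m0.astype(float), Sigma0.astype(float)
    h = t_end / steps
    for _ in range(steps):
        m_dot, Sigma_dot = gaussian_rhs(m, Sigma, Q, K, V)
        m, Sigma = m + h * m_dot, Sigma + h * Sigma_dot
    return m, Sigma


# Scalar sanity check: with Q = K = V = 1 the covariance solves
# dSigma/dt = 2 Sigma^2, i.e. Sigma(t) = Sigma(0) / (1 - 2 Sigma(0) t).
one = np.array([[1.0]])
m, Sigma = integrate(np.array([1.0]), np.array([[0.1]]), one, one, one)
assert abs(Sigma[0, 0] - 0.1 / (1 - 0.2)) < 1e-3
```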
  26. Open Problems

    Expressivity: Quantitative approximation bounds?
    Training: Why is Adam normalization needed for training?
    Generalization: What are "optimal" transformers implementing? Memorizing vs. reasoning.