Original paper:
Shi, L., Zhang, H., Yao, Y., Li, Z., & Zhao, H. (2024). Keep the Cost Down: A Review on Methods to Optimize LLM's KV-Cache Consumption. arXiv preprint arXiv:2407.18003.
https://arxiv.org/abs/2407.18003
Presented at Seminar 2024/1, Chulalongkorn University.