Vision Language Action Models

Rokas Bendikas¹, Daniel Dijkman², Markus Peschl², Sanjay Haresh², Pietro Mazzaglia²
¹Centre for Artificial Intelligence, UCL; ²Qualcomm AI Research

Bendikas, R., Dijkman, D., Peschl, M., Haresh, S., & Mazzaglia, P. Focusing on What Matters: Object-Agent-centric Tokenization for Vision Language Action Models. In 9th Annual Conference on Robot Learning (CoRL 2025).