al. "WebArena: A realistic web environment for building autonomous agents." In ICLR 2024.
• Xie, Tianbao, et al. "OSWorld: Benchmarking multimodal agents for open-ended tasks in real computer environments." In NeurIPS 2024.
• Rawles, Christopher, et al. "AndroidWorld: A dynamic benchmarking environment for autonomous agents." In ICLR 2025.
• Li, Wei, et al. "On the effects of data scale on UI control agents." In NeurIPS 2024.
• Qin, Yujia, et al. "UI-TARS: Pioneering automated GUI interaction with native agents." arXiv preprint arXiv:2501.12326 (2025).