Satinder P. Singh. “Eligibility Traces for Off-Policy Policy Evaluation.” ICML, 2000. https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1079&context=cs_faculty_pubs
[Strehl+, 10] Alex Strehl, John Langford, Sham Kakade, and Lihong Li. “Learning from Logged Implicit Exploration Data.” NeurIPS, 2010. https://arxiv.org/abs/1003.0120
[Li+, 18] Shuai Li, Yasin Abbasi-Yadkori, Branislav Kveton, S. Muthukrishnan, Vishwa Vinay, and Zheng Wen. “Offline Evaluation of Ranking Policies with Click Models.” KDD, 2018. https://arxiv.org/abs/1804.10488
[McInerney+, 20] James McInerney, Brian Brost, Praveen Chandar, Rishabh Mehrotra, and Ben Carterette. “Counterfactual Evaluation of Slate Recommendations with Sequential Reward Interactions.” KDD, 2020. https://arxiv.org/abs/2007.12986
[Dudík+, 14] Miroslav Dudík, Dumitru Erhan, John Langford, and Lihong Li. “Doubly Robust Policy Evaluation and Optimization.” Statistical Science, 2014. https://arxiv.org/abs/1503.02834
[Jiang&Li, 16] Nan Jiang and Lihong Li. “Doubly Robust Off-policy Value Evaluation for Reinforcement Learning.” ICML, 2016. https://arxiv.org/abs/1511.03722
February 2022 Cascade Doubly Robust Off-Policy Evaluation @ WSDM2022