


Adaptive Experimental Design for Efficient Average Treatment Effect Estimation and Treatment Choice

2025 CUHK-Shenzhen Econometrics Workshop


MasaKat0

July 08, 2025


Transcript

  1. 1 Adaptive Experimental Design for Efficient Average Treatment Effect Estimation and Treatment Choice
    2025 CUHK-Shenzhen Econometrics Workshop
    Masahiro Kato, Osaka Metropolitan University / Mizuho-DL Financial Technology Co., Ltd.
  2. 2 Experimental approach for causal inference
    ◼ The gold standard for causal inference is randomized controlled trials (RCTs).
    • We randomly allocate treatments to experimental units.
    ◼ RCTs are the gold standard but often costly and inefficient.
    → We want to design more efficient experiments (in some sense).
    ➢ Adaptive experimental design.
    • We can update the treatment allocation during an experiment to gain efficiency.
    • How do we define an ideal treatment-allocation probability (propensity score)?
  3. 3 Table of contents
    ◼ In this talk, we explain how we design adaptive experiments for average treatment effect estimation and treatment choice.
    1. General approach for adaptive experimental design.
    2. Adaptive experimental design for ATE estimation.
    3. Adaptive experimental design for treatment choice.
    4. Adaptive experimental design for policy learning.
    5. Technical issues in experimental design for treatment choice.
  4. 4 Papers this talk is based on
    • Kato, M., Ishihara, T., Honda, J., and Narita, Y. (2020). Efficient adaptive experimental design for average treatment effect estimation. Revise and resubmit at JASA.
    • Kato, M. Minimax and Bayes optimal best-arm identification: adaptive experimental design for treatment choice. Preprint.
    • Komiyama, J., Ariu, K., Kato, M., and Qin, C. (2023). Rate-optimal Bayesian simple regret in best arm identification. Mathematics of Operations Research.
    • Kato, M., Okumura, K., Ishihara, T., and Kitagawa, T. (2024). Adaptive experimental design for policy learning. Preprint.
    • Ariu, K., Kato, M., Komiyama, J., McAlinn, K., and Qin, C. (2025). A comment on “Adaptive treatment assignment in experiments for policy choice.”
  5. 6 How to Design Adaptive Experiments?
    ◼ In many studies, adaptive experiments are designed with the following steps:
    Step 1. Define the goal of causal inference and the performance measure.
    Step 2. Compute a lower bound and an ideal treatment-allocation probability.
    Step 3. Run the adaptive experiment: allocate treatment arms while estimating the ideal treatment-allocation probability, and return the target of interest using the observations from the experiment.
    Step 4. Investigate the performance of the designed experiment and confirm its optimality.
  6. 7 Step 1: goals and performance measures
    ◼ Goal 1: Average treatment effect (ATE) estimation.
    * The number of treatment arms is usually two (binary treatments).
    • Goal. Estimate the ATE.
    • Performance measure. The (asymptotic) variance; a smaller asymptotic variance is better.
    ◼ Goal 2: Treatment choice, also known as best-arm identification.
    * The number of treatments is two or more (multiple treatments).
    • Goal. Choose the best treatment, i.e., the treatment whose expected outcome is the highest.
    • Performance measure. The probability of misidentifying the best treatment, or the regret.
  7. 8 Step 2: lower bounds and ideal allocation probability
    ◼ After deciding the goal and performance measure, we develop a lower bound.
    ◼ ATE estimation.
    • Semiparametric efficiency bound (Hahn, 1998).
    ◼ Best-arm identification.
    • Various lower bounds have been proposed; there is no consensus on which one to use.
    ✓ Lower bounds are often functions of the treatment-allocation probability (propensity score).
    → We can minimize the lower bound with respect to the treatment-allocation probability.
    We refer to a probability that minimizes the lower bound as an ideal treatment-allocation probability.
  8. 9 Step 3: adaptive experiment
    ◼ Ideal allocation probabilities usually depend on unknown parameters of the distribution.
    • We estimate the probabilities (unknown parameters) during the experiment.
    ◼ We run an adaptive experiment, which consists of the following two phases.
    1. Treatment-allocation phase: in each round t = 1, 2, …, T:
    • Estimate the ideal treatment-allocation probability based on past observations.
    • Allocate treatment following the estimated ideal treatment-allocation probability.
    2. Decision-making phase: at the end of the experiment:
    • Return an estimate of the target of interest.
  9. 10 Step 4: upper bound and optimality
    ◼ For the designed experiment, we investigate its theoretical performance.
    ➢ ATE estimation.
    • We prove asymptotic normality and check the asymptotic variance.
    • If the asymptotic variance matches the efficiency bound minimized over the treatment-allocation probability, the design is (asymptotically) optimal.
    ➢ Best-arm identification.
    • We investigate the probability of misidentifying the best arm or the regret.
    • Distribution-dependent, minimax, and Bayes optimality.
  10. 11 More details
    (Section 2) I explain the details of adaptive experimental design for ATE estimation.
    (Section 3) Then, I discuss how to design adaptive experiments for treatment choice.
    (Section 4) I introduce adaptive experimental design for policy learning.
    (Section 5) I discuss technical issues in adaptive experimental design for treatment choice.
  11. 13 Setup
    ◼ Binary treatments, 1 and 0.
    ◼ Sample size T. Each unit is indexed by 1, 2, …, T.
    ◼ Potential outcomes Y_{1,t}, Y_{0,t} ∈ ℝ.
    • μ_a(X) and σ_a^2(X): conditional mean and variance of Y_a given X.
    ◼ d-dimensional covariates X_t ∈ 𝒳 ⊂ ℝ^d. E.g., age, occupation, etc.
    ◼ Average treatment effect, τ = E[Y_1 − Y_0].
  12. 14 Adaptive experiment
    1. Treatment-allocation phase: in each round t = 1, 2, …, T:
    • Observe covariates X_t.
    • Allocate treatment A_t ∈ {1, 0} based on {(X_s, A_s, Y_s)}_{s=1}^{t−1} and X_t.
    • Observe the outcome Y_t = 1[A_t = 1] Y_{1,t} + 1[A_t = 0] Y_{0,t}.
    2. Decision-making phase: at the end of the experiment (after observing {(X_t, A_t, Y_t)}_{t=1}^T):
    • Estimate the ATE τ = E[Y_1] − E[Y_0].
    (Diagram: in round t, a unit with covariates X_t receives treatment A_t and outcome Y_t is observed; after round T, the ATE is estimated.)
  13. 15 Performance measure and lower bound
    ◼ We aim to construct an asymptotically normal estimator τ̂_T with a smaller asymptotic variance:
    √T (τ̂_T − τ) →_d N(0, V) as T → ∞.
    ◼ Efficiency bound for ATE estimation (Hahn, 1998):
    • Assume that observations are i.i.d. with a fixed treatment-allocation probability w_a(x).
    • Then, the efficiency bound is given by
    V(w) ≔ E[ σ_1^2(X) / w_1(X) + σ_0^2(X) / w_0(X) + (τ(X) − τ)^2 ].
    • τ(x) is the conditional ATE, defined as τ(X) ≔ E[Y_1 | X] − E[Y_0 | X] = μ_1(X) − μ_0(X).
  14. 16 Ideal treatment-allocation probability
    ◼ The efficiency bound is a functional of the treatment-allocation probability.
    • The bound can be further minimized over the probability (Hahn, Hirano, and Karlan, 2011):
    w* ≔ arg min_w V(w) = arg min_w E[ σ_1^2(X) / w_1(X) + σ_0^2(X) / w_0(X) + (τ(X) − τ)^2 ].
    ◼ Neyman allocation (Neyman, 1934; Hahn, Hirano, and Karlan, 2011).
    • w* has the following closed-form solution, called the Neyman allocation:
    w_1*(x) = σ_1(x) / (σ_1(x) + σ_0(x)),   w_0*(x) = σ_0(x) / (σ_1(x) + σ_0(x)).
    • Allocate treatments in the ratio of the standard deviations.
    • Estimate the variances and w* during the experiment.
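A minimal numerical sketch of the Neyman allocation and the efficiency bound V(w) it minimizes. The covariate law and conditional variances below are illustrative choices (the same ones used on the efficiency-gain slide later), not part of the design itself.

```python
# Sketch: compare the efficiency bound V(w) under a 1/2-1/2 RCT and under the
# Neyman allocation w1*(x) = sigma1(x) / (sigma1(x) + sigma0(x)).
# Illustrative assumptions: X ~ Uniform[0,1], sigma1^2(x) = x^2,
# sigma0^2(x) = 0.1, and tau(x) = tau so the (tau(X) - tau)^2 term vanishes.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=1_000_000)   # Monte Carlo draws from the covariate law
sigma1 = X                                   # conditional std of Y1 given X
sigma0 = np.full_like(X, np.sqrt(0.1))       # conditional std of Y0 given X

def efficiency_bound(w1):
    """V(w) = E[sigma1^2(X)/w1(X) + sigma0^2(X)/w0(X)] when tau(X) = tau."""
    w0 = 1.0 - w1
    return np.mean(sigma1**2 / w1 + sigma0**2 / w0)

w1_neyman = sigma1 / (sigma1 + sigma0)       # Neyman allocation, pointwise in X
print("V(RCT)    =", efficiency_bound(np.full_like(X, 0.5)))
print("V(Neyman) =", efficiency_bound(w1_neyman))  # equals E[(sigma1(X)+sigma0(X))^2]
```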
  15. 17 Adaptive experimental design for ATE estimation
    Kato, Ishihara, Honda, and Narita (2020). “Efficient Average Treatment Effect Estimation via Adaptive Experiments.” Preprint.
    1. Treatment-assignment phase: in each round t = 1, 2, …, T:
    I. Obtain an estimator σ̂_{a,t}^2(X_t) of σ_a^2(X_t) using past observations {(Y_s, A_s, X_s)}_{s=1}^{t−1}.
    II. Assign A_t ∼ w_{a,t}(X_t) = σ̂_{a,t}(X_t) / (σ̂_{1,t}(X_t) + σ̂_{0,t}(X_t)).
    2. Decision-making phase: at the end of the experiment:
    • Adaptive Augmented Inverse Probability Weighting (A2IPW) estimator:
    τ̂_T^{A2IPW} = (1/T) Σ_{t=1}^T [ 1[A_t = 1](Y_t − μ̂_{1,t}(X_t)) / w_{1,t}(X_t) − 1[A_t = 0](Y_t − μ̂_{0,t}(X_t)) / w_{0,t}(X_t) + μ̂_{1,t}(X_t) − μ̂_{0,t}(X_t) ].
    • μ̂_{a,t}(X_t) is an estimator of μ_a(X_t) using past observations {(Y_s, A_s, X_s)}_{s=1}^{t−1}.
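A minimal simulation sketch of the sequential design with the A2IPW estimator. The Gaussian data-generating process and the crude running-mean nuisance estimators below are illustrative placeholders, not the estimators analyzed in the paper.

```python
# Sketch: sequential Neyman-allocation experiment with the A2IPW estimator
# (covariates ignored for brevity; all numbers are illustrative).
import numpy as np

rng = np.random.default_rng(1)
T = 5_000
mu = {1: 1.0, 0: 0.8}          # true conditional means (constant in X here)
sd = {1: 2.0, 0: 0.5}          # true conditional standard deviations

n = {1: 0, 0: 0}; s = {1: 0.0, 0: 0.0}; ss = {1: 0.0, 0: 0.0}
psi = []
for t in range(T):
    # Step I: plug-in nuisance estimates from past observations only
    mu_hat = {a: (s[a] / n[a] if n[a] > 1 else 0.0) for a in (0, 1)}
    sd_hat = {a: (np.sqrt(max(ss[a] / n[a] - mu_hat[a] ** 2, 1e-6))
                  if n[a] > 1 else 1.0) for a in (0, 1)}
    # Step II: estimated Neyman allocation, clipped away from 0 and 1 for stability
    w1 = np.clip(sd_hat[1] / (sd_hat[1] + sd_hat[0]), 0.05, 0.95)
    a = 1 if rng.uniform() < w1 else 0
    y = rng.normal(mu[a], sd[a])
    # A2IPW score for this round
    psi.append((1 if a == 1 else 0) * (y - mu_hat[1]) / w1
               - (1 if a == 0 else 0) * (y - mu_hat[0]) / (1.0 - w1)
               + mu_hat[1] - mu_hat[0])
    n[a] += 1; s[a] += y; ss[a] += y * y

print("A2IPW estimate of the ATE:", np.mean(psi))  # true ATE is 0.2 in this toy setup
```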
  16. 18 Batch and sequential design
    ◼ Batch design (Hahn, Hirano, and Karlan, 2011):
    • Update the treatment-allocation probability only at certain rounds.
    • E.g., a two-stage experiment: split 1000 rounds into the first 300 rounds and the next 700 rounds. The first stage is a pilot phase for estimating the variances; in the second stage, we allocate treatments following the estimated probability.
    ◼ Sequential design (ours):
    • Update the treatment-allocation probability in every round.
    Our paper is a sequential version of Hahn, Hirano, and Karlan (2011).
    (Diagram: batch design updates the allocation once, while the sequential design updates it in every round t = 1, …, T.)
  17. 19 Asymptotic normality and efficiency
    ◼ Rewrite the A2IPW estimator as τ̂_T^{A2IPW} = (1/T) Σ_{t=1}^T ψ_t, where
    ψ_t ≔ 1[A_t = 1](Y_t − μ̂_{1,t}(X_t)) / w_{1,t}(X_t) − 1[A_t = 0](Y_t − μ̂_{0,t}(X_t)) / w_{0,t}(X_t) + μ̂_{1,t}(X_t) − μ̂_{0,t}(X_t).
    ◼ {ψ_t − τ}_{t=1}^T is a martingale difference sequence.
    • Under suitable conditions, we can apply the martingale central limit theorem.
    • We can address the sample-dependency problem caused by adaptive sampling.
    Theorem (Asymptotic normality of the A2IPW estimator)
    • Assume w_{a,t}(X) − w_a*(X) → 0 and μ̂_{a,t}(X) − μ_a(X) → 0 as t → ∞ almost surely.
    • Then it holds that √T (τ̂_T^{A2IPW} − τ) →_d N(0, V(w*)) as T → ∞.
  18. 20 Efficiency gain
    ◼ How does adaptive sampling improve efficiency?
    ◼ Sample-size computation in hypothesis testing with H_0: τ = 0, H_1: τ = Δ.
    • Treatment-allocation probability w; effect size Δ ≠ 0.
    • To achieve power 1 − β while controlling the Type I error at α, we need a sample size of at least
    T*(w) = (V(w) / Δ^2) (z_{1−α/2} − z_β)^2,
    where z_q is the q-quantile of the standard normal distribution.
    ◼ Sample-size comparison.
    • Setting: X ∼ Uniform[0, 1], σ_1^2(X) = X^2, σ_0^2(X) = 0.1, Δ = 0.1, α = β = 0.05, τ(X) = τ = Δ.
    • RCT (w_1(X) = w_0(X) = 1/2): T*(w) ≈ 1300. Neyman allocation: T*(w) ≈ 970. A reduction of about 330 samples.
    • Another case: σ_1^2(X) = (2X)^2, σ_0^2(X) = 0.1. RCT: T*(w) ≈ 3700. Neyman allocation: T*(w) ≈ 2600.
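A short script reproducing this sample-size comparison under the stated setting. The exact figures depend on rounding conventions and Monte Carlo error, so they may differ slightly from the numbers quoted on the slide.

```python
# Sketch: required sample size T*(w) = V(w)/Delta^2 * (z_{1-alpha/2} - z_beta)^2
# under the RCT (1/2-1/2) allocation and the Neyman allocation.
import numpy as np
from statistics import NormalDist

alpha, beta, delta = 0.05, 0.05, 0.1
z = (NormalDist().inv_cdf(1 - alpha / 2) - NormalDist().inv_cdf(beta)) ** 2

X = np.random.default_rng(0).uniform(0, 1, 1_000_000)
s0 = np.sqrt(0.1)                                   # sigma0(x), constant in x
for label, s1 in [("sigma1^2 = X^2", X), ("sigma1^2 = (2X)^2", 2 * X)]:
    V_rct = np.mean(s1**2 / 0.5 + s0**2 / 0.5)      # uniform allocation
    V_ney = np.mean((s1 + s0) ** 2)                 # Neyman allocation
    print(label,
          "| T*(RCT) ~", round(V_rct * z / delta**2),
          "| T*(Neyman) ~", round(V_ney * z / delta**2))
```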
  19. 21 Double machine learning
    Kato, Yasui, and McAlinn (2021). “The Adaptive Doubly Robust Estimator for Policy Evaluation in Adaptive Experiments and a Paradox Concerning Logging Policy.” In NeurIPS.
    ◼ The asymptotic normality is related to double machine learning (Chernozhukov et al., 2018).
    ◼ We can replace the true probability w_t with its estimator even if we know w_t.
    • Since w_t can be volatile, replacing it with its estimator can stabilize the performance.
    ◼ The asymptotic property does not change.
    • The use of past observations is a variant of sample splitting, recently called cross-fitting. We refer to this sample splitting as adaptive fitting (Figure 1).
  20. 22 Active adaptive experimental design
    Kato, Oga, Komatsubara, and Inokuchi (2024). Active Adaptive Experimental Design for Treatment Effect Estimation with Covariate Choices. In ICML.
    ◼ We can gain more efficiency by optimizing the covariate distribution.
    • Fix the target-population covariate density as q(x). We aim to estimate the ATE over q(x).
    • But we can sample from a density p(x) different from q(x).
    = Covariate shift problem (Shimodaira, 2000): training data and test data follow different distributions.
    (Diagram: the decision-maker ① samples units from p(x), ② allocates treatment with w_a(x), and ③ observes the outcome, while the target population follows q(x).)
  21. 23 Active adaptive experimental design
    ◼ The efficiency bound under covariate shift is investigated in Uehara, Kato, and Yasui (2021).
    ◼ Using the efficiency bound, the ideal covariate density and allocation probability are given as
    p*(x) = (σ_1(x) + σ_0(x)) q(x) / ∫ (σ_1(x') + σ_0(x')) q(x') dx',   w_a*(x) = σ_a(x) / (σ_1(x) + σ_0(x)).
    • Covariate density: we sample more experimental units with higher variances.
    • Treatment-allocation probability: Neyman allocation.
    ◼ During the experiment, we estimate p*(x) and w* using the past observations.
    ◼ Optimizing the source data in this way is called active learning.
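A small sketch of the ideal covariate density p*(x) on a grid, following the (reconstructed) expression above. The target density q(x), σ_1(x), and σ_0(x) below are illustrative choices.

```python
# Sketch: ideal covariate density p*(x) proportional to (sigma1(x)+sigma0(x)) q(x),
# so regions with noisier outcomes are oversampled. Illustrative setting only:
# q(x) = Uniform[0,1], sigma1(x) = x, sigma0(x) = sqrt(0.1).
import numpy as np

x = np.linspace(0.0, 1.0, 101)
q = np.ones_like(x)                       # target density q(x) on [0, 1]
sigma1, sigma0 = x, np.sqrt(0.1)
weight = (sigma1 + sigma0) * q
p_star = weight / np.trapz(weight, x)     # normalize so p*(x) integrates to one
print(p_star[::25])                       # the density is increasing in x here
```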
  22. 24 Related literature
    • Hahn, Hirano, and Karlan (2011) develops adaptive experimental design for ATE estimation with batch updates and discrete contexts.
    • Tabord-Meehan (2023) proposes a stratification tree to relax the discreteness of the covariates in Hahn, Hirano, and Karlan (2011).
    • Kallus, Saito, and Uehara (2020), Rafi (2023), and Li and Owen (2024) refine arguments about the efficiency bound for experimental design.
    • Shimodaira (2000) develops a framework of covariate-shift adaptation, and Sugiyama (2008) proposes active learning using techniques of covariate-shift adaptation.
    • Uehara, Kato, and Yasui (2021) investigates ATE estimation and policy learning under covariate shift. They develop the efficiency bound and an efficient ATE estimator using DML.
  23. 26 From ATE estimation to treatment choice
    ◼ From ATE estimation to decision-making (treatment choice; Manski, 2004).
    ◼ Treatment choice via adaptive experiments is called best-arm identification (BAI).
    • BAI has been investigated in various areas, including operations research and economics.
    (Diagram: gathering data → parameter estimation (ATE estimation) → decision-making (treatment choice).)
  24. 27 Setup
    ◼ Treatment. K treatments indexed by 1, 2, …, K. Also referred to as treatment arms or arms.
    ◼ Potential outcome. Y_a ∈ ℝ.
    ◼ No covariates.
    ◼ Distribution. Y_a follows a parametric distribution P_μ, where μ = (μ_a)_{a∈[K]} ∈ ℝ^K.
    • The mean of Y_a is μ_a (and only the mean parameters can take different values).
    • For simplicity, let P_μ be a Gaussian distribution under which the variance of Y_a is fixed at σ_a^2.
    ◼ Goal. Find the following best treatment arm efficiently: a*_μ ≔ arg max_{a∈{1,2,…,K}} μ_a.
  25. 28 Setup
    ◼ Adaptive experimental design with T rounds:
    • Treatment-allocation phase: in each round t = 1, 2, …, T:
    • Allocate treatment A_t ∈ [K] ≔ {1, 2, …, K} using {(A_s, Y_s)}_{s=1}^{t−1}.
    • Observe the outcome Y_t = Σ_{a∈[K]} 1[A_t = a] Y_{a,t}.
    • Decision-making phase: at the end of the experiment:
    • Obtain an estimator â_T of the best treatment arm using {(A_t, Y_t)}_{t=1}^T.
    (Diagram: in round t, unit t receives treatment A_t and outcome Y_t is observed; after round T, the best treatment is chosen.)
  26. 29 Performance measures
    • Let ℙ_μ and E_μ be the probability law and expectation under P_μ.
    ◼ Probability of misidentification: ℙ_μ(â_T ≠ a*_μ).
    ➢ The probability that the estimate â_T of the best treatment is not the true best one a*_μ.
    ◼ Expected simple regret: Regret_μ ≔ E_μ[Y_{a*_μ} − Y_{â_T}].
    ➢ The welfare loss when we deploy the estimated best treatment â_T for the population.
    • In this talk, we simply refer to this as the regret.
    • Also called the out-of-sample regret or the policy regret (Kasy and Sautmann, 2021), in contrast to, e.g., the in-sample regret in regret minimization.
  27. 30 Evaluation framework
    ◼ Distribution-dependent analysis.
    • Evaluate the performance measures with μ fixed independently of T.
    • For each μ ∈ ℝ^K, we evaluate ℙ_μ(â_T ≠ a*_μ) or Regret_μ.
    ◼ Minimax analysis.
    • Evaluate the performance measures for the worst-case μ.
    • We evaluate sup_{μ∈ℝ^K} ℙ_μ(â_T ≠ a*_μ) or sup_{μ∈ℝ^K} Regret_μ.
    ◼ Bayes analysis.
    • A prior Π is given.
    • Evaluate the performance measures averaged under the prior.
    • We evaluate ∫_{ℝ^K} ℙ_μ(â_T ≠ a*_μ) dΠ(μ) or ∫_{ℝ^K} Regret_μ dΠ(μ).
  28. 31 Regret decomposition
    ◼ The simple regret can be written as
    Regret_μ = E_μ[Y_{a*_μ}] − E_μ[Y_{â_T}] = Σ_{b ≠ a*_μ} (μ_{a*_μ} − μ_b) ℙ_μ(â_T = b).
    • The regret Regret_μ equals the sum of the misidentification probabilities ℙ_μ(â_T = b), weighted by the ATEs (μ_{a*_μ} − μ_b) between the best and suboptimal arms.
    • It also holds that Regret_μ ≤ Σ_{b ≠ a*_μ} (μ_{a*_μ} − μ_b) ℙ_μ(â_T ≠ a*_μ).
    ◼ ℙ_μ(â_T ≠ a*_μ) is roughly upper bounded by Σ_{b ≠ a*_μ} exp(−C* T (μ_{a*_μ} − μ_b)^2) for some constant C* > 0.
    • Cf. the Chernoff bound and large-deviation principles.
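A toy numerical illustration of the regret decomposition and the exponential bound on the misidentification probabilities. The constant C* and the arm means below are arbitrary illustrative values.

```python
# Sketch: Regret_mu = sum_{b != a*} (mu_{a*} - mu_b) * P(a_hat_T = b), with the
# rough bound P(a_hat_T = b) ~ exp(-C* T (mu_{a*} - mu_b)^2).
import numpy as np

mu = np.array([1.0, 0.9, 0.7, 0.5])    # arm means; arm 0 is the best
C_star, T = 0.5, 500
gaps = mu[0] - mu[1:]
p_misid = np.exp(-C_star * T * gaps**2)           # per-arm misidentification bound
print("regret bound:", np.sum(gaps * p_misid))    # arms with small gaps dominate
```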
  29. 32 Distribution-dependent, minimax, and Bayes analysis
    ◼ To build intuition, consider a binary case (K = 2), where a*_μ = 1.
    Regret_μ = (μ_1 − μ_2) · ℙ_μ(â_T = 2) ≤ (μ_1 − μ_2) · exp(−C* T (μ_1 − μ_2)^2).
    ◼ Distribution-dependent analysis.
    → (μ_1 − μ_2) can be asymptotically ignored since it is a constant.
    → It is enough to evaluate ℙ_μ(â_T ≠ a*_μ) in order to evaluate Regret_μ.
    → We evaluate the constant C* via −(1/T) log ℙ_μ(â_T ≠ a*_μ) ≈ −(1/T) log Regret_μ ≈ C*.
  30. 33 Distribution-dependent, minimax, and Bayes analysis
    ◼ To build intuition, consider a binary case (K = 2), where a*_μ = 1.
    Regret_μ = (μ_1 − μ_2) · ℙ_μ(â_T = 2) ≤ (μ_1 − μ_2) · exp(−C* T (μ_1 − μ_2)^2).
    ◼ Minimax and Bayes analysis.
    • Probability of misidentification, ℙ_μ(â_T ≠ a*_μ).
    • The rate of convergence to zero is still exponential (if the worst case is well-defined).
    • Regret, Regret_μ.
    • Distributions whose mean gaps satisfy μ_1 − μ_2 = O(1/√T) dominate the regret.
    • The rate of convergence to zero is O(1/√T).
    → The analysis differs between the probability of misidentification and the regret.
  31. 34 Distribution-dependent, minimax, and Bayes analysis
    ◼ Define Δ_μ = μ_1 − μ_2.
    ◼ For some constant C > 0, the regret Regret_μ = Δ_μ ℙ_μ(â_T ≠ a*_μ) can be written as
    Regret_μ ≈ Δ_μ exp(−C T Δ_μ^2).
    → How Δ_μ depends on T affects the evaluation of the regret.
    1. Δ_μ converges to zero more slowly than 1/√T. → For some increasing function g, the regret becomes Regret_μ ≈ exp(−g(T)).
    2. Δ_μ converges to zero at the order of 1/√T. → For some C > 0, Regret_μ ≈ C/√T.
    3. Δ_μ converges to zero faster than 1/√T. → Regret_μ = o(1/√T).
    → In the worst case, Regret_μ ≈ C/√T, attained when Δ_μ = O(1/√T).
  32. 35 Minimax and Bayes optimal experiment for the regret
    Kato, M. Minimax and Bayes optimal best-arm identification: adaptive experimental design for treatment choice. (Unpublished; to be uploaded soon.)
    • We design an experiment for treatment choice (best-arm identification).
    • The designed experiment is minimax and Bayes optimal for the regret.
    ◼ Our designed experiment consists of the following two elements:
    • Treatment-allocation phase: two-stage sampling.
    • In the first stage, we remove clearly suboptimal treatments and estimate the variances.
    • In the second stage, we allocate treatment arms with a variant of the Neyman allocation.
    • Decision-making phase: choose the empirical best arm (Bubeck et al., 2011; Manski, 2004).
  33. 36 Minimax and Bayes optimal experiment for the regret
    ◼ Treatment-allocation phase: we introduce the following two-stage sampling:
    • Split T into a first stage with rT rounds and a second stage with (1 − r)T rounds.
    1. First stage:
    • Allocate each treatment with equal ratio, rT/K rounds per arm.
    • Using a concentration inequality, select candidates for the best treatment, Ŝ ⊆ [K].
    • Obtain an estimator σ̂_{a,rT}^2 of the variance σ_a^2 (the sample variance is fine).
    2. Second stage (we do not allocate treatments a ∉ Ŝ from this stage):
    • If |Ŝ| = 1, we return the arm a ∈ Ŝ as the estimate of the best arm.
    • If |Ŝ| = 2, we allocate treatment a ∈ Ŝ with probability w_{a,t} ≔ σ̂_{a,rT} / Σ_{b∈Ŝ} σ̂_{b,rT} (ratio of standard deviations over the two candidates).
    • If |Ŝ| ≥ 3, we allocate treatment a ∈ Ŝ with probability w_{a,t} ≔ σ̂_{a,rT}^2 / Σ_{b∈Ŝ} σ̂_{b,rT}^2 (ratio of variances).
  34. 37 Minimax and Bayes optimal experiment for the regret
    ◼ Decision-making phase:
    • Return â_T = arg max_{a∈[K]} μ̂_{a,T}.
    • μ̂_{a,T} ≔ (Σ_{t=1}^T 1[A_t = a] Y_t) / (Σ_{t=1}^T 1[A_t = a]) is the sample mean.
    ✓ Remark: how do we select the candidate set Ŝ?
    • Let v_{a,rT} ≔ σ̂_{a,rT} √(log T / T), and â_{rT} = arg max_{a∈[K]} μ̂_{a,rT}.
    • Construct lower and upper confidence bounds as μ̂_{a,rT} − v_{a,rT} and μ̂_{a,rT} + v_{a,rT}.
    • Define Ŝ as Ŝ ≔ {a ∈ [K] : μ̂_{a,rT} + v_{a,rT} ≥ μ̂_{â_{rT},rT} − v_{â_{rT},rT}}.
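A minimal simulation sketch of the two-stage design described on the two slides above (uniform first stage, confidence-bound screening, Neyman-type second stage, empirical-best-arm decision). The Gaussian data-generating process, the split ratio r, and the confidence width are illustrative choices.

```python
# Sketch: two-stage best-arm identification with candidate screening.
import numpy as np

rng = np.random.default_rng(0)
mu_true = np.array([0.50, 0.45, 0.20])
sd_true = np.array([1.00, 0.50, 1.50])
K, T, r = len(mu_true), 3_000, 0.3

# ---- first stage: allocate each arm rT/K times -------------------------------
n1 = int(r * T) // K
obs = [list(rng.normal(mu_true[a], sd_true[a], n1)) for a in range(K)]
mu_hat = np.array([np.mean(o) for o in obs])
sd_hat = np.array([np.std(o) for o in obs])
v = sd_hat * np.sqrt(np.log(T) / T)               # confidence width (illustrative)
leader = int(np.argmax(mu_hat))
S = [a for a in range(K) if mu_hat[a] + v[a] >= mu_hat[leader] - v[leader]]

# ---- second stage: Neyman-type allocation over the candidates ----------------
if len(S) >= 2:
    if len(S) == 2:
        w = sd_hat[S] / sd_hat[S].sum()            # ratio of standard deviations
    else:
        w = sd_hat[S] ** 2 / (sd_hat[S] ** 2).sum()  # ratio of variances
    for _ in range(T - K * n1):
        a = int(rng.choice(S, p=w))
        obs[a].append(rng.normal(mu_true[a], sd_true[a]))

# ---- decision: empirical best arm --------------------------------------------
best = S[0] if len(S) == 1 else int(np.argmax([np.mean(o) for o in obs]))
print("candidate set:", S, "| chosen arm:", best)
```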
  35. 38 Minimax and Bayes optimal experiment for the regret
    ◼ For the designed experiment, we can prove minimax and Bayes optimality.
    • Let ℰ be the set of consistent experiments: for any δ ∈ ℰ, it holds that ℙ_μ(â_T^δ ≠ a*_μ) → 0.
    ➢ Minimax optimality.
    lim_{T→∞} sup_{μ∈ℝ^K} √T Regret_μ(δ^{TS−EBA}) ≤ max{ (1/√e) max_{a≠b} (σ_a + σ_b), ((K−1)/K) √(Σ_{a∈[K]} σ_a^2 log K) } ≤ inf_{δ∈ℰ} lim_{T→∞} sup_{μ∈ℝ^K} √T Regret_μ(δ).
    ➢ Bayes optimality.
    lim_{T→∞} √T ∫_{ℝ^K} Regret_μ(δ^{TS−EBA}) dΠ(μ) ≤ 4 Σ_{a∈[K]} ∫_{ℝ^{K−1}} σ_{∖a}^{2*} h_a(μ_{∖a}^* | μ_{∖a}) dH_{∖a}(μ_{∖a}) ≤ inf_{δ∈ℰ} lim_{T→∞} √T ∫_{ℝ^K} Regret_μ(δ) dΠ(μ),
    • σ_{∖a}^{2*} = σ_{b*}^2 and μ_{∖a}^* = μ_{b*}, where b* = arg max_{b∈[K]∖{a}} μ_b.
    • μ_{∖a} is the parameter vector with μ_a removed, and H_{∖a}(μ_{∖a}) is the prior over μ_{∖a}.
  36. 39 Intuition
    ◼ Regret_μ = E_μ[Y_{a*_μ}] − E_μ[Y_{â_T}] = Σ_{b ≠ a*_μ} (μ_{a*_μ} − μ_b) ℙ_μ(â_T = b).
    • ℙ_μ(â_T = b) ≈ exp(−C_1 T (μ_{a*_μ} − μ_b)^2).
    • If μ_{a*_μ} − μ_b converges to zero more slowly than 1/√T, we have ℙ_μ(â_T = b) → 0 (b ∉ Ŝ_{rT}).
    • If μ_{a*_μ} − μ_b = C_2/√T, ℙ_μ(â_T = b) remains a constant (b ∈ Ŝ_{rT}).
    (Diagram: among treatments 1–5, the arms whose means lie within C_2/√T of the best arm form the candidate set Ŝ_{rT}.)
  37. 40 Intuition
    ◼ When two treatments are given, we allocate them so as to maximize the discrepancy between their confidence intervals.
    ◼ When multiple treatments are given, there is no unique way to compare their confidence intervals.
    → It depends on the performance measure and the uncertainty evaluation.
    → This task is closely related to making the variance of ATE estimators smaller.
    → Use the Neyman allocation.
    (Diagram: confidence intervals for two treatments, whose separation is maximized, versus three treatments, where the comparison is ambiguous.)
  38. 41 Proof of the lower bound
    ◼ For simplicity, consider only binary treatments, where μ_1 > μ_2.
    • The regret is given as Regret_μ = (μ_1 − μ_2) ℙ_μ(â_T = 2).
    ◼ As I explain later, by using bandit techniques (Kaufmann et al., 2016), we can derive the following lower bound for the probability of misidentification:
    ℙ_μ(â_T = 2) ≥ exp( −T (μ_1 − μ_2)^2 / (2(σ_1^2/w_1 + σ_2^2/w_2)) + o(1) ).
    ◼ Therefore, we have Regret_μ ≥ (μ_1 − μ_2) exp( −T (μ_1 − μ_2)^2 / (2(σ_1^2/w_1 + σ_2^2/w_2)) + o(1) ).
    ◼ Letting μ_1 − μ_2 = √((σ_1^2/w_1 + σ_2^2/w_2)/T) and optimizing w_a, we have Regret_μ ≥ (σ_1 + σ_2)/√(eT).
  39. 42 Bayes optimal experiment for the simple regret
    ◼ Our designed experiment and theory are applicable to cases with other distributions, e.g., Bernoulli distributions.
    Komiyama, J., Ariu, K., Kato, M., and Qin, C. (2023). Rate-optimal Bayesian simple regret in best arm identification. Mathematics of Operations Research.
    • Bayes optimal experiment for treatment choice with Bernoulli distributions.
    • Komiyama et al. (2023) corresponds to a simplified case of our new results.
    • In our previous study, we only consider Bernoulli bandits, and the lower bound is not tight (there is a constant gap between the upper and lower bounds).
    • In Bernoulli cases, the Neyman allocation becomes uniform sampling.
    • No need to estimate the variances.
  40. 43 Related literature
    ◼ Ordinal optimization: best-arm identification has been considered as ordinal optimization. In this setting, we assume that the ideal allocation probability is known.
    • Chen (2000) considers Gaussian outcomes. Glynn and Juneja (2004) consider more general distributions.
    ◼ Best-arm identification:
    • Audibert, Bubeck, and Munos (2010) establish the best-arm identification framework. The ideal allocation probability is unknown, and we need to estimate it.
    • Bubeck, Munos, and Stoltz (2011) propose a minimax-rate-optimal algorithm.
  41. 44 Related literature
    ◼ Limit-of-experiments (local asymptotic normality) framework for experimental design: tools for developing experiments with matching lower and upper bounds by assuming locality for the underlying distributions.
    • Hirano and Porter (2008) introduce this framework for treatment choice.
    • Armstrong (2021) and Hirano and Porter (2023) apply the framework to adaptive experiments.
    • Adusumilli (2025) proposes adaptive experimental design based on the limit-of-experiments framework and diffusion-process theory.
    ➢ I guess that we can derive tight lower and upper bounds without using the limit-of-experiments framework.
    • The 1/√T regime (locality) appears directly as a result of the worst case for the regret.
  42. 46 Policy learning
    ◼ Treatment choice with covariates X_t.
    ◼ Two goals:
    • Choosing the marginalized best arm a*_P = arg max_{a∈[K]} μ_a, where μ_a = E_P[Y_a].
    • Choosing the conditional best arm a*_P(X) = arg max_{a∈[K]} μ_a(X), where μ_a(X) = E_P[Y_a | X].
    ◼ The second task is called policy learning.
    • Policy learning has the following two approaches:
    • Plug-in policy: estimate μ_a(X) and choose â_n(X) = arg max_{a∈[K]} μ̂_{a,n}(X).
    • Empirical welfare maximization (EWM): train a policy by minimizing a counterfactual risk.
    ◼ This talk focuses on adaptive experimental design for policy learning via the EWM approach.
  43. 47 Setup
    ◼ In the EWM approach, we train a policy function π: [K] × 𝒳 → [0, 1].
    • For all x ∈ 𝒳, Σ_{a=1}^K π_a(x) = 1 holds.
    • A policy decides how we recommend a treatment to the future population.
    • Corresponds to a classifier in machine learning.
    ◼ Π is a set of policy functions π: [K] × 𝒳 → [0, 1]. E.g., linear models, neural networks, and random forests.
    ◼ Welfare of a policy π ∈ Π: L(π) ≔ E_P[ Σ_{a∈[K]} π_a(X) Y_a ].
  44. 48 Setup
    ◼ Goal. Obtain an optimal policy defined as
    π* ≔ arg max_{π∈Π} L(π) = arg max_{π∈Π} E_P[ Σ_{a∈[K]} π_a(X) Y_a ].
    ◼ EWM approach.
    • Estimate the welfare L(π) by L̂(π).
    • Train a policy as π̂ ≔ arg max_{π∈Π} L̂(π).
    ◼ Regret. The regret for this problem is defined as
    Regret_P ≔ E_P[ Σ_{a∈[K]} π*_a(X) Y_a ] − E_P[ Σ_{a∈[K]} π̂_a(X) Y_a ].
  45. 49 Adaptive experimental design for policy learning
    Kato, M., Okumura, K., Ishihara, T., and Kitagawa, T. (2024). Adaptive experimental design for policy learning. See the proceedings of the ICML 2024 Workshop RLControlTheory (an updated arXiv version is forthcoming).
    ◼ We propose a minimax rate-optimal experiment for policy learning.
    ◼ Let ℰ be a class of experiments δ. Let M be the VC or Natarajan dimension of the policy class Π.
    ◼ Lower bound: given the class of experiments ℰ, we have
    inf_{δ∈ℰ} lim_{T→∞} √T Regret_P ≥ (1/8) E[ √( M Σ_{a∈[K]} σ_a^2(X) ) ] if K ≥ 3, and (1/8) E[ √( M (σ_1(X) + σ_2(X))^2 ) ] if K = 2.
  46. 50 Adaptive experimental design for policy learning
    ◼ Ideal treatment-allocation probability:
    • If K = 2, w_a*(X) = σ_a(X) / (σ_1(X) + σ_2(X)) (the ratio of the standard deviations).
    • If K ≥ 3, w_a*(X) = σ_a^2(X) / Σ_{b∈[K]} σ_b^2(X) (the ratio of the variances).
    ◼ Adaptive experiment for policy learning.
    • In each round t, we estimate w* as ŵ_t and allocate treatment with probability ŵ_t.
    • At the end, we estimate the welfare as
    L̂(π) = (1/T) Σ_{t=1}^T Σ_{a=1}^K π_a(X_t) [ 1[A_t = a] (Y_t − μ̂_{a,t}(X_t)) / ŵ_t(a | X_t) + μ̂_{a,t}(X_t) ].
    • Then, we train a policy as π̂ = arg max_{π∈Π} L̂(π).
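A sketch of EWM policy training with the AIPW-type welfare estimate above, over a toy one-parameter threshold policy class. The simulated data, the fixed allocation probability, and the "estimated" nuisance functions are illustrative placeholders; in the adaptive design, ŵ_t and μ̂_{a,t} would be estimated from past rounds.

```python
# Sketch: EWM over the policy class {pi_c : assign arm 1 iff X >= c}.
import numpy as np

rng = np.random.default_rng(0)
T = 5_000
X = rng.uniform(0, 1, T)
mu = lambda a, x: x if a == 1 else 0.5          # true conditional means
A = rng.integers(0, 2, T)                       # fixed allocation probability w = 1/2
Y = np.array([rng.normal(mu(a, x), 0.5) for a, x in zip(A, X)])
w = 0.5
mu_hat = {1: X, 0: np.full(T, 0.5)}             # stand-in nuisance estimates

def welfare(c):
    """AIPW-type estimate of L(pi_c) for the policy 'arm 1 iff X >= c'."""
    pi1 = (X >= c).astype(float)
    score = {a: (A == a) * (Y - mu_hat[a]) / w + mu_hat[a] for a in (0, 1)}
    return np.mean(pi1 * score[1] + (1 - pi1) * score[0])

grid = np.linspace(0, 1, 101)
c_hat = grid[np.argmax([welfare(c) for c in grid])]
print("estimated threshold:", c_hat)            # the oracle threshold is 0.5 here
```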
  47. 51 Adaptive experimental design for policy learning
    ◼ Upper bound: for our designed experiment δ, there exist constants C_1, C_2 > 0 such that
    lim_{T→∞} √T Regret_P(δ) ≤ C_1 √M + C_2 E[ √( M Σ_{a∈[K]} σ_a^2(X) ) ] if K ≥ 3, and C_1 √M + C_2 E[ √( M (σ_1(X) + σ_2(X))^2 ) ] if K = 2.
    ◼ Rate optimality.
    • For example, if K = 2, the lower bound is (1/8) E[ √M (σ_1(X) + σ_2(X)) ], while the upper bound is C_1 √M + C_2 E[ √( M (σ_1(X) + σ_2(X))^2 ) ].
    • Only the rates in T and σ_a(X) are optimal.
  48. 52 Comparison with the uniform allocation
    ◼ Upper bound under a given allocation w:
    • Suppose that K = 2 and consider allocating treatments with w.
    • There exist constants C_1, C_2 > 0 such that
    lim_{T→∞} √T Regret_P(δ) ≤ C_1 √M + C_2 E[ √( M (σ_1^2(X)/w_1(X) + σ_2^2(X)/w_2(X)) ) ].
    ◼ Uniform allocation (w_1(X) = w_2(X) = 1/2):
    lim_{T→∞} √T Regret_P(δ) ≤ C_1 √M + C_2 E[ √( 2M (σ_1^2(X) + σ_2^2(X)) ) ].
    • Here, E[ √( M (σ_1(X) + σ_2(X))^2 ) ] (Neyman) ≤ E[ √( 2M (σ_1^2(X) + σ_2^2(X)) ) ] (Uniform).
  49. 53 Open questions
    ◼ We only showed rate optimality in the sample size T and the variances σ_a^2.
    • Between the upper and lower bounds, there exists a large constant gap independent of the variances σ_a^2(X) and T.
    • The result is not so tight. The previous experiments for ATE estimation and treatment choice have matching lower and upper bounds, including the constant terms, leaving no room for improvement.
    • I guess that this is due to the looseness of the empirical-process theory used in EWM.
    ◼ Can we implement an arm-elimination pilot phase, as in our minimax and Bayes optimal best-arm identification design? (Chernozhukov, Lee, Rosen, and Sun, 2025?)
    • Finite-sample confidence intervals are required for estimators of μ_a(x).
  50. 54 Related literature
    ◼ EWM approach.
    • Swaminathan and Joachims (2015), Kitagawa and Tetenov (2018), and Athey and Wager (2021) develop a basic framework for policy learning.
    • Zhou, Athey, and Wager (2018) shows a regret upper bound when multiple treatments are given.
    • Zhan, Ren, Athey, and Zhou (2022) investigates policy learning with adaptively collected data.
  51. 56 Difficulty in best-arm identification
    ◼ Why do we consider minimax or Bayes optimality for treatment choice?
    → One could instead aim for optimality under each fixed distribution.
    • Kasy and Sautmann (2021) claim that they developed such a result.
    • But their proof had several issues.
    → Through the investigation of these issues, various impossibility theorems have been shown.
    ◼ Impossibility theorems: there is no algorithm that is optimal for every distribution.
    • Kaufmann (2020).
    • Ariu, Kato, Komiyama, McAlinn, and Qin (2021).
    • Degenne (2023).
    • Wang, Ariu, and Proutiere (2024).
    • Imbens, Qin, and Wager (2025).
  52. 57 Bandit lower bound
    ◼ Lower bounds in the bandit problem are derived via information theory.
    ◼ The following lemma is one of the most general and tight results for lower bounds.
    Transportation lemma (Lai and Robbins, 1985; Kaufmann et al., 2016)
    • Let P and Q be two bandit models with K arms such that for all a, the distributions P_a and Q_a of Y_a are mutually absolutely continuous.
    • We have
    Σ_{a∈[K]} E_P[ Σ_{t=1}^T 1[A_t = a] ] KL(P_a, Q_a) ≥ sup_{ℰ∈ℱ_T} d(ℙ_P(ℰ), ℙ_Q(ℰ)).
    • d(x, y) ≔ x log(x/y) + (1 − x) log((1 − x)/(1 − y)) is the binary relative entropy, with the convention that d(0, 0) = d(1, 1) = 0.
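Small helper functions for the binary relative entropy d(x, y) in the lemma and the Gaussian KL divergence that enters the bandit lower bounds. The numerical example is purely illustrative.

```python
# Sketch: quantities appearing in the transportation lemma.
import numpy as np

def binary_kl(x, y):
    """d(x, y) = x log(x/y) + (1 - x) log((1 - x)/(1 - y)); assumes x, y in (0, 1),
    with the stated conventions d(0, 0) = d(1, 1) = 0."""
    if x in (0.0, 1.0) and x == y:
        return 0.0
    return x * np.log(x / y) + (1 - x) * np.log((1 - x) / (1 - y))

def gaussian_kl(mu_p, mu_q, sigma):
    """KL(N(mu_p, sigma^2), N(mu_q, sigma^2)) = (mu_p - mu_q)^2 / (2 sigma^2)."""
    return (mu_p - mu_q) ** 2 / (2 * sigma**2)

# Example: with expected allocation ratios w and an alternative Q that swaps the
# best arm, the left-hand side of the lemma is T * sum_a w_a * KL(P_a, Q_a).
T, w = 1_000, np.array([0.6, 0.4])
kl = np.array([gaussian_kl(1.0, 0.8, 1.0), gaussian_kl(0.8, 1.0, 0.5)])
print("LHS of the lemma:", T * np.sum(w * kl))
print("d(0.05, 0.5)    =", binary_kl(0.05, 0.5))
```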
  53. 58 Example: regret lower bound in regret minimization
    ◼ Lai and Robbins (1985) develops a lower bound for the regret minimization problem.
    • Regret minimization problem.
    • Goal. Maximize the cumulative outcome Σ_{t=1}^T Y_{A_t, t}.
    • The (in-sample) regret is defined as Regret_P ≔ E[ Σ_{t=1}^T Y_{a*_P, t} − Σ_{t=1}^T Y_{A_t, t} ].
    • Under each distribution P_0, for large T, the lower bound is given as
    Regret_{P_0} ≥ log T · inf_{Q∈Alt(P_0)} Σ_{a∈[K]} E_{P_0}[ Σ_{t=1}^T 1[A_t = a] ] KL(P_{a,0}, Q_a),
    where Alt(P_0) ≔ {Q ∈ 𝒫 : a*_Q ≠ a*_{P_0}}.
    ◼ Kaufmann's lower bound is a generalization of Lai and Robbins' lower bound.
    • It is known that Kaufmann's lower bound can yield lower bounds for various bandit problems.
    • “Can we develop a lower bound for best-arm identification using Kaufmann's lemma?”
  54. 59 Lower bound in the fixed-budget setting
    ◼ A direct application of Kaufmann's lemma to best-arm identification yields the following bound.
    ◼ Lower bound (conjecture):
    • Under each distribution P_0 of the data-generating process, a lower bound is given as
    ℙ_{P_0}(â_T ≠ a*_{P_0}) ≥ inf_{Q∈Alt(P_0)} exp( − Σ_{a∈[K]} E_Q[ Σ_{t=1}^T 1[A_t = a] ] KL(Q_a, P_{a,0}) ).
    • (1/T) E_Q[ Σ_{t=1}^T 1[A_t = a] ] corresponds to a treatment-allocation probability (ratio) under Q.
    • Denoting (1/T) E_Q[ Σ_{t=1}^T 1[A_t = a] ] by w_a, we can rewrite the above conjecture as
    ℙ_{P_0}(â_T ≠ a*_{P_0}) ≥ inf_{Q∈Alt(P_0)} exp( −T Σ_{a∈[K]} w_a KL(Q_a, P_{0,a}) ).
    • Note that w_a = (1/T) E_Q[ Σ_{t=1}^T 1[A_t = a] ] implies that w is a treatment-allocation probability (ratio) under Q.
  55. 60 Optimal design in the fixed-budget setting
    ◼ Question: Does there exist an algorithm whose probability of misidentification ℙ_{P_0}(â_T ≠ a*_{P_0}) exactly matches the following lower bound?
    inf_{Q∈Alt(P_0)} exp( −T Σ_{a∈[K]} w_a KL(Q_a, P_{0,a}) ).
    → Answer: No (without additional assumptions).
    • When the number of treatments is two and the outcomes follow Gaussian distributions with known variances, such an optimal algorithm exists.
    • The optimal algorithm is the Neyman allocation.
    • In more general cases, no such algorithm exists.
  56. 61 Neyman allocation is optimal in two-armed bandits
    ◼ Consider two treatments (K = 2).
    • Assume that the variances are known.
    ◼ The lower bound can be computed as
    inf_{Q∈Alt(P_0)} exp( −T Σ_{a∈[K]} w_a KL(Q_a, P_{0,a}) ) ≥ exp( −T (E_{P_0}[Y_1] − E_{P_0}[Y_2])^2 / (2(σ_1 + σ_2)^2) ).
    ◼ The asymptotically optimal algorithm is the Neyman allocation (Kaufmann et al., 2016):
    • Allocate treatments in the ratio of the standard deviations.
    • At the end of the experiment, recommend the treatment with the highest sample mean as the best treatment.
    ◼ (σ_1 + σ_2)^2 is the asymptotic variance of the ATE estimator under the Neyman allocation.
  57. 62 Optimal algorithm in the multi-armed bandits
    Kasy and Sautmann (KS) (2021)
    ◼ Propose the Exploration Sampling (ES) algorithm.
    • A variant of Top-Two Thompson sampling (Russo, 2016).
    ◼ They show that under their ES algorithm, for each Bernoulli distribution P_0 independent of T, it holds that
    Regret_{P_0} ≤ inf_w inf_{Q∈Alt(P_0)} exp( −T Σ_{a∈[K]} w_a KL(Q_a, P_{a,0}) ).
    → They claim that their algorithm is optimal with respect to Kaufmann's lower bound
    ℙ_{P_0}(â_T ≠ a*_{P_0}) ≥ inf_{Q∈Alt(P_0)} exp( −T Σ_{a∈[K]} w_a KL(Q_a, P_{0,a}) ).
  58. 63 Impossibility theorem in the multi-armed bandits
    ◼ Issues in KS (Ariu et al., 2025).
    • The KL divergence is flipped.
    • KS (incorrect): Regret_{P_0} ≤ inf_w inf_{Q∈Alt(P_0)} exp( −T Σ_{a∈[K]} w_a KL(Q_a, P_{a,0}) ).
    • Ours (correct): Regret_{P_0} ≤ inf_w inf_{Q∈Alt(P_0)} exp( −T Σ_{a∈[K]} w_a KL(P_{a,0}, Q_a) ).
    • There exists a distribution under which no algorithm can attain the conjectured bound.
    Impossibility theorem: there exists a distribution P_0 ∈ 𝒫 under which
    Regret_{P_0} ≥ inf_w inf_{Q∈Alt(P_0)} exp( −T Σ_{a∈[K]} w_a KL(Q_a, P_{a,0}) ).
  59. 64 Why?
    ◼ Why does the conjectured lower bound not work?
    ◼ There are several technical issues.
    1. We cannot compute an ideal treatment-allocation probability from the conjectured lower bound.
    • Recall that the lower bound is given as
    ℙ_{P_0}(â_T ≠ a*_{P_0}) ≥ V* = inf_{Q∈Alt(P_0)} exp( −T Σ_{a∈[K]} w_a KL(Q_a, P_{a,0}) ),
    where w_a = (1/T) E_Q[ Σ_{t=1}^T 1[A_t = a] ] depends on Q.
    → min_Q min_w exp( −T Σ_{a∈[K]} w_a KL(Q_a, P_{a,0}) ) ≠ inf_w inf_{Q∈Alt(P_0)} exp( −T Σ_{a∈[K]} w_a KL(Q_a, P_{a,0}) ).
    → We cannot obtain an ideal allocation probability from this lower bound.
    2. There exists P_0 such that ℙ_{P_0}(â_T ≠ a*_{P_0}) ≥ inf_w inf_{Q∈Alt(P_0)} exp( −T Σ_{a∈[K]} w_a KL(Q_a, P_{a,0}) ).
  60. 65 Uncertainty evaluations
    ◼ Impossibility theorems for distribution-dependent analysis.
    • There exists a distribution under which we can derive a lower bound larger than the one conjectured from Kaufmann's lemma.
    • If we aim to develop an experiment that is optimal for every distribution, we can find a counterexample.
    ◼ Two solutions.
    • Restrict the distribution class.
    • Minimax and Bayes optimality.
    (Diagram: over a set of distributions 𝒫, the performance measure can be evaluated per distribution (distribution-dependent), in the worst case (minimax), or averaged under a prior (Bayes).)
  61. 66 Locally minimax optimal experiment for ℙ_μ(â_T ≠ a*_μ)
    Kato, M. (2024). Generalized Neyman allocation for locally minimax optimal best-arm identification. Preprint.
    ◼ Locally minimax asymptotically optimal experiment for ℙ_μ(â_T ≠ a*_μ).
    ◼ Generalized Neyman allocation (GNA): set the ideal treatment-allocation probability as
    w*_{a*_μ} = σ_{a*_μ} / ( σ_{a*_μ} + √( Σ_{b∈[K]∖{a*_μ}} σ_b^2 ) ),   w*_a = (1 − w*_{a*_μ}) σ_a^2 / Σ_{b∈[K]∖{a*_μ}} σ_b^2 for all a ∈ [K]∖{a*_μ}.
    ◼ GNA-A2IPW experiment: estimate the GNA and return the best arm using the A2IPW estimator.
    ◼ Local minimax optimality:
    sup_{μ∈Θ_Δ} lim sup_{T→∞} ℙ_P(â_T^{GNA−A2IPW} ≠ a*_P) ≤ min_a exp( −Δ^2 T / ( 2( σ_a + √( Σ_{b∈[K]∖{a}} σ_b^2 ) )^2 ) ) ≤ inf_{δ∈ℰ} sup_{μ∈Θ_Δ} lim inf_{T→∞} ℙ_P(â_T^δ ≠ a*_P).
  62. 67 Related literature
    • Glynn and Juneja (2004) develops large-deviation optimal experiments for treatment choice.
    • Kasy and Sautmann (2021) show that Russo (2016)'s Top-Two Thompson sampling is optimal in the sense that its upper bound matches the bound in Glynn and Juneja (2004).
    • Kaufmann, Cappé, and Garivier (2016) develops a general lower bound for the bandit problem.
    ◼ Impossibility theorems.
    • Ariu, Kato, Komiyama, McAlinn, and Qin (2025) finds a counterexample for Kasy and Sautmann (2021).
    • Degenne (2023) shows that the bound in Glynn and Juneja (2004) is a special case of Kaufmann's bound under strong assumptions.
    • Wang, Ariu, and Proutiere (2024) and Imbens, Qin, and Wager (2025) also derive counterexamples.
  63. 69 Concluding remarks
    ◼ Adaptive experimental design for efficient ATE estimation and treatment choice.
    ◼ ATE estimation.
    • Goal. Estimation of the ATE with a smaller asymptotic variance.
    • Lower bound. Efficiency bound (Hahn, 1998).
    • Ideal treatment-allocation probability. The Neyman allocation.
  64. 70 Concluding remarks
    ◼ Treatment choice (BAI).
    • Goal. Choosing the best treatment arm.
    • The lower bound and the ideal treatment-allocation probability depend on the uncertainty evaluation.
    • Distribution-dependent analysis:
    • No globally optimal algorithm exists for Kaufmann's lower bound.
    • Impossibility theorems: there exists a distribution under which a lower bound is larger than Kaufmann's.
    • Locally optimal algorithms.
    • Minimax and Bayes analysis.
  65. 71 Concluding remarks
    ◼ Policy learning.
    • We can develop a rate-optimal experiment for policy learning.
    • The convergence rates in the variance and the sample size T match the lower bound.
    • Other constant terms remain.
    • Tighter lower and upper bounds?
    • We may need to reconsider the EWM approach.
  66. 72 Reference
    • Masahiro Kato, Takuya Ishihara, Junya Honda, and Yusuke Narita. Efficient adaptive experimental design for average treatment effect estimation, 2020. arXiv:2002.05308.
    • Masahiro Kato, Kenichiro McAlinn, and Shota Yasui. The adaptive doubly robust estimator and a paradox concerning logging policy. In International Conference on Neural Information Processing Systems (NeurIPS), 2021.
    • Kaito Ariu, Masahiro Kato, Junpei Komiyama, Kenichiro McAlinn, and Chao Qin. A comment on “Adaptive treatment assignment in experiments for policy choice”, 2021.
    • Masahiro Kato. Generalized Neyman allocation for locally minimax optimal best-arm identification, 2024a. arXiv:2405.19317.
    • Masahiro Kato. Locally optimal fixed-budget best arm identification in two-armed Gaussian bandits with unknown variances, 2024b. arXiv:2312.12741.
    • Masahiro Kato and Kaito Ariu. The role of contextual information in best arm identification, 2021. Accepted for the Journal of Machine Learning Research conditional on minor revisions.
    • Masahiro Kato, Akihiro Oga, Wataru Komatsubara, and Ryo Inokuchi. Active adaptive experimental design for treatment effect estimation with covariate choice. In International Conference on Machine Learning (ICML), 2024a.
    • Masahiro Kato, Kyohei Okumura, Takuya Ishihara, and Toru Kitagawa. Adaptive experimental design for policy learning, 2024b. arXiv:2401.03756.
    • Junpei Komiyama, Kaito Ariu, Masahiro Kato, and Chao Qin. Rate-optimal Bayesian simple regret in best arm identification. Mathematics of Operations Research, 2023.
  67. 73 Reference
    • van der Vaart, A. (1998), Asymptotic Statistics, Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press.
    • Tabord-Meehan, M. (2022), “Stratification Trees for Adaptive Randomization in Randomized Controlled Trials,” The Review of Economic Studies.
    • van der Laan, M. J. (2008), “The Construction and Analysis of Adaptive Group Sequential Designs,” https://biostats.bepress.com/ucbbiostat/paper232.
    • Neyman, J. (1923), “Sur les applications de la theorie des probabilites aux experiences agricoles: Essai des principes,” Statistical Science, 5, 463–472.
    • Neyman, J. (1934), “On the Two Different Aspects of the Representative Method: the Method of Stratified Sampling and the Method of Purposive Selection,” Journal of the Royal Statistical Society, 97, 123–150.
    • Manski, C. F. (2002), “Treatment choice under ambiguity induced by inferential problems,” Journal of Statistical Planning and Inference, 105, 67–82.
    • Manski, C. F. (2004), “Statistical Treatment Rules for Heterogeneous Populations,” Econometrica, 72, 1221–1246.
  68. 74 Reference
    • Kitagawa, T. and Tetenov, A. (2018), “Who Should Be Treated? Empirical Welfare Maximization Methods for Treatment Choice,” Econometrica, 86, 591–616.
    • Garivier, A. and Kaufmann, E. (2016), “Optimal Best Arm Identification with Fixed Confidence,” in Conference on Learning Theory.
    • Glynn, P. and Juneja, S. (2004), “A large deviations perspective on ordinal optimization,” in Proceedings of the 2004 Winter Simulation Conference, IEEE, vol. 1.
    • Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018), “Double/debiased machine learning for treatment and structural parameters,” The Econometrics Journal.
    • Degenne, R. (2023), “On the Existence of a Complexity in Fixed Budget Bandit Identification,” Conference on Learning Theory (COLT).
    • Kasy, M. and Sautmann, A. (2021), “Adaptive Treatment Assignment in Experiments for Policy Choice,” Econometrica, 89, 113–132.
    • Rubin, D. B. (1974), “Estimating causal effects of treatments in randomized and nonrandomized studies,” Journal of Educational Psychology.