Minimax and Bayes Optimal Best-arm Identification: Adaptive Experimental Design for Treatment Choice

MasaKat0
July 08, 2025

Transcript

  1. 1 Minimax and Bayes Optimal Best-arm Identification: Adaptive Experimental Design for Treatment Choice
     Statistics reading seminar. Masahiro Kato. https://arxiv.org/abs/2506.24007
  2. 2 Experimental approach for causal inference
     - The gold standard for causal inference is randomized controlled trials (RCTs): we randomly allocate treatments to experimental units.
     - RCTs are the gold standard but often costly and inefficient → design more efficient experiments (in some sense): adaptive experimental design, which updates the treatment allocation during the experiment to gain efficiency.
     - How do we define an ideal treatment-allocation probability (propensity score)?
  3. 3 Table of contents
     I explain how we design adaptive experiments for treatment choice.
     1. (Introduction) General approach to adaptive experimental design.
     2. Problem setting.
     3. Algorithm and optimality.
     4. Theoretical analysis.
     5. On the optimal experimental design for treatment choice.
  4. 5 How to design adaptive experiments?
     A typical procedure of experimental design for causal inference:
     Step 1. Define the goal of causal inference and the performance measure.
     Step 2. Compute a lower bound and an ideal treatment-allocation probability.
     Step 3. Conduct the designed adaptive experiment: allocate treatment arms while estimating the ideal treatment-allocation probability, then return the target of interest using the observations from the adaptive experiment.
     Step 4. Investigate the performance of the designed experiment and confirm its optimality.
  5. 6 Step 1: goals and performance measures
     - Goal 1: Average treatment effect (ATE) estimation.
       * The number of treatment arms is usually two (binary treatments).
       - Goal: estimate the ATE.
       - Performance measure: the (asymptotic) variance; a smaller asymptotic variance is better.
     - Goal 2: Treatment choice, also known as best-arm identification.
       * The number of treatments is two or more (multiple treatments).
       - Goal: choose the best treatment, i.e., the treatment whose expected outcome is highest.
       - Performance measure: the probability of misidentifying the best treatment, or the regret.
  6. 7 Step 2: lower bounds and ideal allocation probabilities
     After deciding the goal and performance measure, we develop a lower bound.
     - ATE estimation: the semiparametric efficiency bound (Hahn, 1998).
     - Best-arm identification: various lower bounds have been proposed; there is no consensus on which one to use.
     ✓ Lower bounds are often functions of the treatment-allocation probabilities (propensity scores).
     → We can minimize the lower bound with respect to the treatment-allocation probabilities. We refer to the probabilities that minimize the lower bound as the ideal treatment-allocation probabilities.
  7. 8 Step 3: adaptive experiment
     Ideal allocation probabilities usually depend on unknown parameters of the distribution, so we estimate the probabilities (unknown parameters) during the experiment.
     We run an adaptive experiment consisting of the following two phases:
     1. Treatment-allocation phase: in each round t = 1, 2, …, T:
        - Estimate the ideal treatment-allocation probability from past observations.
        - Allocate a treatment following the estimated ideal treatment-allocation probability.
     2. Decision-making phase: at the end of the experiment:
        - Return an estimate of the target of interest.
  8. 9 Step 4: upper bound and optimality
     For the designed experiment, we investigate its theoretical performance.
     - ATE estimation: we prove asymptotic normality and check the asymptotic variance. If the asymptotic variance matches the efficiency bound minimized over the treatment-allocation probability, the design is (asymptotically) optimal.
     - Best-arm identification: we investigate the probability of misidentifying the best arm, or the regret, and establish distribution-dependent, minimax, and Bayes optimality.
  9. 11 From ATE estimation to treatment choice
     - From ATE estimation to decision-making (treatment choice; Manski, 2004).
     - Treatment choice via adaptive experiments is called best-arm identification (BAI).
     - BAI has been investigated in various areas, including operations research and economics.
     (Diagram: gathering data → parameter estimation → decision-making; ATE estimation → treatment choice.)
  10. 12 Setup
     - Treatment: K treatments indexed by 1, 2, …, K; also referred to as treatment arms or arms.
     - Potential outcome: Y_a ∈ ℝ.
     - No covariates.
     - Distribution: Y_a follows a parametric distribution P_μ, where μ = (μ_a)_{a∈[K]} ∈ ℝ^K.
       - The mean of Y_a is μ_a (and only the mean parameters can take different values).
       - For simplicity, let P_μ be a Gaussian distribution under which the variance of Y_a is fixed at σ_a².
     - Goal: find the best treatment arm efficiently: a*_μ ≔ arg max_{a∈{1,2,…,K}} μ_a.
  11. 13 Setup
     Adaptive experimental design with T rounds:
     - Treatment-allocation phase: in each round t = 1, 2, …, T:
       - Allocate treatment A_t ∈ [K] ≔ {1, 2, …, K} using (A_s, Y_s)_{s=1}^{t−1}.
       - Observe the outcome Y_t = Σ_{a∈[K]} 1[A_t = a] Y_{a,t}.
     - Decision-making phase: at the end of the experiment:
       - Obtain an estimator â_T of the best treatment arm using (A_t, Y_t)_{t=1}^T.
     (Diagram: in round t, unit t receives treatment A_t and yields outcome Y_t; after round T, choose the best treatment.)
  12. 14 Performance measures
     Let ℙ_μ and 𝔼_μ be the probability law and expectation under P_μ.
     - Probability of misidentification: ℙ_μ(â_T ≠ a*_μ), the probability that the estimate â_T of the best treatment is not the true best one a*_μ.
     - Expected simple regret: Regret_μ ≔ 𝔼_μ[Y_{a*_μ} − Y_{â_T}], the welfare loss when we deploy the estimated best treatment â_T for the population.
       - In this talk, we simply refer to this regret as the regret.
       - Also called the out-of-sample regret or the policy regret (Kasy and Sautmann, 2021); cf. the in-sample regret in regret minimization.
  13. 15 Evaluation framework
     - Distribution-dependent analysis: evaluate the performance measures with μ held fixed as T grows. For each μ ∈ ℝ^K, we evaluate ℙ_μ(â_T ≠ a*_μ) or Regret_μ.
     - Minimax analysis: evaluate the performance measures under the worst-case μ. We evaluate sup_{μ∈ℝ^K} ℙ_μ(â_T ≠ a*_μ) or sup_{μ∈ℝ^K} Regret_μ.
     - Bayes analysis: a prior Π is given; evaluate the performance measures averaged under the prior. We evaluate ∫_{ℝ^K} ℙ_μ(â_T ≠ a*_μ) dΠ(μ) or ∫_{ℝ^K} Regret_μ dΠ(μ).
  14. 17 Minimax and Bayes optimal experiment for the regret
     Kato, M. Minimax and Bayes optimal best-arm identification: adaptive experimental design for treatment choice.
     - We design an experiment for treatment choice (best-arm identification).
     - The designed experiment is minimax and Bayes optimal for the (expected simple) regret.
  15. 18 Minimax and Bayes optimal experiment for the regret
     Our designed experiment consists of the following two elements:
     - Treatment-allocation phase: two-stage sampling.
       - In the first stage, we remove clearly suboptimal treatments and estimate the variances.
       - In the second stage, we allocate treatment arms with a variant of the Neyman allocation.
     - Decision-making phase: choice of the empirical best arm (Bubeck et al., 2011; Manski, 2004).
     → We refer to our designed experiment as the TS-EBA experiment.
  16. 19 Minimax and Bayes optimal experiment for the regret
     Treatment-allocation phase: we introduce the following two-stage sampling.
     Split the T rounds into a first stage with rT rounds and a second stage with (1 − r)T rounds.
     1. First stage:
        - Allocate each treatment with equal ratio rT/K.
        - Using a concentration inequality, select a set of candidate best treatments Ŝ ⊆ [K].
        - Obtain an estimator σ̂²_{a,rT} of the variance σ_a² (the sample variance is fine).
     2. Second stage (we do not allocate any treatment a ∉ Ŝ from this stage on):
        - If |Ŝ| = 1, return the arm a ∈ Ŝ as the estimate of the best arm.
        - If |Ŝ| = 2, allocate treatment a ∈ Ŝ with probability w_{a,t} ≔ σ̂_{a,rT} / Σ_{b∈Ŝ} σ̂_{b,rT}.
        - If |Ŝ| ≥ 3, allocate treatment a ∈ Ŝ with probability w_{a,t} ≔ σ̂²_{a,rT} / Σ_{b∈Ŝ} σ̂²_{b,rT}.
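The second-stage allocation rule above can be sketched in a few lines. This is only an illustration: the function name `second_stage_weights` and its dict-based interface are mine, not the paper's.

```python
def second_stage_weights(S, sigma_hat):
    """Second-stage allocation probabilities over the surviving arms S.

    sigma_hat maps each arm to its first-stage standard-deviation estimate.
    Per the slide: a single survivor is kept outright, two survivors get
    Neyman (std-proportional) weights, and three or more get
    variance-proportional weights.
    """
    if len(S) == 1:
        return {S[0]: 1.0}
    if len(S) == 2:
        total = sum(sigma_hat[a] for a in S)
        return {a: sigma_hat[a] / total for a in S}
    total = sum(sigma_hat[a] ** 2 for a in S)
    return {a: sigma_hat[a] ** 2 / total for a in S}
```

For two survivors with σ̂ = (1, 3) this gives weights (0.25, 0.75); for three or more arms the weights switch to being proportional to the variances.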
  17. 20 Minimax and Bayes optimal experiment for the regret
     How do we select the candidate set Ŝ?
     - Let v_{rT} ≔ √(log T / T) · max_a σ̂_{a,rT}, and let â_{rT} = arg max_{a∈[K]} μ̂_{a,rT}.
     - Construct lower and upper confidence bounds μ̂_{a,rT} − v_{rT} and μ̂_{a,rT} + v_{rT}.
     - Define Ŝ ≔ { a ∈ [K] : μ̂_{a,rT} + v_{rT} ≥ μ̂_{â_{rT},rT} − v_{rT} }.
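The screening rule can be sketched as follows, assuming the radius takes the √(log T / T) · max_a σ̂_a form shown above; `select_candidates` and its argument names are mine.

```python
import math

def select_candidates(mu_hat, sigma_hat, T):
    """First-stage screening: keep every arm whose upper confidence bound
    reaches the empirical leader's lower confidence bound, using a common
    radius v = sqrt(log T / T) * max_a sigma_hat[a]."""
    v = math.sqrt(math.log(T) / T) * max(sigma_hat.values())
    leader = max(mu_hat, key=mu_hat.get)
    return sorted(a for a in mu_hat if mu_hat[a] + v >= mu_hat[leader] - v)
```

With T = 100 the radius is about 0.21 (for unit variances), so an arm 0.05 below the leader survives while an arm a full unit below is eliminated.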
  18. 21 Minimax and Bayes optimal experiment for the regret
     Decision-making phase:
     - Choose the treatment arm with the highest sample mean: â_T = arg max_{a∈[K]} μ̂_{a,T}, where μ̂_{a,T} ≔ (1 / Σ_{t=1}^T 1[A_t = a]) Σ_{t=1}^T 1[A_t = a] Y_t is the sample mean.
     - Denote the TS-EBA experiment by δ_TS-EBA.
     - We also write â_T as â_T^{δ_TS-EBA} when we want to make its dependence on the experiment explicit; similarly, let Regret_μ(δ_TS-EBA) be the regret of δ_TS-EBA under μ.
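Putting the two phases together, here is a self-contained Monte Carlo sketch of one TS-EBA-style run on Gaussian arms. This is my simplified illustration (round-robin first stage, a single σ̂-based radius), not the paper's exact procedure; `ts_eba`, `r`, and the other names are mine.

```python
import math
import random
import statistics

def ts_eba(mu, sigma, T, r=0.3, seed=0):
    """One simulated run: stage 1 uniform allocation and screening, stage 2
    std/variance-proportional allocation, then the empirical best arm."""
    rng = random.Random(seed)
    K = len(mu)
    obs = [[] for _ in range(K)]
    n1 = int(r * T)
    # Stage 1: uniform (round-robin) allocation over rT rounds.
    for t in range(n1):
        a = t % K
        obs[a].append(rng.gauss(mu[a], sigma[a]))
    mu_hat = [statistics.fmean(x) for x in obs]
    sd_hat = [statistics.stdev(x) for x in obs]
    # Screening with a common confidence radius.
    v = math.sqrt(math.log(T) / T) * max(sd_hat)
    leader = max(range(K), key=lambda a: mu_hat[a])
    S = [a for a in range(K) if mu_hat[a] + v >= mu_hat[leader] - v]
    # Stage 2: variant of the Neyman allocation over the surviving arms.
    if len(S) == 2:
        w = [sd_hat[a] for a in S]          # std-proportional
    else:
        w = [sd_hat[a] ** 2 for a in S]     # variance-proportional
    for _ in range(T - n1):
        a = rng.choices(S, weights=w)[0]
        obs[a].append(rng.gauss(mu[a], sigma[a]))
    # Decision: empirical best arm over all T rounds.
    mu_hat = [statistics.fmean(x) if x else -math.inf for x in obs]
    return max(range(K), key=lambda a: mu_hat[a])
```

With well-separated means and a moderate budget, the run recovers the best arm with high probability.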
  19. 22 Minimax and Bayes optimal experiment for the regret
     For the designed experiment, we can prove minimax and Bayes optimality. To derive lower bounds, we restrict the class of experiments.
     Regular experiments: let ℰ be the set of regular experiments. For any δ ∈ ℰ, the following hold:
     - If √T (μ_{a*_μ} − μ_a) → ∞ for all a ∈ [K] ∖ {a*_μ}, then ℙ_μ(â_T^δ ≠ a*_μ) → 0 as T → ∞.
     - If there exists a ∈ [K] such that √T (μ_{a*_μ} − μ_a) → 0, then there exists a constant C ∈ (0, 1), independent of T, such that ℙ_μ(â_T^δ ≠ a*_μ) → C as T → ∞.
  20. 23 Minimax and Bayes optimal experiment for the regret
     Minimax optimality:
     - If K = 2,
       lim_{T→∞} sup_{μ∈ℝ^K} √T · Regret_μ(δ_TS-EBA) ≤ (1/√e)(σ1 + σ2) ≤ inf_{δ∈ℰ} lim_{T→∞} sup_{μ∈ℝ^K} √T · Regret_μ(δ).
     - If K ≥ 3,
       lim_{T→∞} sup_{μ∈ℝ^K} √T · Regret_μ(δ_TS-EBA) ≤ 2 (1 + √((K−1)/K)) √(Σ_{a∈[K]} σ_a² · log K).
  21. 24 Minimax and Bayes optimal experiment for the regret
     Bayes optimality:
       lim_{T→∞} T ∫_{μ∈ℝ^K} Regret_μ(δ_TS-EBA) dΠ(μ) ≤ 4 Σ_{a∈[K]} ∫_{μ_∖a∈ℝ^{K−1}} σ²*_∖a · h_a(μ*_∖a | μ_∖a) dH_∖a(μ_∖a) ≤ inf_{δ∈ℰ} lim_{T→∞} T ∫_{μ∈ℝ^K} Regret_μ(δ) dΠ(μ).
     - σ²*_∖a = σ²_{b*} and μ*_∖a = μ_{b*}, where b* = arg max_{b∈[K]∖{a}} μ_b.
     - μ_∖a is the parameter vector obtained by removing μ_a from μ.
     - H_∖a(μ_∖a) is the prior of μ_∖a.
  22. 25 Intuition
     - Regret_μ = 𝔼_μ[Y_{a*_μ}] − 𝔼_μ[Y_{â_T}] = Σ_{b≠a*_μ} (μ_{a*_μ} − μ_b) ℙ_μ(â_T = b).
     - ℙ_μ(â_T = b) ≈ exp(−C₁ T (μ_{a*_μ} − μ_b)²).
     - If μ_{a*_μ} − μ_b converges to zero more slowly than 1/√T, then ℙ_μ(â_T = b) → 0 (b ∉ Ŝ_rT).
     - If μ_{a*_μ} − μ_b = C₂/√T, then ℙ_μ(â_T = b) converges to a constant (b ∈ Ŝ_rT).
     (Figure: five treatments with means μ1, …, μ5; the arms whose means lie within roughly C₂/√T of the best form the candidate set Ŝ_rT.)
  23. 26 Intuition
     - When two treatments are given, we allocate them so as to maximize the discrepancy between their confidence intervals.
     - When multiple treatments are given, there is no unique way to compare their confidence intervals; the comparison depends on the performance measure and the uncertainty evaluation.
     → This task is closely related to making the variance of the ATE estimator smaller → use the Neyman allocation.
     (Figure: confidence intervals for two treatments, then for three treatments.)
  24. 27 Bayes optimal experiment for the simple regret
     Our designed experiment and theory also apply to other distributions, e.g., Bernoulli distributions.
     Komiyama, J., Ariu, K., Kato, M., and Qin, C. (2023). Rate-optimal Bayesian simple regret in best arm identification. Mathematics of Operations Research.
     - A Bayes optimal experiment for treatment choice with Bernoulli distributions.
     - Komiyama et al. (2023) corresponds to a simplified case of our new results.
     - In our previous study, we only considered Bernoulli bandits, and the lower bound is not tight (there is a constant gap between the upper and lower bounds).
     - In the Bernoulli case, the Neyman allocation reduces to uniform sampling (in the small-gap regime the candidate arms have nearly equal means, hence nearly equal variances p(1−p)), so there is no need to estimate the variances.
  25. 28 Related literature
     - Ordinal optimization: best-arm identification has been studied as ordinal optimization, a setting in which the ideal allocation probability is assumed to be known.
       - Chen (2000) considers Gaussian outcomes; Glynn and Juneja (2004) consider more general distributions.
     - Best-arm identification:
       - Audibert, Bubeck, and Munos (2010) establish the best-arm identification framework, in which the ideal allocation probability is unknown and must be estimated.
       - Bubeck, Munos, and Stoltz (2011) propose a minimax-rate-optimal algorithm.
  26. 29 Related literature
     - Limit-of-experiments (local asymptotic normality) framework for experimental design: tools for developing experiments with matching lower and upper bounds by assuming locality of the underlying distributions.
       - Hirano and Porter (2008) introduce this framework for treatment choice.
       - Armstrong (2021) and Hirano and Porter (2023) apply the framework to adaptive experiments.
       - Adusumilli (2025) proposes an adaptive experimental design based on the limit-of-experiments framework and diffusion-process theory.
     ➢ I conjecture that we can derive tight lower and upper bounds without using the limit of experiments: the 1/√T regimes (locality) appear directly as a result of taking the worst case for the regret.
  27. 31 Preliminary: regret decomposition
     The simple regret can be written as
       Regret_μ = 𝔼_μ[Y_{a*_μ}] − 𝔼_μ[Y_{â_T}] = Σ_{b≠a*_μ} (μ_{a*_μ} − μ_b) ℙ_μ(â_T = b).
     - The regret Regret_μ is the sum of the probabilities ℙ_μ(â_T = b), each weighted by the gap μ_{a*_μ} − μ_b between the best and the suboptimal arm.
     - It also holds that Regret_μ ≤ Σ_{b≠a*_μ} (μ_{a*_μ} − μ_b) ℙ_μ(â_T ≠ a*_μ).
     - ℙ_μ(â_T ≠ a*_μ) is roughly upper bounded by Σ_{b≠a*_μ} exp(−C* T (μ_{a*_μ} − μ_b)²) for some constant C* > 0 (cf. the Chernoff bound and large-deviation principles).
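The decomposition can be sanity-checked numerically. The sketch below uses a deliberately simple non-adaptive experiment (uniform allocation with the empirical-best-arm decision); the function and parameter names are mine.

```python
import random
import statistics

def decomposition_check(mu, sigma, n_per_arm, runs=2000, seed=1):
    """Monte Carlo check that Regret = sum over b != a* of
    (mu[a*] - mu[b]) * P(a_hat = b), for a uniform-allocation experiment
    with the empirical-best-arm decision."""
    rng = random.Random(seed)
    K = len(mu)
    best = max(range(K), key=lambda a: mu[a])
    counts = [0] * K
    for _ in range(runs):
        means = [
            statistics.fmean(rng.gauss(mu[a], sigma[a]) for _ in range(n_per_arm))
            for a in range(K)
        ]
        counts[max(range(K), key=lambda a: means[a])] += 1
    probs = [c / runs for c in counts]                       # estimated P(a_hat = b)
    regret = sum((mu[best] - mu[b]) * probs[b] for b in range(K) if b != best)
    return probs, regret
```

By construction the returned regret equals the gap-weighted sum of the estimated selection probabilities, which is exactly the decomposition on the slide.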
  28. 32 Preliminary: evaluation framework
     - Distribution-dependent analysis: evaluate the performance measures with μ held fixed as T grows. For each μ ∈ ℝ^K, we evaluate ℙ_μ(â_T ≠ a*_μ) or Regret_μ.
     - Minimax analysis: evaluate the performance measures under the worst-case μ. We evaluate sup_{μ∈ℝ^K} ℙ_μ(â_T ≠ a*_μ) or sup_{μ∈ℝ^K} Regret_μ.
     - Bayes analysis: a prior Π is given; evaluate the performance measures averaged under the prior. We evaluate ∫_{ℝ^K} ℙ_μ(â_T ≠ a*_μ) dΠ(μ) or ∫_{ℝ^K} Regret_μ dΠ(μ).
  29. 33 Preliminary: distribution-dependent, minimax, and Bayes analysis
     To build intuition, consider the binary case (K = 2) with a*_μ = 1:
       Regret_μ = (μ1 − μ2) · ℙ_μ(â_T = 2) ≤ (μ1 − μ2) · exp(−C* T (μ1 − μ2)²).
     Distribution-dependent analysis:
     → μ1 − μ2 can be ignored asymptotically since it is a constant.
     → It therefore suffices to evaluate ℙ_μ(â_T ≠ a*_μ) in order to evaluate Regret_μ.
     → We evaluate the exponent via −(1/T) log ℙ_μ(â_T ≠ a*_μ) ≈ −(1/T) log Regret_μ ≈ C* (μ1 − μ2)².
  30. 34 Preliminary: distribution-dependent, minimax, and Bayes analysis
     Again consider the binary case (K = 2) with a*_μ = 1:
       Regret_μ = (μ1 − μ2) · ℙ_μ(â_T = 2) ≤ (μ1 − μ2) · exp(−C* T (μ1 − μ2)²).
     Minimax and Bayes analysis:
     - Probability of misidentification ℙ_μ(â_T ≠ a*_μ): the rate of convergence to zero is still exponential (if the worst case is well defined).
     - Regret Regret_μ: distributions whose mean gap satisfies μ1 − μ2 = O(1/√T) dominate the regret, so the rate of convergence to zero is O(1/√T).
     → The analysis differs between the probability of misidentification and the regret.
  31. 35 Preliminary: distribution-dependent, minimax, and Bayes analysis
     Define Δ_μ = μ1 − μ2. For some constant C > 0, the regret Regret_μ = Δ_μ ℙ_μ(â_T ≠ a*_μ) behaves as
       Regret_μ ≈ Δ_μ exp(−C T Δ_μ²),
     so the dependency of Δ_μ on T affects the evaluation of the regret:
     1. Δ_μ converges to zero more slowly than 1/√T → for some increasing function g, Regret_μ ≈ exp(−g(T)).
     2. Δ_μ converges to zero at the order 1/√T → for some C > 0, Regret_μ ≈ C/√T.
     3. Δ_μ converges to zero faster than 1/√T → Regret_μ ≈ o(1/√T).
     → In the worst case, Regret_μ ≈ C/√T, attained at Δ_μ = O(1/√T).
  32. 36 Preliminary: bandit lower bound
     Lower bounds in bandit problems are derived via information theory. The following lemma is one of the most general and tight results for lower bounds.
     Transportation lemma (Lai and Robbins, 1985; Kaufmann et al., 2016):
     - Let P and Q be two K-armed bandit distributions such that for all a, the distributions P_a and Q_a of Y_a are mutually absolutely continuous.
     - Then
       Σ_{a∈[K]} 𝔼_P[Σ_{t=1}^T 1[A_t = a]] KL(P_a, Q_a) ≥ sup_{ℰ∈ℱ_T} d(ℙ_P(ℰ), ℙ_Q(ℰ)),
     where d(x, y) ≔ x log(x/y) + (1 − x) log((1−x)/(1−y)) is the binary relative entropy, with the convention that d(0,0) = d(1,1) = 0.
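The binary relative entropy d(x, y) in the lemma is straightforward to implement; a minimal sketch (the name `binary_kl` is mine):

```python
import math

def binary_kl(x, y):
    """Binary relative entropy d(x, y) = x log(x/y) + (1-x) log((1-x)/(1-y)),
    with the convention that 0 * log(0/q) = 0, so d(0,0) = d(1,1) = 0."""
    def term(p, q):
        return 0.0 if p == 0.0 else p * math.log(p / q)
    return term(x, y) + term(1 - x, 1 - y)
```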
  33. 37 Proof of the lower bound
     For simplicity, consider only binary treatments with μ1 > μ2. The regret is Regret_μ = (μ1 − μ2) ℙ_μ(â_T = 2).
     - Using Kaufmann et al. (2016)'s lemma, we can derive the following lower bound for the probability of misidentification:
       ℙ_μ(â_T = 2) ≥ exp(−T (μ1 − μ2)² / (2(σ1²/w1 + σ2²/w2)) + o(1)).
     - Therefore,
       Regret_μ ≥ (μ1 − μ2) exp(−T (μ1 − μ2)² / (2(σ1²/w1 + σ2²/w2)) + o(1)).
     - Setting μ1 − μ2 = √((σ1²/w1 + σ2²/w2)/T) and optimizing over w_a, we obtain Regret_μ ≥ (σ1 + σ2)/√(eT).
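The final optimization step can be checked numerically: under the Neyman allocation the variance term equals (σ1 + σ2)², and maximizing Δ · exp(−TΔ²/(2(σ1+σ2)²)) over the gap Δ recovers (σ1 + σ2)/√(eT). A small grid-search sketch (names mine):

```python
import math

def worst_case_regret(s1, s2, T, grid=200000):
    """Maximize d * exp(-T d^2 / (2 (s1+s2)^2)) over a grid of gaps d, and
    return the grid maximum together with the closed form (s1+s2)/sqrt(e*T)."""
    s = s1 + s2
    best = max(
        d * math.exp(-T * d * d / (2 * s * s))
        for i in range(1, grid)
        for d in [5.0 * s / math.sqrt(T) * i / grid]   # d in (0, 5(s1+s2)/sqrt(T)]
    )
    return best, s / math.sqrt(math.e * T)
```

The grid maximum approaches the closed form from below, and the maximizing gap is Δ* = (σ1 + σ2)/√T, i.e., exactly the 1/√T regime discussed earlier.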
  34. 38 Proof of the upper bound
     We now upper bound the regret.
     - From the regret decomposition Regret_μ = (μ1 − μ2) ℙ_μ(â_T = 2), only ℙ_μ(â_T = 2) depends on the experiment, so we aim to bound ℙ_μ(â_T = 2).
     - Note that ℙ_μ(â_T = 2) = ℙ_μ(μ̂_{1,T} ≤ μ̂_{2,T}) = ℙ_μ(μ̂_{1,T} − μ̂_{2,T} − Δ_{2,μ} ≤ −Δ_{2,μ}), where Δ_{2,μ} = μ1 − μ2.
  35. 39 Proof of the upper bound
     Can we use the central limit theorem?
     - E.g., √T (μ̂_{1,T} − μ̂_{2,T} − Δ_{2,μ}) →_d 𝒩(0, (σ1 + σ2)²) (Hahn, Hirano, and Karlan, 2011), if the propensity score is w1 = σ1/(σ1 + σ2).
     - The central limit theorem guarantees
       lim_{T→∞} sup_z | ℙ_μ(√T (μ̂_{1,T} − μ̂_{2,T} − Δ_{2,μ}) ≤ z) − Φ(z/(σ1 + σ2)) | = 0,
     where Φ is the cumulative distribution function of the standard normal distribution.
     - But we must evaluate the large-deviation probability ℙ_μ(μ̂_{1,T} − μ̂_{2,T} − Δ_{2,μ} ≤ −Δ_{2,μ}) when √T Δ_{2,μ} → ∞, and the central limit guarantee is insufficient for this.
  36. 40 Proof of the upper bound
     Note that ℙ_μ(â_T = 2) = ℙ_μ(μ̂_{1,T} ≤ μ̂_{2,T}) = ℙ_μ(μ̂_{1,T} − μ̂_{2,T} − Δ_{2,μ} ≤ −Δ_{2,μ}).
     - From the Chernoff bound, for any λ > 0,
       ℙ_μ(μ̂_{1,T} − μ̂_{2,T} − Δ_{2,μ} ≤ −Δ_{2,μ}) ≤ exp(−λ√T Δ_{2,μ}) 𝔼[exp(−λ√T (μ̂_{1,T} − μ̂_{2,T} − Δ_{2,μ}))].
     - Here, 𝔼[exp(−λ√T (μ̂_{1,T} − μ̂_{2,T} − Δ_{2,μ}))] = exp(log 𝔼[exp(−λ√T (μ̂_{1,T} − μ̂_{2,T} − Δ_{2,μ}))]).
     - From a Taylor expansion and the central limit theorem, if the third moment of √T (μ̂_{1,T} − μ̂_{2,T} − Δ_{2,μ}) is bounded, we have
       exp(log 𝔼[exp(−λ√T (μ̂_{1,T} − μ̂_{2,T} − Δ_{2,μ}))]) ≤ exp(λ² (σ1 + σ2)²/2 + o(1)).
  37. 41 Proof of the upper bound
     Finally, letting λ = √T Δ_{2,μ}/(σ1 + σ2)², we have
       ℙ_μ(â_T = 2) ≤ ℙ_μ(μ̂_{1,T} − μ̂_{2,T} − Δ_{2,μ} ≤ −Δ_{2,μ})
                    ≤ exp(−λ√T Δ_{2,μ}) 𝔼[exp(−λ√T (μ̂_{1,T} − μ̂_{2,T} − Δ_{2,μ}))]
                    ≤ exp(−λ√T Δ_{2,μ}) exp(λ² (σ1 + σ2)²/2 + o(1))
                    ≤ exp(−T Δ_{2,μ}² / (2(σ1 + σ2)²) + o(1)).
     Question: is this upper bound tight?
     - When √T Δ_{2,μ} → 0, the central limit guarantee is tighter than the Chernoff bound.
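The choice of λ can be verified numerically: minimizing the exponent −λ√T Δ + λ²(σ1+σ2)²/2 over λ recovers −TΔ²/(2(σ1+σ2)²), attained at λ* = √T Δ/(σ1+σ2)². A sketch (names mine):

```python
import math

def chernoff_exponent(delta, T, s1, s2):
    """Minimize the Chernoff exponent g(lam) = -lam*sqrt(T)*delta
    + lam^2 * (s1+s2)^2 / 2 over a grid of lam around the closed-form
    minimizer lam* = sqrt(T)*delta/(s1+s2)^2, and return both the grid
    minimum and the closed-form value -T*delta^2 / (2*(s1+s2)^2)."""
    V = (s1 + s2) ** 2
    lam_star = math.sqrt(T) * delta / V
    grid = [lam_star * (0.5 + i / 1000) for i in range(1001)]  # includes lam_star
    g = lambda lam: -lam * math.sqrt(T) * delta + lam * lam * V / 2
    return min(g(l) for l in grid), -T * delta ** 2 / (2 * V)
```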
  38. 43 Difficulty in best-arm identification
     Why do we consider minimax or Bayes optimality for treatment choice?
     → One could instead consider optimality under each fixed distribution.
     - Kasy and Sautmann (2021) claim to have developed such a result, but their proof had several issues.
     → Through the investigation of these issues, various impossibility theorems have been shown.
     - Impossibility theorems: there is no algorithm that is optimal for every distribution.
       - Kaufmann (2020); Ariu, Kato, Komiyama, McAlinn, and Qin (2021); Degenne (2023); Wang, Ariu, and Proutiere (2024); Imbens, Qin, and Wager (2025).
  39. 44 Preliminary: bandit lower bound
     Lower bounds in bandit problems are derived via information theory. The following lemma is one of the most general and tight results for lower bounds.
     Transportation lemma (Lai and Robbins, 1985; Kaufmann et al., 2016):
     - Let P and Q be two K-armed bandit distributions such that for all a, the distributions P_a and Q_a of Y_a are mutually absolutely continuous.
     - Then
       Σ_{a∈[K]} 𝔼_P[Σ_{t=1}^T 1[A_t = a]] KL(P_a, Q_a) ≥ sup_{ℰ∈ℱ_T} d(ℙ_P(ℰ), ℙ_Q(ℰ)),
     where d(x, y) ≔ x log(x/y) + (1 − x) log((1−x)/(1−y)) is the binary relative entropy, with the convention that d(0,0) = d(1,1) = 0.
  40. 45 Example: regret lower bound in regret minimization
     Lai and Robbins (1985) develop a lower bound for the regret minimization problem.
     - Regret minimization: the goal is to maximize the cumulative outcome Σ_{t=1}^T Y_{A_t,t}.
     - The (in-sample) regret is defined as Regret_P ≔ 𝔼_P[Σ_{t=1}^T Y_{a*_P,t} − Σ_{t=1}^T Y_{A_t,t}].
     - Under each distribution P0, for large T, the lower bound is
       Regret_{P0} ≥ log T · inf_{Q∈Alt(P0)} Σ_{a∈[K]} 𝔼_{P0}[Σ_{t=1}^T 1[A_t = a]] KL(P_{a,0}, Q_a),
     where Alt(P0) ≔ {Q ∈ 𝒫 : a*_Q ≠ a*_{P0}}.
     - Kaufmann's lower bound is a generalization of Lai and Robbins' lower bound; it is known to yield lower bounds for various bandit problems.
     - "Can we develop a lower bound for best-arm identification using Kaufmann's lemma?"
  41. 46 Lower bound in the fixed-budget setting
     A direct application of Kaufmann's lemma to best-arm identification yields the following bound.
     Lower bound (conjecture): under each distribution P0 of the data-generating process,
       ℙ_{P0}(â_T ≠ a*_{P0}) ≥ inf_{Q∈Alt(P0)} exp(− Σ_{a∈[K]} 𝔼_Q[Σ_{t=1}^T 1[A_t = a]] KL(Q_a, P_{a,0})).
     - (1/T) 𝔼_Q[Σ_{t=1}^T 1[A_t = a]] corresponds to a treatment-allocation probability (ratio) under Q.
     - Denoting (1/T) 𝔼_Q[Σ_{t=1}^T 1[A_t = a]] by w_a, we can rewrite the conjecture as
       ℙ_{P0}(â_T ≠ a*_{P0}) ≥ inf_{Q∈Alt(P0)} exp(−T Σ_{a∈[K]} w_a KL(Q_a, P_{0,a})).
  42. 47 Optimal design in the fixed-budget setting
     Question: does there exist an algorithm whose probability of misidentification ℙ_{P0}(â_T ≠ a*_{P0}) exactly matches the lower bound
       inf_{Q∈Alt(P0)} exp(−T Σ_{a∈[K]} w_a KL(Q_a, P_{0,a}))?
     → Answer: no (without additional assumptions).
     - When there are two treatments and the outcomes follow Gaussian distributions with known variances, such an optimal algorithm exists: the Neyman allocation.
     - In more general cases, no such algorithm exists.
  43. 48 Neyman allocation is optimal in two-armed bandits
     Consider two treatments (K = 2) and assume that the variances are known.
     - The lower bound can be computed as
       inf_{Q∈Alt(P0)} exp(−T Σ_{a∈[K]} w_a KL(Q_a, P_{0,a})) ≥ exp(−T (𝔼_{P0}[Y_1] − 𝔼_{P0}[Y_2])² / (2(σ1 + σ2)²)).
     - The asymptotically optimal algorithm is the Neyman allocation (Kaufmann et al., 2016):
       - Allocate treatments in proportion to their standard deviations.
       - At the end of the experiment, recommend the treatment with the highest sample mean as the best treatment.
     - (σ1 + σ2)² is the asymptotic variance of the ATE estimator under the Neyman allocation.
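A minimal sketch of the two-arm Neyman allocation and the asymptotic variance it minimizes (function names are mine):

```python
def neyman_allocation(s1, s2):
    """Two-arm Neyman allocation: sample in proportion to the standard deviations."""
    return s1 / (s1 + s2), s2 / (s1 + s2)

def ate_avar(w1, w2, s1, s2):
    """Asymptotic variance s1^2/w1 + s2^2/w2 of the ATE estimator under
    allocation (w1, w2); minimized by the Neyman allocation, where it
    equals (s1 + s2)^2."""
    return s1 ** 2 / w1 + s2 ** 2 / w2
```

For σ = (1, 3) the Neyman allocation is (0.25, 0.75) and yields the variance (1 + 3)² = 16, whereas uniform allocation yields 20.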
  44. 49 Optimal algorithm in the multi-armed bandits: Kasy and Sautmann (KS) (2021)
     - Propose the Exploration Sampling (ES) algorithm, a variant of Top-Two Thompson Sampling (Russo, 2016).
     - They show that under the ES algorithm, for each Bernoulli P0 independent of T,
       Regret_{P0} ≤ inf_w inf_{Q∈Alt(P0)} exp(−T Σ_{a∈[K]} w_a KL(Q_a, P_{a,0})).
     → They claim that their algorithm is optimal with respect to Kaufmann's lower bound
       ℙ_{P0}(â_T ≠ a*_{P0}) ≥ inf_{Q∈Alt(P0)} exp(−T Σ_{a∈[K]} w_a KL(Q_a, P_{0,a})).
  45. 50 Impossibility theorem in the multi-armed bandits
     Issues in KS (Ariu et al., 2025):
     - The KL divergence is flipped.
       - KS (incorrect): Regret_{P0} ≤ inf_w inf_{Q∈Alt(P0)} exp(−T Σ_{a∈[K]} w_a KL(Q_a, P_{a,0})).
       - Ours (correct): Regret_{P0} ≤ inf_w inf_{Q∈Alt(P0)} exp(−T Σ_{a∈[K]} w_a KL(P_{a,0}, Q_a)).
     - There exists a distribution under which no algorithm can attain Γ*.
     Impossibility theorem: there exists a distribution P0 ∈ 𝒫 under which
       Regret_{P0} ≥ inf_w inf_{Q∈Alt(P0)} exp(−T Σ_{a∈[K]} w_a KL(Q_a, P_{a,0})).
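The flipped-KL issue matters because the KL divergence is asymmetric, so swapping its arguments changes the exponent in general. A quick check in the Bernoulli setting of KS (2021); the function name is mine:

```python
import math

def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), for p, q in (0, 1).
    Asymmetric in general: bernoulli_kl(p, q) != bernoulli_kl(q, p)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))
```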
  46. 51 Why?
     Why does the conjectured lower bound not work? There are several technical issues.
     1. We cannot compute an ideal treatment-allocation probability from the conjectured lower bound.
        - Recall that the lower bound is
          ℙ_{P0}(â_T ≠ a*_{P0}) ≥ V* = inf_{Q∈Alt(P0)} exp(−T Σ_{a∈[K]} w_a KL(Q_a, P_{a,0})),
          where w_a = (1/T) 𝔼_Q[Σ_{t=1}^T 1[A_t = a]] depends on Q.
        → min_Q min_w exp(−T Σ_{a∈[K]} w_a KL(Q_a, P_{a,0})) ≠ inf_w inf_{Q∈Alt(P0)} exp(−T Σ_{a∈[K]} w_a KL(Q_a, P_{a,0})).
        → We cannot obtain an ideal allocation probability from this lower bound.
     2. There exists P0 such that ℙ_{P0}(â_T ≠ a*_{P0}) ≥ inf_w inf_{Q∈Alt(P0)} exp(−T Σ_{a∈[K]} w_a KL(Q_a, P_{a,0})).
  47. 52 Related literature
     - Glynn and Juneja (2004) develop large-deviation optimal experiments for treatment choice.
     - Kasy and Sautmann (2021) show that Russo (2016)'s Top-Two Thompson Sampling is optimal in the sense that its upper bound matches the bound in Glynn and Juneja (2004).
     - Kaufmann, Cappé, and Garivier (2016) develop a general lower bound for the bandit problem.
     - Impossibility theorems:
       - Ariu, Kato, Komiyama, McAlinn, and Qin (2025) find a counterexample to Kasy and Sautmann (2021).
       - Degenne (2023) shows that the bound in Glynn and Juneja (2004) is a special case of Kaufmann's bound under strong assumptions.
       - Wang, Ariu, and Proutiere (2024) and Imbens, Qin, and Wager (2025) also derive counterexamples.
  48. 54 Concluding remarks
     Adaptive experimental design for treatment choice (BAI):
     - Goal: choosing the best treatment arm.
     - The lower bound and the ideal treatment-allocation probability depend on how uncertainty is evaluated.
     - Distribution-dependent analysis:
       - No globally optimal algorithm exists for Kaufmann's lower bound.
       - Impossibility theorems: there exists a distribution under which a valid lower bound is larger than Kaufmann's.
     - Minimax and Bayes analysis:
       - The TS-EBA experiment.
  49. 55 References
     - Masahiro Kato, Takuya Ishihara, Junya Honda, and Yusuke Narita. Efficient adaptive experimental design for average treatment effect estimation, 2020. arXiv:2002.05308.
     - Masahiro Kato, Kenichiro McAlinn, and Shota Yasui. The adaptive doubly robust estimator and a paradox concerning logging policy. In International Conference on Neural Information Processing Systems (NeurIPS), 2021.
     - Kaito Ariu, Masahiro Kato, Junpei Komiyama, Kenichiro McAlinn, and Chao Qin. A comment on "Adaptive treatment assignment in experiments for policy choice", 2021.
     - Masahiro Kato. Generalized Neyman allocation for locally minimax optimal best-arm identification, 2024a. arXiv:2405.19317.
     - Masahiro Kato. Locally optimal fixed-budget best arm identification in two-armed Gaussian bandits with unknown variances, 2024b. arXiv:2312.12741.
     - Masahiro Kato and Kaito Ariu. The role of contextual information in best arm identification, 2021. Accepted for Journal of Machine Learning Research conditional on minor revisions.
     - Masahiro Kato, Akihiro Oga, Wataru Komatsubara, and Ryo Inokuchi. Active adaptive experimental design for treatment effect estimation with covariate choice. In International Conference on Machine Learning (ICML), 2024a.
     - Masahiro Kato, Kyohei Okumura, Takuya Ishihara, and Toru Kitagawa. Adaptive experimental design for policy learning, 2024b. arXiv:2401.03756.
     - Junpei Komiyama, Kaito Ariu, Masahiro Kato, and Chao Qin. Rate-optimal Bayesian simple regret in best arm identification. Mathematics of Operations Research, 2023.
  50. 56 References
     - van der Vaart, A. (1998), Asymptotic Statistics, Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press.
     - Tabord-Meehan, M. (2022), "Stratification Trees for Adaptive Randomization in Randomized Controlled Trials," The Review of Economic Studies.
     - van der Laan, M. J. (2008), "The Construction and Analysis of Adaptive Group Sequential Designs," https://biostats.bepress.com/ucbbiostat/paper232.
     - Neyman, J. (1923), "Sur les applications de la theorie des probabilites aux experiences agricoles: Essai des principes," Statistical Science, 5, 463-472.
     - Neyman, J. (1934), "On the Two Different Aspects of the Representative Method: the Method of Stratified Sampling and the Method of Purposive Selection," Journal of the Royal Statistical Society, 97, 123-150.
     - Manski, C. F. (2002), "Treatment choice under ambiguity induced by inferential problems," Journal of Statistical Planning and Inference, 105, 67-82.
     - Manski, C. F. (2004), "Statistical Treatment Rules for Heterogeneous Populations," Econometrica, 72, 1221-1246.
  51. 57 References
     - Kitagawa, T. and Tetenov, A. (2018), "Who Should Be Treated? Empirical Welfare Maximization Methods for Treatment Choice," Econometrica, 86, 591-616.
     - Garivier, A. and Kaufmann, E. (2016), "Optimal Best Arm Identification with Fixed Confidence," in Conference on Learning Theory.
     - Glynn, P. and Juneja, S. (2004), "A large deviations perspective on ordinal optimization," in Proceedings of the 2004 Winter Simulation Conference, IEEE, vol. 1.
     - Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018), "Double/debiased machine learning for treatment and structural parameters," The Econometrics Journal.
     - Degenne, R. (2023), "On the Existence of a Complexity in Fixed Budget Bandit Identification," Conference on Learning Theory (COLT).
     - Kasy, M. and Sautmann, A. (2021), "Adaptive Treatment Assignment in Experiments for Policy Choice," Econometrica, 89, 113-132.
     - Rubin, D. B. (1974), "Estimating causal effects of treatments in randomized and nonrandomized studies," Journal of Educational Psychology.