
Online Inverse Linear Optimization

Shinsaku Sakaue
March 06, 2025


Mar. 7, 2025, at Kyoto University.
https://www.ml.ist.i.kyoto-u.ac.jp/kiss


Transcript

  1. Online Inverse Linear Optimization Mar. 7, 2025 @ KyotoU, KISS

    Shinsaku Sakaue (Univ. Tokyo, RIKEN AIP) Joint work with Taira Tsuchiya (Univ. Tokyo, RIKEN AIP), Han Bao (Kyoto Univ.), Taihei Oki (Hokkaido Univ.) Preprint: https://arxiv.org/abs/2501.14349
  2. Forward Optimization

    maximize f_θ(x) subject to x ∈ X.
    Forward optimization: given θ, find an optimal solution x.
    • Decision variable x ∈ ℝ^n
    • Constraint set X ⊆ ℝ^n (known)
    • Model parameter θ ∈ ℝ^d
    Seismic tomography (θ = geological features of certain zones): seismic wave trajectories are modeled as shortest-path problems.
    Customer behavior (θ = customer's preference): purchase behavior is modeled as utility maximization.
    https://www.geothermalnextgeneration.com/updates/seismic-tomography-a-cat-scan-of-the-earth
    https://en.wikipedia.org/wiki/Consumer_behaviour
  3. Inverse Optimization

    maximize f_θ(x) subject to x ∈ X.
    Forward optimization: given θ, find an optimal solution x.
    Inverse optimization: estimate θ from an optimal solution x.
    • Decision variable x ∈ ℝ^n
    • Constraint set X ⊆ ℝ^n (known)
    • Model parameter θ ∈ ℝ^d
    Seismic tomography: estimate geological features θ from observed waves x.
    Customer behavior: estimate the customer's preference θ from purchase behavior x.
    https://www.geothermalnextgeneration.com/updates/seismic-tomography-a-cat-scan-of-the-earth
    https://en.wikipedia.org/wiki/Consumer_behaviour
  4. Linear Optimization (forward model in this talk)

    For t = 1, …, T, an agent solves: maximize ⟨c*, x⟩ subject to x ∈ X_t.
    👩‍🦰 Agent with budget = 2 and c* = (0.2, 0.4, 0.1, 0.0, 0.3).
    • c* ∈ ℝ^n is the agent's internal objective vector.
    • c* lies in a convex set Θ ⊂ ℝ^n with diam(Θ) = 1 (known to the learner).
    • X_t ⊆ ℝ^n is the agent's t-th action set with diam(X_t) = 1.
      – X_t is not necessarily convex, but we assume an oracle that solves linear optimization on X_t.
    • Let x_t ∈ arg max { ⟨c*, x⟩ : x ∈ X_t }.
  5. Linear Optimization (forward model in this talk)

    For t = 1, …, T, an agent solves: maximize ⟨c*, x⟩ subject to x ∈ X_t.
    👩‍🦰 Agent with budget = 2 and c* = (0.2, 0.4, 0.1, 0.0, 0.3); X_t excludes sold-out items.
    • c* ∈ ℝ^n is the agent's internal objective vector.
    • c* lies in a convex set Θ ⊂ ℝ^n with diam(Θ) = 1 (known to the learner).
    • X_t ⊆ ℝ^n is the agent's t-th action set with diam(X_t) = 1.
      – X_t is not necessarily convex, but we assume an oracle that solves linear optimization on X_t.
    • Let x_t ∈ arg max { ⟨c*, x⟩ : x ∈ X_t }.
  6. Linear Optimization (forward model in this talk)

    For t = 1, …, T, an agent solves: maximize ⟨c*, x⟩ subject to x ∈ X_t.
    👩‍🦰 Agent with budget = 2 and c* = (0.2, 0.4, 0.1, 0.0, 0.3); X_t excludes sold-out items; chosen action x_t = (1, 0, 0, 0, 1).
    • c* ∈ ℝ^n is the agent's internal objective vector.
    • c* lies in a convex set Θ ⊂ ℝ^n with diam(Θ) = 1 (known to the learner).
    • X_t ⊆ ℝ^n is the agent's t-th action set with diam(X_t) = 1.
      – X_t is not necessarily convex, but we assume an oracle that solves linear optimization on X_t.
    • Let x_t ∈ arg max { ⟨c*, x⟩ : x ∈ X_t }.
  7. Inverse Linear Optimization

    For t = 1, …, T, an agent solves: maximize ⟨c*, x⟩ subject to x ∈ X_t.
    👩‍🦰 Agent with budget = 2, c* = (0.2, 0.4, 0.1, 0.0, 0.3), X_t excluding sold-out items, and chosen action x_t = (1, 0, 0, 0, 1).
    🤖 A learner aims to infer c* from {(X_t, x_t)}_{t=1}^T.
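The forward step above can be sketched with a finite action set and a linear-optimization oracle. The bundles, the budget-2 action set, and the helper name `agent_action` are illustrative, not from the talk:

```python
import numpy as np

# Toy forward model: from a finite action set X_t (0/1 bundles of at most 2 items),
# the agent picks the action maximizing <c*, x>.  The learner sees (X_t, x_t), not c*.
c_star = np.array([0.2, 0.4, 0.1, 0.0, 0.3])

X_t = [np.array(v, dtype=float) for v in
       [(1, 0, 0, 0, 1), (0, 1, 0, 0, 1), (1, 1, 0, 0, 0), (0, 0, 1, 1, 0)]]

def agent_action(c, X):
    """Forward optimization: argmax_{x in X} <c, x> (the linear-optimization oracle)."""
    return max(X, key=lambda x: float(c @ x))

x_t = agent_action(c_star, X_t)
print(x_t)  # [0. 1. 0. 0. 1.] -- items 2 and 5 give value 0.4 + 0.3 = 0.7
```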
  8. Online Inverse Linear Optimization (Bärmann et al. 2017)

    For t = 1, …, T: 👩‍🦰 Agent vs. 🤖 Learner.
  9. Online Inverse Linear Optimization (Bärmann et al. 2017)

    For t = 1, …, T:
    🤖 The learner makes a prediction ĉ_t ∈ Θ of c*.
  10. Online Inverse Linear Optimization (Bärmann et al. 2017)

    For t = 1, …, T:
    🤖 The learner makes a prediction ĉ_t ∈ Θ of c*.
    👩‍🦰 The agent faces X_t and takes action x_t ∈ arg max_{x ∈ X_t} ⟨c*, x⟩.
  11. Online Inverse Linear Optimization (Bärmann et al. 2017)

    For t = 1, …, T:
    🤖 The learner makes a prediction ĉ_t ∈ Θ of c*.
    👩‍🦰 The agent faces X_t and takes action x_t ∈ arg max_{x ∈ X_t} ⟨c*, x⟩.
    🤖 The learner observes (X_t, x_t) and updates from ĉ_t to ĉ_{t+1}.
  12. Online Inverse Linear Optimization (Bärmann et al. 2017)

    For t = 1, …, T:
    🤖 The learner makes a prediction ĉ_t ∈ Θ of c*.
    👩‍🦰 The agent faces X_t and takes action x_t ∈ arg max_{x ∈ X_t} ⟨c*, x⟩.
    🤖 The learner observes (X_t, x_t) and updates from ĉ_t to ĉ_{t+1}.
    Let x̂_t ∈ arg max { ⟨ĉ_t, x⟩ : x ∈ X_t } and define the regret R_T^{c*} as
    R_T^{c*} ≔ Σ_{t=1}^T ⟨c*, x_t − x̂_t⟩ = Σ_{t=1}^T (⟨c*, x_t⟩ − ⟨c*, x̂_t⟩).
  13. Online Inverse Linear Optimization (Bärmann et al. 2017)

    For t = 1, …, T:
    🤖 The learner makes a prediction ĉ_t ∈ Θ of c*.
    👩‍🦰 The agent faces X_t and takes action x_t ∈ arg max_{x ∈ X_t} ⟨c*, x⟩.
    🤖 The learner observes (X_t, x_t) and updates from ĉ_t to ĉ_{t+1}.
    Let x̂_t ∈ arg max { ⟨ĉ_t, x⟩ : x ∈ X_t } and define the regret R_T^{c*} as
    R_T^{c*} ≔ Σ_{t=1}^T ⟨c*, x_t − x̂_t⟩ = Σ_{t=1}^T (⟨c*, x_t⟩ − ⟨c*, x̂_t⟩),
    where ⟨c*, x_t⟩ is the optimal objective value and ⟨c*, x̂_t⟩ is the objective value achieved by following the learner's prediction ĉ_t.
    • R_T^{c*} measures the quality of the actions suggested by the predictions.
    • R_T^{c*} is non-negative; R_T^{c*} = 0 if c* = ĉ_t for all t; smaller is better.
  14. Convenient Upper Bound on Regret (Bärmann et al. 2017)

    For ĉ_t, x̂_t ∈ arg max { ⟨ĉ_t, x⟩ : x ∈ X_t }, c*, and x_t ∈ arg max { ⟨c*, x⟩ : x ∈ X_t }, define
    R̄_T^{c*} ≔ Σ_{t=1}^T ⟨ĉ_t − c*, x̂_t − x_t⟩.
  15. Convenient Upper Bound on Regret (Bärmann et al. 2017)

    For ĉ_t, x̂_t ∈ arg max { ⟨ĉ_t, x⟩ : x ∈ X_t }, c*, and x_t ∈ arg max { ⟨c*, x⟩ : x ∈ X_t }, define
    R̄_T^{c*} ≔ Σ_{t=1}^T ⟨ĉ_t − c*, x̂_t − x_t⟩.
    Then
    R̄_T^{c*} = Σ_{t=1}^T ⟨ĉ_t, x̂_t − x_t⟩ + Σ_{t=1}^T ⟨c*, x_t − x̂_t⟩ = Σ_{t=1}^T ⟨ĉ_t, x̂_t − x_t⟩ + R_T^{c*},
    where the first sum is ≥ 0 since x̂_t is optimal for ĉ_t, and the second sum is the regret R_T^{c*}; hence R̄_T^{c*} ≥ R_T^{c*}.
  16. Convenient Upper Bound on Regret (Bärmann et al. 2017)

    For ĉ_t, x̂_t ∈ arg max { ⟨ĉ_t, x⟩ : x ∈ X_t }, c*, and x_t ∈ arg max { ⟨c*, x⟩ : x ∈ X_t }, define
    R̄_T^{c*} ≔ Σ_{t=1}^T ⟨ĉ_t − c*, x̂_t − x_t⟩.
    Then
    R̄_T^{c*} = Σ_{t=1}^T ⟨ĉ_t, x̂_t − x_t⟩ + Σ_{t=1}^T ⟨c*, x_t − x̂_t⟩ = Σ_{t=1}^T ⟨ĉ_t, x̂_t − x_t⟩ + R_T^{c*},
    where the first sum is ≥ 0 since x̂_t is optimal for ĉ_t, and the second sum is the regret R_T^{c*}; hence R̄_T^{c*} ≥ R_T^{c*}.
    What is the additional term?
    ⟨ĉ_t, x̂_t − x_t⟩ = max { ⟨ĉ_t, x⟩ : x ∈ X_t } − ⟨ĉ_t, x_t⟩.
  17. Convenient Upper Bound on Regret (Bärmann et al. 2017)

    For ĉ_t, x̂_t ∈ arg max { ⟨ĉ_t, x⟩ : x ∈ X_t }, c*, and x_t ∈ arg max { ⟨c*, x⟩ : x ∈ X_t }, define
    R̄_T^{c*} ≔ Σ_{t=1}^T ⟨ĉ_t − c*, x̂_t − x_t⟩.
    Then
    R̄_T^{c*} = Σ_{t=1}^T ⟨ĉ_t, x̂_t − x_t⟩ + Σ_{t=1}^T ⟨c*, x_t − x̂_t⟩ = Σ_{t=1}^T ⟨ĉ_t, x̂_t − x_t⟩ + R_T^{c*},
    where the first sum is ≥ 0 since x̂_t is optimal for ĉ_t, and the second sum is the regret R_T^{c*}; hence R̄_T^{c*} ≥ R_T^{c*}.
    What is the additional term?
    ⟨ĉ_t, x̂_t − x_t⟩ = max { ⟨ĉ_t, x⟩ : x ∈ X_t } − ⟨ĉ_t, x_t⟩,
    the optimal value for ĉ_t minus the objective value achieved by x_t for ĉ_t.
    • It is zero if c* = ĉ_t; it quantifies how well ĉ_t explains the agent's choice x_t.
    • Called the suboptimality loss in inverse optimization (Mohajerin Esfahani et al. 2018).
    • On its own it is sometimes meaningless: ĉ_t = 0 trivially attains zero suboptimality loss.
  18. Related Work: Online Learning Approach (Bärmann et al. 2017)

    Consider making the upper bound R̄_T^{c*} = Σ_{t=1}^T ⟨ĉ_t − c*, x̂_t − x_t⟩ as small as possible.
  19. Related Work: Online Learning Approach (Bärmann et al. 2017)

    Consider making the upper bound R̄_T^{c*} = Σ_{t=1}^T ⟨ĉ_t − c*, x̂_t − x_t⟩ as small as possible.
    Regarding f_t: Θ ∋ c ↦ ⟨c, x̂_t − x_t⟩ as a linear cost function,
    R̄_T^{c*} = Σ_{t=1}^T ⟨ĉ_t, x̂_t − x_t⟩ − Σ_{t=1}^T ⟨c*, x̂_t − x_t⟩ = Σ_{t=1}^T f_t(ĉ_t) − Σ_{t=1}^T f_t(c*):
    the standard regret in online learning.
  20. Related Work: Online Learning Approach (Bärmann et al. 2017)

    Consider making the upper bound R̄_T^{c*} = Σ_{t=1}^T ⟨ĉ_t − c*, x̂_t − x_t⟩ as small as possible.
    Regarding f_t: Θ ∋ c ↦ ⟨c, x̂_t − x_t⟩ as a linear cost function,
    R̄_T^{c*} = Σ_{t=1}^T ⟨ĉ_t, x̂_t − x_t⟩ − Σ_{t=1}^T ⟨c*, x̂_t − x_t⟩ = Σ_{t=1}^T f_t(ĉ_t) − Σ_{t=1}^T f_t(c*):
    the standard regret in online learning.
    By using online linear optimization methods (e.g., OGD) to compute ĉ_t, we obtain
    R_T^{c*} ≤ R̄_T^{c*} = Σ_{t=1}^T ⟨ĉ_t, x̂_t − x_t⟩ + Σ_{t=1}^T ⟨c*, x_t − x̂_t⟩ = O(√T),
    achieving vanishing regret (and cumulative suboptimality loss) on average as T → ∞.
  21. Related Work: Online Learning Approach (Bärmann et al. 2017)

    Consider making the upper bound R̄_T^{c*} = Σ_{t=1}^T ⟨ĉ_t − c*, x̂_t − x_t⟩ as small as possible.
    Regarding f_t: Θ ∋ c ↦ ⟨c, x̂_t − x_t⟩ as a linear cost function,
    R̄_T^{c*} = Σ_{t=1}^T ⟨ĉ_t, x̂_t − x_t⟩ − Σ_{t=1}^T ⟨c*, x̂_t − x_t⟩ = Σ_{t=1}^T f_t(ĉ_t) − Σ_{t=1}^T f_t(c*):
    the standard regret in online learning.
    By using online linear optimization methods (e.g., OGD) to compute ĉ_t, we obtain
    R_T^{c*} ≤ R̄_T^{c*} = Σ_{t=1}^T ⟨ĉ_t, x̂_t − x_t⟩ + Σ_{t=1}^T ⟨c*, x_t − x̂_t⟩ = O(√T),
    achieving vanishing regret (and cumulative suboptimality loss) on average as T → ∞.
    The √T rate is optimal in general OLO. Is it also optimal in online inverse linear optimization?
  22. Related Work: Ellipsoid-Based Method (Besbes et al. 2023)

    There is a method achieving R_T^{c*} = O(n⁴ log T), going beyond the limit of OLO!
  23. Related Work: Ellipsoid-Based Method (Besbes et al. 2023)

    There is a method achieving R_T^{c*} = O(n⁴ log T), going beyond the limit of OLO!
    High-level idea: maintain a cone 𝒞_t representing where c* may lie.
    After observing (X_t, x_t), 𝒞_t can be narrowed down:
    𝒞_{t+1} ← 𝒞_t ∩ { c ∈ Θ : ⟨c, x_t⟩ ≥ ⟨c, x⟩ for all x ∈ X_t },
    the intersected set being the normal cone of X_t at x_t: the cone of vectors that make x_t optimal over X_t.
  24. Related Work: Ellipsoid-Based Method (Besbes et al. 2023)

    There is a method achieving R_T^{c*} = O(n⁴ log T), going beyond the limit of OLO!
    High-level idea: maintain a cone 𝒞_t representing where c* may lie.
    After observing (X_t, x_t), 𝒞_t can be narrowed down:
    𝒞_{t+1} ← 𝒞_t ∩ { c ∈ Θ : ⟨c, x_t⟩ ≥ ⟨c, x⟩ for all x ∈ X_t },
    the intersected set being the normal cone of X_t at x_t: the cone of vectors that make x_t optimal over X_t.
    Slightly inflate 𝒞_t, making it an ellipsoidal cone. Based on the volume argument of the ellipsoid method for LPs, we can strike a good balance between exploration and exploitation.
    (Figure 4 in Besbes, Fonseca, Lobel. Contextual Inverse Optimization: Offline and Online Learning. Oper. Res. 2023.)
  25. Related Work: Ellipsoid-Based Method (Besbes et al. 2023)

    There is a method achieving R_T^{c*} = O(n⁴ log T), going beyond the limit of OLO!
    High-level idea: maintain a cone 𝒞_t representing where c* may lie.
    After observing (X_t, x_t), 𝒞_t can be narrowed down:
    𝒞_{t+1} ← 𝒞_t ∩ { c ∈ Θ : ⟨c, x_t⟩ ≥ ⟨c, x⟩ for all x ∈ X_t },
    the intersected set being the normal cone of X_t at x_t: the cone of vectors that make x_t optimal over X_t.
    Slightly inflate 𝒞_t, making it an ellipsoidal cone. Based on the volume argument of the ellipsoid method for LPs, we can strike a good balance between exploration and exploitation.
    But there are downsides:
    • The n⁴ factor is prohibitive for large n.
    • Not very efficient, albeit polynomial in n and T.
  26. Our Results

    Theorem. There is a method achieving R_T^{c*} ≤ R̄_T^{c*} = O(n log T).
    • Improves the R_T^{c*} = O(n⁴ log T) bound of Besbes et al. (2023) by a factor of n³.
    • Applies to the upper bound R̄_T^{c*} ≥ R_T^{c*}.
    • More efficient: based on the online Newton step (ONS) rather than the ellipsoid method.
  27. Our Results

    Theorem. There is a method achieving R_T^{c*} ≤ R̄_T^{c*} = O(n log T).
    • Improves the R_T^{c*} = O(n⁴ log T) bound of Besbes et al. (2023) by a factor of n³.
    • Applies to the upper bound R̄_T^{c*} ≥ R_T^{c*}.
    • More efficient: based on the online Newton step (ONS) rather than the ellipsoid method.
    And more:
    • Dealing with suboptimal feedback x_t via MetaGrad (ONS with multiple learning rates).
    • A lower bound of R_T^{c*} = Ω(n), implying tightness with respect to n.
    • R_T^{c*} = O(1) for n = 2, based on the method of Besbes et al. (2023).
  28. Online Convex Optimization

    For t = 1, …, T: 🤖 Learner vs. 🌍 Environment.
  29. Online Convex Optimization

    For t = 1, …, T:
    🤖 The learner plays ĉ_t.
  30. Online Convex Optimization

    For t = 1, …, T:
    🤖 The learner plays ĉ_t.
    🌍 The environment reveals f_t (f_t may be reactive to ĉ_t).
  31. Online Convex Optimization

    For t = 1, …, T:
    🤖 The learner plays ĉ_t.
    🌍 The environment reveals f_t (f_t may be reactive to ĉ_t).
    🤖 The learner incurs f_t(ĉ_t) and computes ĉ_{t+1}.
  32. Online Convex Optimization

    For t = 1, …, T:
    🤖 The learner plays ĉ_t.
    🌍 The environment reveals f_t (f_t may be reactive to ĉ_t).
    🤖 The learner incurs f_t(ĉ_t) and computes ĉ_{t+1}.
    • The learner's domain Θ is convex, with diam(Θ) = 1.
    • The learner can use information up to the end of round t when computing ĉ_{t+1}.
    • Each loss function f_t: Θ → ℝ is convex.
  33. Online Convex Optimization

    For t = 1, …, T:
    🤖 The learner plays ĉ_t.
    🌍 The environment reveals f_t (f_t may be reactive to ĉ_t).
    🤖 The learner incurs f_t(ĉ_t) and computes ĉ_{t+1}.
    • The learner's domain Θ is convex, with diam(Θ) = 1.
    • The learner can use information up to the end of round t when computing ĉ_{t+1}.
    • Each loss function f_t: Θ → ℝ is convex.
    For any comparator c* ∈ Θ, the learner aims to make the regret Σ_{t=1}^T (f_t(ĉ_t) − f_t(c*)) as small as possible.
  34. Exp-Concave Loss

    A function f: Θ → ℝ is α-exp-concave for some α > 0 if g: Θ → ℝ, g: c ↦ e^{−αf(c)}, is concave.
  35. Exp-Concave Loss

    A function f: Θ → ℝ is α-exp-concave for some α > 0 if g: Θ → ℝ, g: c ↦ e^{−αf(c)}, is concave.
    If f is twice differentiable, α-exp-concavity is equivalent to
    ∇²f(c) ≽ α ∇f(c) ∇f(c)ᵀ  for all c ∈ Θ.
    Cf. α-strong convexity requires ∇²f(c) ≽ αI.
  36. Exp-Concave Loss

    A function f: Θ → ℝ is α-exp-concave for some α > 0 if g: Θ → ℝ, g: c ↦ e^{−αf(c)}, is concave.
    If f is twice differentiable, α-exp-concavity is equivalent to
    ∇²f(c) ≽ α ∇f(c) ∇f(c)ᵀ  for all c ∈ Θ.
    Cf. α-strong convexity requires ∇²f(c) ≽ αI.
    Examples:
    • f(c) = −log(rᵀc) (appears in portfolio theory) satisfies ∇²f(c) = rrᵀ/(rᵀc)² = ∇f(c)∇f(c)ᵀ.
    • f(c) = (rᵀc)² for ‖r‖, ‖c‖ ≤ 1 (used later) satisfies ∇²f(c) = 2rrᵀ = ∇f(c)∇f(c)ᵀ/(2(rᵀc)²) ≽ ½ ∇f(c)∇f(c)ᵀ.
  37. ONS Regret Bound (Hazan et al. 2007)

    ONS: Set ĉ_1 ∈ Θ. Fix γ, ε > 0 and let A_0 = εI. For t = 1, …, T:
      Output ĉ_t and observe f_t.
      A_t ← A_{t−1} + ∇f_t(ĉ_t) ∇f_t(ĉ_t)ᵀ.
      ĉ_{t+1} ← arg min { ‖ĉ_t − γ⁻¹ A_t⁻¹ ∇f_t(ĉ_t) − c‖_{A_t} : c ∈ Θ }.
    (For PSD A, ‖x‖_A² ≔ xᵀAx.)
    Assume that f_1, …, f_T are twice differentiable and α-exp-concave, and that ‖∇f_t(c)‖ ≤ G for all t and c ∈ Θ.
    Let γ = ½ min{1/G, α} and ε = 1/γ². Then ĉ_1, …, ĉ_T computed by ONS satisfy
    Σ_{t=1}^T (f_t(ĉ_t) − f_t(c*)) = O(n (1/α + G) log T).
    A well-known result; we'll get back to this later.
  38. Our ONS-Based Method

    Set ĉ_1 ∈ Θ. For t = 1, …, T:
      Output ĉ_t and observe (X_t, x_t).
      Compute x̂_t ∈ arg max { ⟨ĉ_t, x⟩ : x ∈ X_t }.
      Get ĉ_{t+1} via ONS applied to f_t^η.
    For η ∈ (0, 1) (specified later), define f_t^η: Θ → ℝ by
    f_t^η(c) ≔ −η ⟨ĉ_t − c, x̂_t − x_t⟩ + η² ⟨ĉ_t − c, x̂_t − x_t⟩².
    Theorem. For ĉ_1, …, ĉ_T computed by the above method, R_T^{c*} ≤ R̄_T^{c*} = O(n log T).
    (Recall: regret R_T^{c*} = Σ_{t=1}^T ⟨c*, x_t − x̂_t⟩; upper bound R̄_T^{c*} = Σ_{t=1}^T ⟨ĉ_t − c*, x̂_t − x_t⟩.)
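The method above can be sketched in a few lines. In this sketch Θ is taken as the unit ball, γ and the random instances are illustrative, and the exact A_t-norm projection is replaced by plain Euclidean rescaling for simplicity:

```python
import numpy as np

# ONS on the surrogate f_t^eta(c) = -eta*<c_hat_t - c, d_t> + eta^2*<c_hat_t - c, d_t>^2,
# d_t = x_hat_t - x_t; its gradient at c = c_hat_t is eta*d_t (learner-computable).
rng = np.random.default_rng(0)
n, T, eta, gamma = 4, 500, 0.5, 0.25
c_star = rng.random(n)
c_star /= np.linalg.norm(c_star)          # hidden objective (illustrative)

def project_ball(c):
    """Simplified projection onto Theta (unit ball), ignoring the A_t-norm."""
    nrm = np.linalg.norm(c)
    return c / nrm if nrm > 1 else c

A = (1 / gamma**2) * np.eye(n)            # A_0 = eps*I with eps = 1/gamma^2
c_hat = np.zeros(n)
R_bar = 0.0                               # the upper bound \bar R_T^{c*} >= R_T^{c*}
for t in range(T):
    X_t = [rng.standard_normal(n) for _ in range(3)]
    x_t = max(X_t, key=lambda x: float(c_star @ x))   # agent's action
    x_hat = max(X_t, key=lambda x: float(c_hat @ x))  # learner's suggestion
    d = x_hat - x_t
    R_bar += float((c_hat - c_star) @ d)
    grad = eta * d                        # grad f_t^eta(c_hat_t)
    A += np.outer(grad, grad)             # A_t <- A_{t-1} + grad grad^T
    c_hat = project_ball(c_hat - np.linalg.solve(A, grad) / gamma)
print(R_bar / T)   # average of the regret upper bound; shrinks as T grows
```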
  39. Regret Analysis

    f_t^η(c) ≔ −η ⟨ĉ_t − c, x̂_t − x_t⟩ + η² ⟨ĉ_t − c, x̂_t − x_t⟩² with constant η ∈ (0, 1)
    enjoys Ω(1)-exp-concavity and ‖∇f_t^η(ĉ_t)‖ = O(1) (by elementary calculation).
    ONS regret bound: Σ_{t=1}^T (f_t^η(ĉ_t) − f_t^η(c*)) = O(n log T).
  40. Regret Analysis

    f_t^η(c) ≔ −η ⟨ĉ_t − c, x̂_t − x_t⟩ + η² ⟨ĉ_t − c, x̂_t − x_t⟩² with constant η ∈ (0, 1)
    enjoys Ω(1)-exp-concavity and ‖∇f_t^η(ĉ_t)‖ = O(1) (by elementary calculation).
    ONS regret bound: Σ_{t=1}^T (f_t^η(ĉ_t) − f_t^η(c*)) = O(n log T).
    Proof. Define V_T^{c*} ≔ Σ_{t=1}^T ⟨ĉ_t − c*, x̂_t − x_t⟩², which satisfies
    V_T^{c*} ≤ Σ_{t=1}^T ‖ĉ_t − c*‖ ‖x̂_t − x_t‖ ⟨ĉ_t − c*, x̂_t − x_t⟩ ≤ R̄_T^{c*},
    by ⟨ĉ_t − c*, x̂_t − x_t⟩ ≥ 0, the Cauchy–Schwarz inequality, ‖ĉ_t − c*‖ ≤ 1, and ‖x̂_t − x_t‖ ≤ 1.
  41. Regret Analysis

    f_t^η(c) ≔ −η ⟨ĉ_t − c, x̂_t − x_t⟩ + η² ⟨ĉ_t − c, x̂_t − x_t⟩² with constant η ∈ (0, 1)
    enjoys Ω(1)-exp-concavity and ‖∇f_t^η(ĉ_t)‖ = O(1) (by elementary calculation).
    ONS regret bound: Σ_{t=1}^T (f_t^η(ĉ_t) − f_t^η(c*)) = O(n log T).
    Proof. Define V_T^{c*} ≔ Σ_{t=1}^T ⟨ĉ_t − c*, x̂_t − x_t⟩², which satisfies
    V_T^{c*} ≤ Σ_{t=1}^T ‖ĉ_t − c*‖ ‖x̂_t − x_t‖ ⟨ĉ_t − c*, x̂_t − x_t⟩ ≤ R̄_T^{c*},
    by ⟨ĉ_t − c*, x̂_t − x_t⟩ ≥ 0, the Cauchy–Schwarz inequality, ‖ĉ_t − c*‖ ≤ 1, and ‖x̂_t − x_t‖ ≤ 1.
    Since f_t^η(ĉ_t) = 0,
    R̄_T^{c*} = Σ_{t=1}^T ⟨ĉ_t − c*, x̂_t − x_t⟩ = −(1/η) Σ_{t=1}^T f_t^η(c*) + η V_T^{c*}
              ≤ (1/η) Σ_{t=1}^T (f_t^η(ĉ_t) − f_t^η(c*)) + η R̄_T^{c*},
    where the first term is O(n log T) by the ONS regret bound. Hence
    (1 − η) R̄_T^{c*} ≤ (1/η) Σ_{t=1}^T (f_t^η(ĉ_t) − f_t^η(c*)) = O(n log T),
    and setting η = ½ completes the proof.
  42. Warm-up: Online Gradient Descent

    Set ĉ_1 ∈ Θ. Fix η > 0. For t = 1, …, T:
      Output ĉ_t and observe f_t.
      ĉ_{t+1} ← arg min { ‖ĉ_t − η ∇f_t(ĉ_t) − c‖ : c ∈ Θ }.
  43. Warm-up: Online Gradient Descent

    Set ĉ_1 ∈ Θ. Fix η > 0. For t = 1, …, T:
      Output ĉ_t and observe f_t.
      ĉ_{t+1} ← arg min { ‖ĉ_t − η ∇f_t(ĉ_t) − c‖ : c ∈ Θ }.
    From the update rule and the Pythagorean theorem,
    ‖ĉ_{t+1} − c*‖² ≤ ‖ĉ_t − η ∇f_t(ĉ_t) − c*‖² = ‖ĉ_t − c*‖² + η² ‖∇f_t(ĉ_t)‖² − 2η ⟨∇f_t(ĉ_t), ĉ_t − c*⟩.
  44. Warm-up: Online Gradient Descent

    Set ĉ_1 ∈ Θ. Fix η > 0. For t = 1, …, T:
      Output ĉ_t and observe f_t.
      ĉ_{t+1} ← arg min { ‖ĉ_t − η ∇f_t(ĉ_t) − c‖ : c ∈ Θ }.
    From the update rule and the Pythagorean theorem,
    ‖ĉ_{t+1} − c*‖² ≤ ‖ĉ_t − η ∇f_t(ĉ_t) − c*‖² = ‖ĉ_t − c*‖² + η² ‖∇f_t(ĉ_t)‖² − 2η ⟨∇f_t(ĉ_t), ĉ_t − c*⟩.
    Summing over t and ignoring −‖ĉ_{T+1} − c*‖² ≤ 0,
    Σ_{t=1}^T ⟨∇f_t(ĉ_t), ĉ_t − c*⟩ ≤ Σ_{t=1}^T (‖ĉ_t − c*‖² − ‖ĉ_{t+1} − c*‖²)/(2η) + (η/2) Σ_{t=1}^T ‖∇f_t(ĉ_t)‖²
                                    = ‖ĉ_1 − c*‖²/(2η) + (η/2) Σ_{t=1}^T ‖∇f_t(ĉ_t)‖².
  45. Warm-up: Online Gradient Descent

    Set ĉ_1 ∈ Θ. Fix η > 0. For t = 1, …, T:
      Output ĉ_t and observe f_t.
      ĉ_{t+1} ← arg min { ‖ĉ_t − η ∇f_t(ĉ_t) − c‖ : c ∈ Θ }.
    From the update rule and the Pythagorean theorem,
    ‖ĉ_{t+1} − c*‖² ≤ ‖ĉ_t − η ∇f_t(ĉ_t) − c*‖² = ‖ĉ_t − c*‖² + η² ‖∇f_t(ĉ_t)‖² − 2η ⟨∇f_t(ĉ_t), ĉ_t − c*⟩.
    Summing over t and ignoring −‖ĉ_{T+1} − c*‖² ≤ 0,
    Σ_{t=1}^T ⟨∇f_t(ĉ_t), ĉ_t − c*⟩ ≤ Σ_{t=1}^T (‖ĉ_t − c*‖² − ‖ĉ_{t+1} − c*‖²)/(2η) + (η/2) Σ_{t=1}^T ‖∇f_t(ĉ_t)‖²
                                    = ‖ĉ_1 − c*‖²/(2η) + (η/2) Σ_{t=1}^T ‖∇f_t(ĉ_t)‖².
    Since diam(Θ) = 1 and ‖∇f_t(ĉ_t)‖² ≤ G, with η = 1/√(GT) this is ≤ ½ (1/η + ηGT) = √(GT).
  46. Warm-up: Online Gradient Descent

    Set ĉ_1 ∈ Θ. Fix η > 0. For t = 1, …, T:
      Output ĉ_t and observe f_t.
      ĉ_{t+1} ← arg min { ‖ĉ_t − η ∇f_t(ĉ_t) − c‖ : c ∈ Θ }.
    From the update rule and the Pythagorean theorem,
    ‖ĉ_{t+1} − c*‖² ≤ ‖ĉ_t − η ∇f_t(ĉ_t) − c*‖² = ‖ĉ_t − c*‖² + η² ‖∇f_t(ĉ_t)‖² − 2η ⟨∇f_t(ĉ_t), ĉ_t − c*⟩.
    Summing over t and ignoring −‖ĉ_{T+1} − c*‖² ≤ 0,
    Σ_{t=1}^T ⟨∇f_t(ĉ_t), ĉ_t − c*⟩ ≤ Σ_{t=1}^T (‖ĉ_t − c*‖² − ‖ĉ_{t+1} − c*‖²)/(2η) + (η/2) Σ_{t=1}^T ‖∇f_t(ĉ_t)‖²
                                    = ‖ĉ_1 − c*‖²/(2η) + (η/2) Σ_{t=1}^T ‖∇f_t(ĉ_t)‖².
    Since diam(Θ) = 1 and ‖∇f_t(ĉ_t)‖² ≤ G, with η = 1/√(GT) this is ≤ ½ (1/η + ηGT) = √(GT).
    By the convexity of f_t,
    Σ_{t=1}^T (f_t(ĉ_t) − f_t(c*)) ≤ Σ_{t=1}^T ⟨∇f_t(ĉ_t), ĉ_t − c*⟩ ≤ √(GT).
  47. Closer Look at the √T Rate

    Let ∇_t = ∇f_t(ĉ_t) for brevity. In sum, the O(√T) regret follows from
    Σ_{t=1}^T ⟨∇_t, ĉ_t − c*⟩ ≤ (1/2η) [ ‖ĉ_1 − c*‖² − ‖ĉ_2 − c*‖² + ‖ĉ_2 − c*‖² − ⋯ − ‖ĉ_t − c*‖² + ‖ĉ_t − c*‖² − ⋯ − ‖ĉ_{T+1} − c*‖² ] + (η/2) Σ_{t=1}^T ‖∇_t‖²
    ≤ ½ (1/η + ηGT) = √(GT) with η = 1/√(GT).
  48. Closer Look at the √T Rate

    Let ∇_t = ∇f_t(ĉ_t) for brevity. In sum, the O(√T) regret follows from
    Σ_{t=1}^T ⟨∇_t, ĉ_t − c*⟩ ≤ (1/2η) [ ‖ĉ_1 − c*‖² − ‖ĉ_2 − c*‖² + ‖ĉ_2 − c*‖² − ⋯ − ‖ĉ_t − c*‖² + ‖ĉ_t − c*‖² − ⋯ − ‖ĉ_{T+1} − c*‖² ] + (η/2) Σ_{t=1}^T ‖∇_t‖²
    ≤ ½ (1/η + ηGT) = √(GT) with η = 1/√(GT).
    The telescoping sum is the penalty (= 0 here after the first term; the last term is ignored), and the gradient sum is the stability.
    If the penalty and stability sum to O(1) and O(T), respectively, the regret scales with √T.
    Consider achieving better stability via the elliptical potential lemma.
  49. Elliptical Potential Lemma

    Recall ‖x‖_A² ≔ xᵀAx for a PSD matrix A. Let A_0 = εI and A_t = A_{t−1} + ∇_t ∇_tᵀ for t = 1, …, T, with ‖∇_t‖ ≤ G. Then
    Σ_{t=1}^T ‖∇_t‖²_{A_t⁻¹} ≤ log (det A_T / det A_0) ≤ n log (TG²/ε + 1).
  50. Elliptical Potential Lemma

    Recall ‖x‖_A² ≔ xᵀAx for a PSD matrix A. Let A_0 = εI and A_t = A_{t−1} + ∇_t ∇_tᵀ for t = 1, …, T, with ‖∇_t‖ ≤ G. Then
    Σ_{t=1}^T ‖∇_t‖²_{A_t⁻¹} ≤ log (det A_T / det A_0) ≤ n log (TG²/ε + 1).
    Proof. ‖∇_t‖²_{A_t⁻¹} = ∇_tᵀ A_t⁻¹ ∇_t = A_t⁻¹ • ∇_t∇_tᵀ = A_t⁻¹ • (A_t − A_{t−1}) ≤ log (det A_t / det A_{t−1}).
    (For a, b > 0, a⁻¹(a − b) ≤ log(a/b); carefully apply this to eigenvalues. A.k.a. the concavity of log-det for PSD matrices: ∇log det(A) • (A − B) = A⁻¹ • (A − B) ≤ log (det A / det B) = log det A − log det B.)
    Hence Σ_{t=1}^T ‖∇_t‖²_{A_t⁻¹} ≤ log (det A_T / det A_0). Using det A_0 = εⁿ and det A_T ≤ (ε + TG²)ⁿ yields the bound.
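The lemma is easy to verify numerically on random gradients (a sanity check under the stated recursion; the dimensions and counts are illustrative):

```python
import numpy as np

# Check: sum_t ||grad_t||^2_{A_t^{-1}} <= log(det A_T / det A_0)
# with A_0 = eps*I and A_t = A_{t-1} + grad_t grad_t^T.
rng = np.random.default_rng(2)
n, T, eps = 3, 200, 1.0
A = eps * np.eye(n)
sign0, logdet0 = np.linalg.slogdet(A)
lhs = 0.0
for _ in range(T):
    g = rng.standard_normal(n)
    A = A + np.outer(g, g)
    lhs += float(g @ np.linalg.solve(A, g))   # ||g||^2 in the A_t^{-1} norm
signT, logdetT = np.linalg.slogdet(A)
print(lhs <= logdetT - logdet0)               # True
```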
  51. Online Newton Step

    Set ĉ_1 ∈ Θ. Fix γ, ε > 0 and let A_0 = εI. For t = 1, …, T:
      Output ĉ_t and observe f_t.
      A_t ← A_{t−1} + ∇_t ∇_tᵀ.
      ĉ_{t+1} ← arg min { ‖ĉ_t − γ⁻¹ A_t⁻¹ ∇_t − c‖_{A_t} : c ∈ Θ }.
  52. Online Newton Step

    Set ĉ_1 ∈ Θ. Fix γ, ε > 0 and let A_0 = εI. For t = 1, …, T:
      Output ĉ_t and observe f_t.
      A_t ← A_{t−1} + ∇_t ∇_tᵀ.
      ĉ_{t+1} ← arg min { ‖ĉ_t − γ⁻¹ A_t⁻¹ ∇_t − c‖_{A_t} : c ∈ Θ }.
    From the update rule and the Pythagorean theorem,
    ‖ĉ_{t+1} − c*‖²_{A_t} ≤ ‖ĉ_t − γ⁻¹ A_t⁻¹ ∇_t − c*‖²_{A_t} = ‖ĉ_t − c*‖²_{A_t} + γ⁻² ‖∇_t‖²_{A_t⁻¹} − 2γ⁻¹ ⟨∇_t, ĉ_t − c*⟩.
  53. Online Newton Step

    Set ĉ_1 ∈ Θ. Fix γ, ε > 0 and let A_0 = εI. For t = 1, …, T:
      Output ĉ_t and observe f_t.
      A_t ← A_{t−1} + ∇_t ∇_tᵀ.
      ĉ_{t+1} ← arg min { ‖ĉ_t − γ⁻¹ A_t⁻¹ ∇_t − c‖_{A_t} : c ∈ Θ }.
    From the update rule and the Pythagorean theorem,
    ‖ĉ_{t+1} − c*‖²_{A_t} ≤ ‖ĉ_t − γ⁻¹ A_t⁻¹ ∇_t − c*‖²_{A_t} = ‖ĉ_t − c*‖²_{A_t} + γ⁻² ‖∇_t‖²_{A_t⁻¹} − 2γ⁻¹ ⟨∇_t, ĉ_t − c*⟩.
    Summing over t,
    Σ_{t=1}^T ⟨∇_t, ĉ_t − c*⟩ ≤ (γ/2) Σ_{t=1}^T (‖ĉ_t − c*‖²_{A_t} − ‖ĉ_{t+1} − c*‖²_{A_t}) + (1/2γ) Σ_{t=1}^T ‖∇_t‖²_{A_t⁻¹}.
  54. Online Newton Step

    Set ĉ_1 ∈ Θ. Fix γ, ε > 0 and let A_0 = εI. For t = 1, …, T:
      Output ĉ_t and observe f_t.
      A_t ← A_{t−1} + ∇_t ∇_tᵀ.
      ĉ_{t+1} ← arg min { ‖ĉ_t − γ⁻¹ A_t⁻¹ ∇_t − c‖_{A_t} : c ∈ Θ }.
    From the update rule and the Pythagorean theorem,
    ‖ĉ_{t+1} − c*‖²_{A_t} ≤ ‖ĉ_t − γ⁻¹ A_t⁻¹ ∇_t − c*‖²_{A_t} = ‖ĉ_t − c*‖²_{A_t} + γ⁻² ‖∇_t‖²_{A_t⁻¹} − 2γ⁻¹ ⟨∇_t, ĉ_t − c*⟩.
    Summing over t,
    Σ_{t=1}^T ⟨∇_t, ĉ_t − c*⟩ ≤ (γ/2) Σ_{t=1}^T (‖ĉ_t − c*‖²_{A_t} − ‖ĉ_{t+1} − c*‖²_{A_t}) + (1/2γ) Σ_{t=1}^T ‖∇_t‖²_{A_t⁻¹}
                              ≤ (γ/2) Σ_{t=1}^T (‖ĉ_t − c*‖²_{A_t} − ‖ĉ_{t+1} − c*‖²_{A_t}) + (n/2γ) log (TG²/ε + 1),
    by the elliptical potential lemma.
  55. Online Newton Step

    Set ĉ_1 ∈ Θ. Fix γ, ε > 0 and let A_0 = εI. For t = 1, …, T:
      Output ĉ_t and observe f_t.
      A_t ← A_{t−1} + ∇_t ∇_tᵀ.
      ĉ_{t+1} ← arg min { ‖ĉ_t − γ⁻¹ A_t⁻¹ ∇_t − c‖_{A_t} : c ∈ Θ }.
    From the update rule and the Pythagorean theorem,
    ‖ĉ_{t+1} − c*‖²_{A_t} ≤ ‖ĉ_t − γ⁻¹ A_t⁻¹ ∇_t − c*‖²_{A_t} = ‖ĉ_t − c*‖²_{A_t} + γ⁻² ‖∇_t‖²_{A_t⁻¹} − 2γ⁻¹ ⟨∇_t, ĉ_t − c*⟩.
    Summing over t,
    Σ_{t=1}^T ⟨∇_t, ĉ_t − c*⟩ ≤ (γ/2) Σ_{t=1}^T (‖ĉ_t − c*‖²_{A_t} − ‖ĉ_{t+1} − c*‖²_{A_t}) + (1/2γ) Σ_{t=1}^T ‖∇_t‖²_{A_t⁻¹}
                              ≤ (γ/2) Σ_{t=1}^T (‖ĉ_t − c*‖²_{A_t} − ‖ĉ_{t+1} − c*‖²_{A_t}) + (n/2γ) log (TG²/ε + 1),
    by the elliptical potential lemma. Take a closer look at the penalty.
  56. ONS: Penalty

    Σ_{t=1}^T (‖ĉ_t − c*‖²_{A_t} − ‖ĉ_{t+1} − c*‖²_{A_t})
    = ‖ĉ_1 − c*‖²_{A_1} − ‖ĉ_2 − c*‖²_{A_1} + ‖ĉ_2 − c*‖²_{A_2} − ⋯ − ‖ĉ_t − c*‖²_{A_{t−1}} + ‖ĉ_t − c*‖²_{A_t} − ⋯ − ‖ĉ_{T+1} − c*‖²_{A_T}.
  57. ONS: Penalty

    Σ_{t=1}^T (‖ĉ_t − c*‖²_{A_t} − ‖ĉ_{t+1} − c*‖²_{A_t})
    = ‖ĉ_1 − c*‖²_{A_1} − ‖ĉ_2 − c*‖²_{A_1} + ‖ĉ_2 − c*‖²_{A_2} − ⋯ − ‖ĉ_t − c*‖²_{A_{t−1}} + ‖ĉ_t − c*‖²_{A_t} − ⋯ − ‖ĉ_{T+1} − c*‖²_{A_T},
    where, for each t, ‖ĉ_t − c*‖²_{A_t} − ‖ĉ_t − c*‖²_{A_{t−1}} = (ĉ_t − c*)ᵀ (A_t − A_{t−1}) (ĉ_t − c*) = ⟨∇_t, ĉ_t − c*⟩², and the last term −‖ĉ_{T+1} − c*‖²_{A_T} ≤ 0 is ignored.
  58. ONS: Penalty

    Σ_{t=1}^T (‖ĉ_t − c*‖²_{A_t} − ‖ĉ_{t+1} − c*‖²_{A_t})
    = ‖ĉ_1 − c*‖²_{A_1} − ‖ĉ_2 − c*‖²_{A_1} + ‖ĉ_2 − c*‖²_{A_2} − ⋯ − ‖ĉ_t − c*‖²_{A_{t−1}} + ‖ĉ_t − c*‖²_{A_t} − ⋯ − ‖ĉ_{T+1} − c*‖²_{A_T},
    where, for each t, ‖ĉ_t − c*‖²_{A_t} − ‖ĉ_t − c*‖²_{A_{t−1}} = (ĉ_t − c*)ᵀ (A_t − A_{t−1}) (ĉ_t − c*) = ⟨∇_t, ĉ_t − c*⟩², and the last term −‖ĉ_{T+1} − c*‖²_{A_T} ≤ 0 is ignored. Hence
    ≤ ‖ĉ_1 − c*‖²_{A_0} + (‖ĉ_1 − c*‖²_{A_1} − ‖ĉ_1 − c*‖²_{A_0}) + Σ_{t=2}^T ⟨∇_t, ĉ_t − c*⟩²
    ≤ ε + Σ_{t=1}^T ⟨∇_t, ĉ_t − c*⟩².
  59. ONS: Penalty

    Σ_{t=1}^T (‖ĉ_t − c*‖²_{A_t} − ‖ĉ_{t+1} − c*‖²_{A_t})
    = ‖ĉ_1 − c*‖²_{A_1} − ‖ĉ_2 − c*‖²_{A_1} + ‖ĉ_2 − c*‖²_{A_2} − ⋯ − ‖ĉ_t − c*‖²_{A_{t−1}} + ‖ĉ_t − c*‖²_{A_t} − ⋯ − ‖ĉ_{T+1} − c*‖²_{A_T}
    ≤ ‖ĉ_1 − c*‖²_{A_0} + (‖ĉ_1 − c*‖²_{A_1} − ‖ĉ_1 − c*‖²_{A_0}) + Σ_{t=2}^T ⟨∇_t, ĉ_t − c*⟩²
    ≤ ε + Σ_{t=1}^T ⟨∇_t, ĉ_t − c*⟩².
    Therefore, setting ε = 1/γ²,
    Σ_{t=1}^T ⟨∇_t, ĉ_t − c*⟩ ≤ γε/2 + (γ/2) Σ_{t=1}^T ⟨∇_t, ĉ_t − c*⟩² + (n/2γ) log (TG²/ε + 1)
    ⟹ Σ_{t=1}^T (⟨∇_t, ĉ_t − c*⟩ − (γ/2) ⟨∇_t, ĉ_t − c*⟩²) ≤ (n/2γ) log (TG²γ² + 1) + 1/(2γ).
  60. ONS: Using Exp-Concavity

    If f_t is α-exp-concave, then for γ ≤ ½ min{α, 1/G},
    f_t(ĉ_t) − f_t(c*) ≤ ⟨∇_t, ĉ_t − c*⟩ − (γ/2) ⟨∇_t, ĉ_t − c*⟩²
    (cf. if f_t is merely convex, f_t(ĉ_t) − f_t(c*) ≤ ⟨∇_t, ĉ_t − c*⟩).
    Therefore, setting γ = ½ min{α, 1/G}, we obtain the desired regret bound:
    Σ_{t=1}^T (f_t(ĉ_t) − f_t(c*)) ≤ Σ_{t=1}^T (⟨∇_t, ĉ_t − c*⟩ − (γ/2) ⟨∇_t, ĉ_t − c*⟩²)
    ≤ (n/2γ) log (TG²γ² + 1) + 1/(2γ) = O(n (1/α + G) log T),
    using 1/γ = 2 max{1/α, G} ≤ 2 (1/α + G).
  61. MetaGrad

    Tim van Erven, Wouter M. Koolen, and Dirk van der Hoeven (NeurIPS 2016, JMLR 2021)
  62. ONS Requires Prior Knowledge of α

    Set ĉ_1 ∈ Θ. Fix γ = ½ min{α, 1/G}, ε = 1/γ², and let A_0 = εI. For t = 1, …, T:
      Output ĉ_t and observe f_t.
      A_t ← A_{t−1} + ∇_t ∇_tᵀ.
      ĉ_{t+1} ← arg min { ‖ĉ_t − γ⁻¹ A_t⁻¹ ∇_t − c‖_{A_t} : c ∈ Θ }.
    We may not know α in advance. Even worse, we may encounter α = 0 in some round.
    ONS fails in such uncertain situations… Adapt to the uncertainty with multiple ONS instances!
  63. MetaGrad: High-Level Idea

    • Loss functions f_t are α-exp-concave, but α is unknown, possibly α = 0.
    • Keep experts with different learning rates η > 0, called η-experts.
    • Each η-expert runs ONS with its own exp-concave surrogate loss f_t^η.
    • MetaGrad aggregates the experts' outputs to return a single output.
    (Diagram: 🤖 Learner aggregating ⋯ η-experts.)
  64. MetaGrad: High-Level Idea 66

    • Loss functions f_t are α-exp-concave, but α is unknown, possibly α = 0.
    • Keep experts with different learning rates η > 0, called η-experts.
    • Each η-expert runs ONS with its own exp-concave surrogate loss f_t^η.
    • MetaGrad aggregates the experts' outputs to return a single output.
    Theorem (Van Erven et al. 2016, 2021)
    MetaGrad simultaneously enjoys
    • ∑_{t=1}^T (f_t(w_t) − f_t(u)) ≤ O(n (G + 1/α) log T),
    • ∑_{t=1}^T (f_t(w_t) − f_t(u)) ≤ O(G √(T log log T)).
  65. Notation 67

    • For each t: learner's loss f_t, learner's output w_t, and g_t ∈ ∂f_t(w_t).
    • 𝒢 ≔ { η_i = 2⁻ⁱ/(5G) : i = 0, 1, 2, …, ⌈(1/2) log T⌉ } is the set of η values; |𝒢| = Θ(log T).
    • For each t and η: η-expert's loss f_t^η and η-expert's output w_t^η, where
      f_t^η(w) ≔ −η ⟨w_t − w, g_t⟩ + η² ⟨w_t − w, g_t⟩²  for all w ∈ Θ.
  66. Notation 68

    The learner and the η-experts interact as follows. At round t:
    1. Learner plays w_t.
    2. Learner incurs f_t(w_t) and observes g_t ∈ ∂f_t(w_t).
  67. Notation 69

    At round t:
    1. Learner plays w_t.
    2. Learner incurs f_t(w_t) and observes g_t ∈ ∂f_t(w_t).
    3. Learner sends w_t and g_t to the η-experts.
  68. Notation 70

    At round t:
    1. Learner plays w_t.
    2. Learner incurs f_t(w_t) and observes g_t ∈ ∂f_t(w_t).
    3. Learner sends w_t and g_t to the η-experts.
    4. Each η-expert computes w_{t+1}^η via ONS applied to f_t^η.
  69. Notation 71

    At round t:
    1. Learner plays w_t.
    2. Learner incurs f_t(w_t) and observes g_t ∈ ∂f_t(w_t).
    3. Learner sends w_t and g_t to the η-experts.
    4. Each η-expert computes w_{t+1}^η via ONS applied to f_t^η.
    5. Learner aggregates the w_{t+1}^η to compute w_{t+1}.
  70. Regret Decomposition 72

    Since every f_t is convex, for any comparator u ∈ Θ, we have
    ∑_{t=1}^T (f_t(w_t) − f_t(u)) ≤ ∑_{t=1}^T ⟨w_t − u, g_t⟩ =: R̃_T^u.
  71. Regret Decomposition 73

    Recall f_t^η(w) = −η ⟨w_t − w, g_t⟩ + η² ⟨w_t − w, g_t⟩². Decomposition of R̃_T^u:
    R̃_T^u = ∑_{t=1}^T ⟨w_t − u, g_t⟩ = −(1/η) ∑_{t=1}^T f_t^η(u) + η ∑_{t=1}^T ⟨w_t − u, g_t⟩² = −(1/η) ∑_{t=1}^T f_t^η(u) + η V_T^u,
    where V_T^u ≔ ∑_{t=1}^T ⟨w_t − u, g_t⟩².
  72. Regret Decomposition 74

    By using f_t^η(w_t) = 0, for all η ∈ 𝒢 simultaneously,
    R̃_T^u = (1/η) (∑_{t=1}^T (f_t^η(w_t) − f_t^η(w_t^η)) + ∑_{t=1}^T (f_t^η(w_t^η) − f_t^η(u))) + η V_T^u,
    where the first sum is the regret of the learner against w_t^η and the second is the regret of the η-expert against u.
  73. Bounding Each Component 75

    For all η ∈ 𝒢 simultaneously,
    R̃_T^u = (1/η) (∑_{t=1}^T (f_t^η(w_t) − f_t^η(w_t^η)) + ∑_{t=1}^T (f_t^η(w_t^η) − f_t^η(u))) + η V_T^u.
    1. Regret of the learner against w_t^η.
    2. Regret of the η-expert against u.
  74. Bounding Each Component 76

    If the w_t^η are aggregated by exponentially weighted averaging, term 1 (the regret of the learner against w_t^η) is O(log log T).
  75. Bounding Each Component 77

    Since w_t^η is computed by ONS applied to f_t^η, term 2 (the regret of the η-expert against u) is O(n log T).
    (By elementary calculation, f_t^η is Ω(1)-exp-concave and ‖∇f_t^η(w_t^η)‖ = O(1) for every η ∈ 𝒢 ⊆ (0, 1/(5G)].)
  76. Bounding Each Component 78

    Therefore, for all η ∈ 𝒢 simultaneously,
    R̃_T^u = O((n log T)/η + η V_T^u).
  77. Infeasible Ideal Tuning 79

    If V_T^u = ∑_{t=1}^T ⟨w_t − u, g_t⟩² were known a priori, then using only η = η* ≃ √((n log T)/V_T^u) would give
    R̃_T^u = O((n log T)/η + η V_T^u) ≃ O(√(V_T^u · n log T)).
  78. Infeasible Ideal Tuning 80

    If it turns out that all f_t are α-exp-concave, then (informally)
    ∑_{t=1}^T (f_t(w_t) − f_t(u)) ≤ ∑_{t=1}^T ⟨w_t − u, g_t⟩ − (α/2) ∑_{t=1}^T ⟨w_t − u, g_t⟩² = R̃_T^u − (α/2) V_T^u.
  79. Infeasible Ideal Tuning 81

    By self-bounding, regardless of the value of V_T^u,
    R̃_T^u − (α/2) V_T^u ≲ √(V_T^u · n log T) − α V_T^u ≲ (n/α) log T,
    achieving the same bound as ONS without using α.
  80. Infeasible Ideal Tuning 82

    However, V_T^u is unknown… Use the fact that R̃_T^u = O((n log T)/η + η V_T^u) holds for all η ∈ 𝒢!
  81. Exploiting Multiple Learning Rates 83

    Let η* ≃ √((n log T)/V_T^u) ≥ 1/(5G√T) be the unknown best learning rate.
    Recall that 𝒢 = { η_i = 2⁻ⁱ/(5G) : i = 0, 1, 2, …, ⌈(1/2) log T⌉ } ⊆ (0, 1/(5G)]
    and that R̃_T^u ≲ (n log T)/η + η V_T^u holds for all η ∈ 𝒢.
  82. Exploiting Multiple Learning Rates 84

    If η* ∈ [1/(5G√T), 1/(5G)], there exists η ∈ 𝒢 such that η* ∈ [η/2, η], hence
    R̃_T^u ≲ (n log T)/η + η V_T^u ≤ (n log T)/η* + 2 η* V_T^u ≃ √(V_T^u · n log T).
  83. Exploiting Multiple Learning Rates 85

    If η* ≃ √((n log T)/V_T^u) ≥ 1/(5G), we have V_T^u ≲ G² n log T. Thus, for η = η_0 = 1/(5G),
    R̃_T^u ≃ (n log T)/η_0 + η_0 V_T^u ≲ n G log T.
    In any case, R̃_T^u = O(√(V_T^u · n log T) + n G log T), implying
    ∑_{t=1}^T (f_t(w_t) − f_t(u)) ≤ R̃_T^u ≲ n (1/α + G) log T.
  84. Learning with Suboptimal Actions 87

    For t = 1, …, T:
      Learner makes a prediction ĉ_t ∈ Θ of c*.
      Agent faces X_t and takes a possibly suboptimal action x_t ∈ X_t.
      Learner observes (X_t, x_t) and updates ĉ_t to ĉ_{t+1}.
    Define:
    • R̃_T^{c*} ≔ ∑_{t=1}^T ⟨ĉ_t − c*, x̂_t − x_t⟩.
    • V_T^{c*} ≔ ∑_{t=1}^T ⟨ĉ_t − c*, x̂_t − x_t⟩².
    • Δ_T ≔ ∑_{t=1}^T (max{⟨c*, x⟩ : x ∈ X_t} − ⟨c*, x_t⟩)  (cumulative suboptimality).
  85. Learning with Suboptimal Actions 88

    Theorem
    For ĉ_1, …, ĉ_T ∈ Θ computed by MetaGrad with feedback subgradients x̂_t − x_t, it holds that
    R_T^{c*} ≤ R̃_T^{c*} = O(n log T + √(n Δ_T log T)).
    ✓ Sublinear in Δ_T (cf. corruption-robustness in bandits).
    ✓ Recovers R̃_T^{c*} = O(n log T) when Δ_T = 0.
  86. Learning with Suboptimal Actions 89

    Proof sketch
    By the same argument as in the regret analysis of MetaGrad,
    R̃_T^{c*} = ∑_{t=1}^T ⟨ĉ_t − c*, x̂_t − x_t⟩ ≲ √(V_T^{c*} · n log T) + n log T.
    Also, V_T^{c*} = ∑_{t=1}^T ⟨ĉ_t − c*, x̂_t − x_t⟩² ≤ R̃_T^{c*} + 2Δ_T holds
    (cf. V_T^{c*} ≤ R̃_T^{c*} if every x_t is optimal).
    The claim follows from the subadditivity of x ↦ √x and self-bounding.
  87. Ω(n) Lower Bound 91

    Focus on the regret R_T^{c*} = ∑_{t=1}^T ⟨c*, x_t − x̂_t⟩.
    Theorem
    For any, possibly randomized, learner, there is an instance such that R_T^{c*} = Ω(n).
  88. Ω(n) Lower Bound 92

    Intuition
    Since c* ∈ ℝⁿ is unknown, if the elements of c* are drawn at random and X_1, …, X_n are restricted to line segments, any deterministic learner makes Ω(n) mistakes in expectation. By Yao's minimax principle, for any randomized learner there is a worst-case instance on which Ω(n) regret is inevitable.
  89. Ω(n) Lower Bound 93

    Can the log T in the upper bound be removed?
  90. Revisiting Cone-Based Approach 94

    Assume:
    • Θ = { θ ∈ ℝⁿ : ‖θ‖ = 1 } = 𝕊ⁿ⁻¹ and diam X_t ≤ 1.
    • x_t and x̂_t are optimal for c* and ĉ_t, respectively.
  91. Revisiting Cone-Based Approach 95

    Lemma (Besbes et al. 2023)
    Let θ(c*, ĉ_t) be the angle between c*, ĉ_t ∈ 𝕊ⁿ⁻¹. If θ(c*, ĉ_t) ≤ π/2, we have
    ⟨c*, x_t − x̂_t⟩ ≤ cos θ(c*, x_t − x̂_t) ≤ sin θ(c*, ĉ_t).
  92. Revisiting Cone-Based Approach 96

    By the optimality assumptions, ⟨c*, x_t − x̂_t⟩ ≥ 0 and ⟨ĉ_t, x_t − x̂_t⟩ ≤ 0 must hold,
    so x_t − x̂_t lies in the half-space { v : ⟨ĉ_t, v⟩ ≤ 0 } while θ(c*, ĉ_t) ≤ π/2.
    [figure: c*, ĉ_t, and the region where x_t − x̂_t must lie]
    Therefore, cos θ(c*, x_t − x̂_t) ≤ cos(π/2 − θ(c*, ĉ_t)) = sin θ(c*, ĉ_t).
  93. An ๐‘ถ(๐Ÿ)-Regret Algorithm for ๐’ = ๐Ÿ 97 Independent of

    ๐‘‡ (but extending to ๐‘› > 2 seems challenging, as discussed later). Theorem Algorithm 1 achieves ๐”ผ ๐‘…( -โˆ— = 2๐œ‹. ๐’ฉ% โ‰” ๐‘ โˆˆ ๐•Š' ๐‘, ๐‘ฅ% โˆ’ ๐‘ฅ โ‰ฅ 0 โˆ€๐‘ฅ โˆˆ ๐‘‹% is the normal cone of ๐‘‹% at ๐‘ฅ% . ๐’ž% is the region such that ๐‘โˆ— โˆˆ ๐’ž% does not contradict โ€œ๐‘ฅR โˆˆ arg max )โˆˆ+4 ๐‘โˆ—, ๐‘ฅ for ๐‘  = 1, โ€ฆ , ๐‘ก โˆ’ 1.โ€
  94. Proof 98

    Focus on round t. If ĉ_t ∈ int(𝒩_t), then x̂_t = x_t and hence ⟨c*, x_t − x̂_t⟩ = 0.
    Taking the expectation over the draw of ĉ_t ∈ 𝒞_t,
    𝔼[⟨c*, x_t − x̂_t⟩] = Pr[ĉ_t ∈ 𝒞_t ∖ int(𝒩_t)] · 𝔼[⟨c*, x_t − x̂_t⟩ | ĉ_t ∈ 𝒞_t ∖ int(𝒩_t)]
    = (A(𝒞_t ∖ int(𝒩_t)) / A(𝒞_t)) · 𝔼[⟨c*, x_t − x̂_t⟩ | ĉ_t ∈ 𝒞_t ∖ int(𝒩_t)],
    where A(·) denotes the arc length (= central angle).
  95. Proof 99

    If A(𝒞_t) < π/2,
    𝔼[⟨c*, x_t − x̂_t⟩] = (A(𝒞_t ∖ int(𝒩_t)) / A(𝒞_t)) · 𝔼[⟨c*, x_t − x̂_t⟩ | ĉ_t ∈ 𝒞_t ∖ int(𝒩_t)]
    ≤ (A(𝒞_t ∖ int(𝒩_t)) / A(𝒞_t)) · sin θ(c*, ĉ_t) ≤ (A(𝒞_t ∖ int(𝒩_t)) / A(𝒞_t)) · sin A(𝒞_t) ≤ A(𝒞_t ∖ int(𝒩_t)).
    If A(𝒞_t) ≥ π/2,
    𝔼[⟨c*, x_t − x̂_t⟩] ≤ (2/π) · 1 · A(𝒞_t ∖ int(𝒩_t)) ≤ A(𝒞_t ∖ int(𝒩_t)).
    Hence 𝔼[⟨c*, x_t − x̂_t⟩] ≤ A(𝒞_t ∖ int(𝒩_t)) in any case. Since 𝒞_{t+1} ← 𝒞_t ∩ 𝒩_t,
    𝔼[R_T^{c*}] = ∑_{t=1}^T 𝔼[⟨c*, x_t − x̂_t⟩] ≤ ∑_{t=1}^T A(𝒞_t ∖ int(𝒩_t)) ≤ 2π.
  96. Conclusion 100

    • R_T^{c*} = O(n log T) by ONS.
    • R̃_T^{c*} = O(n log T + √(n Δ_T log T)) by MetaGrad for the possibly suboptimal case.
    • R_T^{c*} = Ω(n).
    • R_T^{c*} = O(1) for n = 2.
    Future work
    • Tight analysis for general n.
      – Difficulty: sin θ(c*, ĉ_t) ≤ sin A(𝒞_t) no longer holds.
    • Exploring other online-learning ideas useful for inverse optimization.