
Online Inverse Linear Optimization

Shinsaku Sakaue
March 06, 2025


Mar. 7, 2025, at Kyoto University.
https://www.ml.ist.i.kyoto-u.ac.jp/kiss

Transcript

  1. Online Inverse Linear Optimization. Mar. 7, 2025 @ KyotoU, KISS.

    Shinsaku Sakaue (Univ. Tokyo, RIKEN AIP). Joint work with Taira Tsuchiya (Univ. Tokyo, RIKEN AIP), Han Bao (Kyoto Univ.), and Taihei Oki (Hokkaido Univ.). Preprint: https://arxiv.org/abs/2501.14349
  2. Forward Optimization

    maximize f_θ(x) subject to x ∈ X
    • Decision variable x ∈ ℝ^n.
    • Constraint set X ⊆ ℝ^n (known).
    • Model parameter θ ∈ ℝ^d.
    Forward optimization: given θ, find an optimal solution x.
    Examples:
    • Seismic tomography: θ = geological features of certain zones; seismic-wave trajectories are modeled as shortest-path problems. https://www.geothermalnextgeneration.com/updates/seismic-tomography-a-cat-scan-of-the-earth
    • Customer behavior: θ = the customer's preferences; purchase behavior is modeled as utility maximization. https://en.wikipedia.org/wiki/Consumer_behaviour
  3. Inverse Optimization

    maximize f_θ(x) subject to x ∈ X
    Forward optimization: given θ, find an optimal solution x.
    Inverse optimization: estimate θ from an optimal solution x.
    • Seismic tomography: estimate geological features θ from observed waves x.
    • Customer behavior: estimate the customer's preferences θ from purchase behavior x.
  4. Linear Optimization (forward model in this talk)

    For t = 1, …, T, an agent solves: maximize ⟨c*, x⟩ subject to x ∈ X_t.
    • c* ∈ ℝ^n is the agent's internal objective vector.
    • c* lies in a convex set Θ ⊂ ℝ^n with diam(Θ) = 1, known to the learner.
    • X_t ⊆ ℝ^n is the agent's t-th action set with diam(X_t) = 1; it is not necessarily convex, but we assume an oracle that solves linear optimization on X_t.
    • Let x_t ∈ argmax{⟨c*, x⟩ : x ∈ X_t}.
    Running example: an agent with budget 2 and c* = (0.2, 0.4, 0.1, 0.0, 0.3).
  5. Linear Optimization (forward model in this talk)

    (Build of the previous slide: the action set X_t is shown, now with one item sold out.)
  6. Linear Optimization (forward model in this talk)

    (Build: the agent's optimal choice x_t = (1, 0, 0, 0, 1) is shown.)
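    A minimal sketch of this forward model in Python: the talk only assumes a linear-optimization oracle on X_t, and for a small finite X_t (here, picking at most 2 of 5 items, echoing the slide's budgeted example) enumeration is the simplest such oracle. All names below are illustrative.

```python
import itertools
import numpy as np

def linear_opt_oracle(c, X):
    """Return an x in argmax { <c, x> : x in X } for a finite action set X."""
    X = np.asarray(X, dtype=float)
    return X[np.argmax(X @ c)]

c_star = np.array([0.2, 0.4, 0.1, 0.0, 0.3])   # agent's hidden objective vector
X_t = [x for x in itertools.product([0, 1], repeat=5) if sum(x) <= 2]  # budget = 2
x_t = linear_opt_oracle(c_star, X_t)           # the action the learner observes
```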
  7. Inverse Linear Optimization

    For t = 1, …, T, an agent solves: maximize ⟨c*, x⟩ subject to x ∈ X_t.
    A learner aims to infer c* from the observations (X_t, x_t), t = 1, …, T.
  8. Online Inverse Linear Optimization (Bärmann et al. 2017)

    For t = 1, …, T:
    • The learner makes a prediction ĉ_t ∈ Θ of c*.
    • The agent faces X_t and takes an action x_t ∈ argmax_{x ∈ X_t} ⟨c*, x⟩.
    • The learner observes (X_t, x_t) and updates ĉ_t to ĉ_{t+1}.
  9. Online Inverse Linear Optimization (Bärmann et al. 2017)

    (Progressive build of slide 8: the learner's prediction step.)
  10. Online Inverse Linear Optimization (Bärmann et al. 2017)

    (Progressive build of slide 8: the agent's action step.)
  11. Online Inverse Linear Optimization (Bärmann et al. 2017)

    (Progressive build of slide 8: the observe-and-update step.)
  12. Online Inverse Linear Optimization (Bärmann et al. 2017)

    Let x̂_t ∈ argmax{⟨ĉ_t, x⟩ : x ∈ X_t} and define the regret R_T^{c*} as
    R_T^{c*} := ∑_{t=1}^T ⟨c*, x_t − x̂_t⟩ = ∑_{t=1}^T (⟨c*, x_t⟩ − ⟨c*, x̂_t⟩).
  13. Online Inverse Linear Optimization (Bärmann et al. 2017)

    In the regret, ⟨c*, x_t⟩ is the optimal objective value and ⟨c*, x̂_t⟩ is the objective value achieved by following the learner's prediction ĉ_t.
    • R_T^{c*} measures the quality of the actions suggested by the predictions.
    • R_T^{c*} is non-negative; R_T^{c*} = 0 if ĉ_t = c* for all t; smaller is better.
  14. Convenient Upper Bound on Regret (Bärmann et al. 2017)

    For ĉ_t with x̂_t ∈ argmax{⟨ĉ_t, x⟩ : x ∈ X_t}, and c* with x_t ∈ argmax{⟨c*, x⟩ : x ∈ X_t}, define
    R̄_T^{c*} := ∑_{t=1}^T ⟨ĉ_t − c*, x̂_t − x_t⟩.
  15. Convenient Upper Bound on Regret (Bärmann et al. 2017)

    R̄_T^{c*} = ∑_{t=1}^T ⟨ĉ_t, x̂_t − x_t⟩ + ∑_{t=1}^T ⟨c*, x_t − x̂_t⟩ = ∑_{t=1}^T ⟨ĉ_t, x̂_t − x_t⟩ + R_T^{c*}.
    Each term ⟨ĉ_t, x̂_t − x_t⟩ is non-negative since x̂_t is optimal for ĉ_t; hence R̄_T^{c*} ≥ R_T^{c*}.
  16. Convenient Upper Bound on Regret (Bärmann et al. 2017)

    What is the additional term?
    ⟨ĉ_t, x̂_t − x_t⟩ = max{⟨ĉ_t, x⟩ : x ∈ X_t} − ⟨ĉ_t, x_t⟩,
    i.e., the optimal value for ĉ_t minus the objective value that x_t achieves for ĉ_t.
  17. Convenient Upper Bound on Regret (Bärmann et al. 2017)

    • The term is zero if ĉ_t = c*; it quantifies how well ĉ_t explains the agent's choice x_t.
    • It is called the suboptimality loss in inverse optimization (Mohajerin Esfahani et al. 2018).
    • On its own it can be meaningless: ĉ_t = 0 trivially attains zero suboptimality loss.
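    As a quick illustration, the suboptimality loss is one line of code for a finite X_t, and the degenerate prediction ĉ_t = 0 makes it vanish on any data. A sketch; `suboptimality_loss` is an illustrative name:

```python
import numpy as np

def suboptimality_loss(c_hat, X_t, x_t):
    """max { <c_hat, x> : x in X_t } - <c_hat, x_t>, for a finite X_t."""
    c_hat, X_t, x_t = map(np.asarray, (c_hat, X_t, x_t))
    return float(np.max(X_t @ c_hat) - x_t @ c_hat)

# suboptimality_loss(np.zeros(n), X_t, x_t) == 0.0 for any (X_t, x_t).
```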
  18. Related Work: Online Learning Approach (Bärmann et al. 2017)

    Idea: make the upper bound R̄_T^{c*} = ∑_{t=1}^T ⟨ĉ_t − c*, x̂_t − x_t⟩ as small as possible.
  19. Related Work: Online Learning Approach (Bärmann et al. 2017)

    Regarding f_t: Θ ∋ c ↦ ⟨c, x̂_t − x_t⟩ as a linear cost function,
    R̄_T^{c*} = ∑_{t=1}^T ⟨ĉ_t, x̂_t − x_t⟩ − ∑_{t=1}^T ⟨c*, x̂_t − x_t⟩ = ∑_{t=1}^T f_t(ĉ_t) − ∑_{t=1}^T f_t(c*),
    which is the standard regret in online learning.
  20. Related Work: Online Learning Approach (Bärmann et al. 2017)

    By using online linear optimization (OLO) methods (e.g., OGD) to compute ĉ_t, we obtain
    R_T^{c*} ≤ R̄_T^{c*} = ∑_{t=1}^T f_t(ĉ_t) − ∑_{t=1}^T f_t(c*) = O(√T),
    achieving vanishing average regret (and cumulative suboptimality loss) as T → ∞.
  21. Related Work: Online Learning Approach (Bärmann et al. 2017)

    The √T rate is optimal in general OLO. Is it also optimal in online inverse linear optimization?
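    A sketch of this OGD-based learner in the spirit of Bärmann et al. (2017), assuming for illustration that Θ is the Euclidean unit ball and that `oracle(c, X)` solves the linear optimization over X_t; the analysis above suggests a step size η of order 1/√T:

```python
import numpy as np

def project_unit_ball(c):
    """Euclidean projection onto {c : ||c|| <= 1} (our stand-in for Theta)."""
    nrm = np.linalg.norm(c)
    return c if nrm <= 1.0 else c / nrm

def ogd_inverse_opt(rounds, oracle, n, eta):
    """rounds: iterable of (X_t, x_t); oracle(c, X) solves max_{x in X} <c, x>."""
    c_hat = np.zeros(n)
    for X_t, x_t in rounds:
        x_hat = oracle(c_hat, X_t)          # learner's suggested action
        grad = x_hat - np.asarray(x_t)      # gradient of f_t(c) = <c, x_hat - x_t>
        c_hat = project_unit_ball(c_hat - eta * grad)
        yield c_hat, x_hat
```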
  22. Related Work: Ellipsoid-Based Method (Besbes et al. 2023)

    There is a method achieving R_T^{c*} = O(n^4 log T), going beyond the √T limit of OLO!
  23. Related Work: Ellipsoid-Based Method (Besbes et al. 2023)

    High-level idea: maintain a cone C_t representing where c* may lie. After observing (X_t, x_t), C_t can be narrowed down:
    C_{t+1} ← C_t ∩ {c ∈ Θ : ⟨c, x_t⟩ ≥ ⟨c, x⟩ for all x ∈ X_t},
    where the intersected set is the normal cone of X_t at x_t: the cone of vectors that make x_t optimal over X_t.
  24. Related Work: Ellipsoid-Based Method (Besbes et al. 2023)

    Slightly inflate C_t, making it an ellipsoidal cone. Based on the volume argument of the ellipsoid method for LPs, this strikes a good balance between exploration and exploitation. (See Figure 4 in Besbes, Fonseca, Lobel, "Contextual Inverse Optimization: Offline and Online Learning," Oper. Res. 2023.)
  25. Related Work: Ellipsoid-Based Method (Besbes et al. 2023)

    But there are downsides:
    • The n^4 factor is prohibitive for large n.
    • The method is not very efficient, albeit polynomial in n and T.
  26. Our Results

    Theorem. There is a method achieving R_T^{c*} ≤ R̄_T^{c*} = O(n log T).
    • Improves on R_T^{c*} = O(n^4 log T) of Besbes et al. (2023) by a factor of n^3.
    • Applies to the upper bound R̄_T^{c*} ≥ R_T^{c*}.
    • More efficient: based on the online Newton step (ONS) rather than the ellipsoid method.
  27. Our Results

    And more:
    • Handles suboptimal feedback x_t via MetaGrad (ONS with multiple learning rates).
    • A lower bound of R_T^{c*} = Ω(n), implying tightness with respect to n.
    • R_T^{c*} = O(1) for n = 2, based on the method of Besbes et al. (2023).
  28. Online Convex Optimization

    For t = 1, …, T:
    • The learner plays ĉ_t.
    • The environment reveals f_t (which may be reactive to ĉ_t).
    • The learner incurs f_t(ĉ_t) and computes ĉ_{t+1}.
  29. Online Convex Optimization

    (Progressive build of slide 28.)
  30. Online Convex Optimization

    (Progressive build of slide 28.)
  31. Online Convex Optimization

    (Progressive build of slide 28.)
  32. Online Convex Optimization

    • The learner's domain Θ is convex, with diam(Θ) = 1.
    • The learner can use information up to the end of round t when computing ĉ_{t+1}.
    • Each loss function f_t: Θ → ℝ is convex.
  33. Online Convex Optimization

    For any comparator c* ∈ Θ, the learner aims to make the regret ∑_{t=1}^T (f_t(ĉ_t) − f_t(c*)) as small as possible.
  34. Exp-Concave Loss

    A function f: Θ → ℝ is α-exp-concave for some α > 0 if g: Θ → ℝ defined by g: c ↦ e^{−αf(c)} is concave.
  35. Exp-Concave Loss

    If f: Θ → ℝ is twice differentiable, α-exp-concavity is equivalent to
    ∇²f(c) ≽ α ∇f(c) ∇f(c)^T for all c ∈ Θ.
    Cf. α-strong convexity requires ∇²f(c) ≽ αI.
  36. Exp-Concave Loss

    Examples:
    • f(c) = −log(r^T c) (appears in portfolio theory) satisfies ∇²f(c) = rr^T/(r^T c)² = ∇f(c) ∇f(c)^T.
    • f(c) = (r^T c)² for ‖r‖, ‖c‖ ≤ 1 (used later) satisfies ∇²f(c) = 2rr^T = ∇f(c) ∇f(c)^T / (2(r^T c)²) ≽ (1/2) ∇f(c) ∇f(c)^T.
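    A numeric sanity check of the Hessian criterion on the second example; a sketch verifying that ∇²f(c) − (1/2) ∇f(c) ∇f(c)^T is PSD for f(c) = (r^T c)² on the unit ball:

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    r = rng.normal(size=5); r /= max(1.0, np.linalg.norm(r))   # ||r|| <= 1
    c = rng.normal(size=5); c /= max(1.0, np.linalg.norm(c))   # ||c|| <= 1
    grad = 2.0 * (r @ c) * r                   # gradient of f(c) = (r^T c)^2
    hess = 2.0 * np.outer(r, r)                # Hessian of f
    gap = hess - 0.5 * np.outer(grad, grad)    # = 2(1 - (r^T c)^2) rr^T
    assert np.linalg.eigvalsh(gap).min() >= -1e-9   # PSD: f is 1/2-exp-concave
```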
  37. ONS Regret Bound (Hazan et al. 2007)

    Online Newton Step (ONS): set ĉ_1 ∈ Θ, fix γ, ε > 0, and let A_0 = εI. For t = 1, …, T:
    • Output ĉ_t and observe f_t.
    • A_t ← A_{t−1} + ∇f_t(ĉ_t) ∇f_t(ĉ_t)^T.
    • ĉ_{t+1} ← argmin{‖ĉ_t − γ^{−1} A_t^{−1} ∇f_t(ĉ_t) − c‖_{A_t} : c ∈ Θ}, where ‖x‖_A := √(x^T A x) for PSD A.
    Assume that f_1, …, f_T are twice differentiable and α-exp-concave, and that ‖∇f_t(c)‖ ≤ G for all t and c ∈ Θ. Let γ = (1/2) min{1/G, α} and ε = 1/γ². Then ĉ_1, …, ĉ_T computed by ONS satisfy
    ∑_{t=1}^T (f_t(ĉ_t) − f_t(c*)) = O(n (1/α + G) log T).
    This is a well-known result; we will get back to its proof later.
  38. Our ONS-Based Method

    Set ĉ_1 ∈ Θ. For t = 1, …, T:
    • Output ĉ_t and observe (X_t, x_t).
    • Compute x̂_t ∈ argmax{⟨ĉ_t, x⟩ : x ∈ X_t}.
    • Get ĉ_{t+1} via ONS applied to f_t^η, where, for η ∈ (0, 1) (specified later), f_t^η: Θ → ℝ is defined by
      f_t^η(c) := −η ⟨ĉ_t − c, x̂_t − x_t⟩ + η² ⟨ĉ_t − c, x̂_t − x_t⟩².
    Theorem. For ĉ_1, …, ĉ_T computed by the above method, R_T^{c*} ≤ R̄_T^{c*} = O(n log T).
    (Recall the regret R_T^{c*} = ∑_{t=1}^T ⟨c*, x_t − x̂_t⟩ and the upper bound R̄_T^{c*} = ∑_{t=1}^T ⟨ĉ_t − c*, x̂_t − x_t⟩.)
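    A sketch of the method's main loop, assuming Θ is the Euclidean unit ball (the A_t-norm projection is then solvable by bisection on the Lagrange multiplier); the constants η, γ, ε below are illustrative stand-ins for the values from the analysis:

```python
import numpy as np

def project_ball_Anorm(y, A):
    """argmin {(x - y)^T A (x - y) : ||x|| <= 1}: the minimizer has the form
    x(lam) = (A + lam*I)^{-1} A y, and ||x(lam)|| decreases in lam >= 0."""
    if np.linalg.norm(y) <= 1.0:
        return y
    I = np.eye(len(y))
    lo, hi = 0.0, 1e8          # hi is ample at this sketch's scales
    for _ in range(100):
        lam = 0.5 * (lo + hi)
        x = np.linalg.solve(A + lam * I, A @ y)
        lo, hi = (lam, hi) if np.linalg.norm(x) > 1.0 else (lo, lam)
    return x

def ons_inverse_opt(rounds, oracle, n, eta=0.5, gamma=0.25):
    A = (1.0 / gamma**2) * np.eye(n)    # A_0 = eps*I with eps = 1/gamma^2
    c_hat = np.zeros(n)
    for X_t, x_t in rounds:
        x_hat = oracle(c_hat, X_t)
        d = np.asarray(x_hat) - np.asarray(x_t, dtype=float)
        grad = eta * d                   # gradient of f_t^eta at c = c_hat
        A = A + np.outer(grad, grad)
        y = c_hat - (1.0 / gamma) * np.linalg.solve(A, grad)
        c_hat = project_ball_Anorm(y, A)
        yield c_hat, x_hat
```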
  39. Regret Analysis

    f_t^η(c) := −η ⟨ĉ_t − c, x̂_t − x_t⟩ + η² ⟨ĉ_t − c, x̂_t − x_t⟩², with constant η ∈ (0, 1), enjoys Ω(1)-exp-concavity and ‖∇f_t^η(ĉ_t)‖ = O(1) (by elementary calculation).
    ONS regret bound: ∑_{t=1}^T (f_t^η(ĉ_t) − f_t^η(c*)) = O(n log T).
  40. Regret Analysis

    Proof. Define V_T^{c*} := ∑_{t=1}^T ⟨ĉ_t − c*, x̂_t − x_t⟩², which satisfies
    V_T^{c*} ≤ ∑_{t=1}^T ‖ĉ_t − c*‖ ‖x̂_t − x_t‖ ⟨ĉ_t − c*, x̂_t − x_t⟩ ≤ R̄_T^{c*},
    since ⟨ĉ_t − c*, x̂_t − x_t⟩ ≥ 0 and by Cauchy–Schwarz, with ‖ĉ_t − c*‖ ≤ 1 and ‖x̂_t − x_t‖ ≤ 1.
  41. Regret Analysis

    Using f_t^η(ĉ_t) = 0 and V_T^{c*} ≤ R̄_T^{c*},
    R̄_T^{c*} = ∑_{t=1}^T ⟨ĉ_t − c*, x̂_t − x_t⟩ = −(1/η) ∑_{t=1}^T f_t^η(c*) + η V_T^{c*} ≤ (1/η) ∑_{t=1}^T (f_t^η(ĉ_t) − f_t^η(c*)) + η R̄_T^{c*}.
    Hence (1 − η) R̄_T^{c*} ≤ (1/η) ∑_{t=1}^T (f_t^η(ĉ_t) − f_t^η(c*)) = O(n log T), and setting η = 1/2 completes the proof.
  42. Warm-up: Online Gradient Descent

    Set ĉ_1 ∈ Θ and fix η > 0. For t = 1, …, T:
    • Output ĉ_t and observe f_t.
    • ĉ_{t+1} ← argmin{‖ĉ_t − η ∇f_t(ĉ_t) − c‖ : c ∈ Θ}.
  43. Warm-up: Online Gradient Descent

    From the update rule and the Pythagorean theorem,
    ‖ĉ_{t+1} − c*‖² ≤ ‖ĉ_t − η ∇f_t(ĉ_t) − c*‖² = ‖ĉ_t − c*‖² + η² ‖∇f_t(ĉ_t)‖² − 2η ⟨∇f_t(ĉ_t), ĉ_t − c*⟩.
  44. Warm-up: Online Gradient Descent

    Summing over t and ignoring −‖ĉ_{T+1} − c*‖² ≤ 0,
    ∑_{t=1}^T ⟨∇f_t(ĉ_t), ĉ_t − c*⟩ ≤ ∑_{t=1}^T (‖ĉ_t − c*‖² − ‖ĉ_{t+1} − c*‖²)/(2η) + (η/2) ∑_{t=1}^T ‖∇f_t(ĉ_t)‖² = ‖ĉ_1 − c*‖²/(2η) + (η/2) ∑_{t=1}^T ‖∇f_t(ĉ_t)‖².
  45. Warm-up: Online Gradient Descent

    Since diam(Θ) = 1 and ‖∇f_t(ĉ_t)‖ ≤ G, setting η = 1/(G√T) gives
    ‖ĉ_1 − c*‖²/(2η) + (η/2) ∑_{t=1}^T ‖∇f_t(ĉ_t)‖² ≤ (1/2)(1/η + η G² T) = G√T.
  46. Warm-up: Online Gradient Descent

    By the convexity of f_t,
    ∑_{t=1}^T (f_t(ĉ_t) − f_t(c*)) ≤ ∑_{t=1}^T ⟨∇f_t(ĉ_t), ĉ_t − c*⟩ ≤ G√T.
  47. Closer Look at the √T Rate

    Let ∇_t = ∇f_t(ĉ_t) for brevity. In sum, the O(√T) regret follows from
    ∑_{t=1}^T ⟨∇_t, ĉ_t − c*⟩ ≤ (1/(2η)) [‖ĉ_1 − c*‖² + (−‖ĉ_2 − c*‖² + ‖ĉ_2 − c*‖²) + ⋯ + (−‖ĉ_t − c*‖² + ‖ĉ_t − c*‖²) + ⋯ − ‖ĉ_{T+1} − c*‖²] + (η/2) ∑_{t=1}^T ‖∇_t‖² ≤ (1/2)(1/η + η G² T) = G√T for η = 1/(G√T).
  48. Closer Look at the √T Rate

    The bracketed telescoping sum is the penalty: the middle terms cancel, the last term is ignored, and it collapses to ‖ĉ_1 − c*‖². The sum (η/2) ∑_{t=1}^T ‖∇_t‖² is the stability. If the penalty and stability sum to O(1) and O(T), respectively, the regret scales with √T.
    Idea: achieve better stability via the elliptical potential lemma.
  49. Elliptical Potential Lemma

    Recall ‖x‖_A := √(x^T A x) for a PSD matrix A. Let A_0 = εI and A_t = A_{t−1} + ∇_t ∇_t^T for t = 1, …, T, where ‖∇_t‖ ≤ G. Then,
    ∑_{t=1}^T ‖∇_t‖²_{A_t^{−1}} ≤ log(det A_T / det A_0) ≤ n log(T G²/ε + 1).
  50. Elliptical Potential Lemma

    Proof. ‖∇_t‖²_{A_t^{−1}} = ∇_t^T A_t^{−1} ∇_t = A_t^{−1} • (∇_t ∇_t^T) = A_t^{−1} • (A_t − A_{t−1}) ≤ log(det A_t / det A_{t−1}).
    The last inequality generalizes a^{−1}(a − b) ≤ log(a/b) for a, b > 0, carefully applied to eigenvalues; equivalently, it is the concavity of log det for PSD matrices: A^{−1} • (A − B) = ∇(log det A) • (A − B) ≤ log det A − log det B.
    Summing over t, ∑_{t=1}^T ‖∇_t‖²_{A_t^{−1}} ≤ log(det A_T / det A_0). Using det A_0 = ε^n and det A_T ≤ (ε + T G²)^n yields the bound.
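    A numeric check of the lemma, sketched with random bounded gradients:

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, eps, G = 4, 500, 1.0, 1.0
A = eps * np.eye(n)
lhs = 0.0
for _ in range(T):
    g = rng.normal(size=n)
    g *= min(1.0, G / np.linalg.norm(g))   # enforce ||g|| <= G
    A = A + np.outer(g, g)                 # A_t = A_{t-1} + g_t g_t^T
    lhs += g @ np.linalg.solve(A, g)       # ||g_t||^2 w.r.t. A_t^{-1}
mid = np.linalg.slogdet(A)[1] - n * np.log(eps)   # log(det A_T / det A_0)
rhs = n * np.log(T * G**2 / eps + 1.0)
assert lhs <= mid + 1e-6 and mid <= rhs + 1e-6
```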
  51. Online Newton Step

    Set ĉ_1 ∈ Θ. Fix γ, ε > 0 and let A_0 = εI. For t = 1, …, T:
    • Output ĉ_t and observe f_t.
    • A_t ← A_{t−1} + ∇_t ∇_t^T.
    • ĉ_{t+1} ← argmin{‖ĉ_t − γ^{−1} A_t^{−1} ∇_t − c‖_{A_t} : c ∈ Θ}.
  52. Online Newton Step

    From the update rule and the (generalized) Pythagorean theorem,
    ‖ĉ_{t+1} − c*‖²_{A_t} ≤ ‖ĉ_t − γ^{−1} A_t^{−1} ∇_t − c*‖²_{A_t} = ‖ĉ_t − c*‖²_{A_t} + γ^{−2} ‖∇_t‖²_{A_t^{−1}} − 2γ^{−1} ⟨∇_t, ĉ_t − c*⟩.
  53. Online Newton Step

    Summing over t,
    ∑_{t=1}^T ⟨∇_t, ĉ_t − c*⟩ ≤ (γ/2) ∑_{t=1}^T (‖ĉ_t − c*‖²_{A_t} − ‖ĉ_{t+1} − c*‖²_{A_t}) + (1/(2γ)) ∑_{t=1}^T ‖∇_t‖²_{A_t^{−1}}.
  54. Online Newton Step

    By the elliptical potential lemma,
    ∑_{t=1}^T ⟨∇_t, ĉ_t − c*⟩ ≤ (γ/2) ∑_{t=1}^T (‖ĉ_t − c*‖²_{A_t} − ‖ĉ_{t+1} − c*‖²_{A_t}) + (n/(2γ)) log(T G²/ε + 1).
  55. Online Newton Step

    (Same bound as the previous slide.) Next, take a closer look at the penalty term, i.e., ∑_{t=1}^T (‖ĉ_t − c*‖²_{A_t} − ‖ĉ_{t+1} − c*‖²_{A_t}).
  56. ONS: Penalty

    ∑_{t=1}^T (‖ĉ_t − c*‖²_{A_t} − ‖ĉ_{t+1} − c*‖²_{A_t}) = ‖ĉ_1 − c*‖²_{A_1} + ∑_{t=2}^T (‖ĉ_t − c*‖²_{A_t} − ‖ĉ_t − c*‖²_{A_{t−1}}) − ‖ĉ_{T+1} − c*‖²_{A_T}.
  57. ONS: Penalty

    Each difference satisfies
    ‖ĉ_t − c*‖²_{A_t} − ‖ĉ_t − c*‖²_{A_{t−1}} = (ĉ_t − c*)^T (A_t − A_{t−1}) (ĉ_t − c*) = ⟨∇_t, ĉ_t − c*⟩²,
    and the last term −‖ĉ_{T+1} − c*‖²_{A_T} ≤ 0 is ignored.
  58. ONS: Penalty

    Hence the penalty is at most
    ‖ĉ_1 − c*‖²_{A_0} + (‖ĉ_1 − c*‖²_{A_1} − ‖ĉ_1 − c*‖²_{A_0}) + ∑_{t=2}^T ⟨∇_t, ĉ_t − c*⟩² ≤ ε + ∑_{t=1}^T ⟨∇_t, ĉ_t − c*⟩²,
    using ‖ĉ_1 − c*‖²_{A_0} = ε ‖ĉ_1 − c*‖² ≤ ε (diam Θ = 1).
  59. ONS: Penalty

    Therefore, setting ε = 1/γ²,
    ∑_{t=1}^T ⟨∇_t, ĉ_t − c*⟩ ≤ γε/2 + (γ/2) ∑_{t=1}^T ⟨∇_t, ĉ_t − c*⟩² + (n/(2γ)) log(T G²/ε + 1)
    ⟹ ∑_{t=1}^T (⟨∇_t, ĉ_t − c*⟩ − (γ/2) ⟨∇_t, ĉ_t − c*⟩²) ≤ (n/(2γ)) log(T G² γ² + 1) + 1/(2γ).
  60. ONS: Using Exp-Concavity

    If f_t is α-exp-concave, then for γ ≤ (1/2) min{α, 1/G},
    f_t(ĉ_t) − f_t(c*) ≤ ⟨∇_t, ĉ_t − c*⟩ − (γ/2) ⟨∇_t, ĉ_t − c*⟩²
    (cf. if f_t is merely convex, f_t(ĉ_t) − f_t(c*) ≤ ⟨∇_t, ĉ_t − c*⟩).
    Therefore, setting γ = (1/2) min{α, 1/G}, we obtain the desired regret bound:
    ∑_{t=1}^T (f_t(ĉ_t) − f_t(c*)) ≤ ∑_{t=1}^T (⟨∇_t, ĉ_t − c*⟩ − (γ/2) ⟨∇_t, ĉ_t − c*⟩²) ≤ n (1/α + G)(log(T/4 + 1) + 1),
    using 1/(2γ) = max{1/α, G} ≤ 1/α + G and T G² γ² ≤ T/4.
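    A numeric check of this pointwise inequality for f(c) = (r^T c)² on the unit ball, which is 1/2-exp-concave with ‖∇f‖ ≤ 2; since the unit ball has diameter 2 rather than 1, the sketch takes γ = (1/2) min{α, 1/(G · diam Θ)} = 1/8:

```python
import numpy as np

rng = np.random.default_rng(2)
gamma = 0.125   # (1/2) * min{alpha, 1/(G*diam)} with alpha=1/2, G=2, diam=2
for _ in range(1000):
    r = rng.normal(size=3); r /= max(1.0, np.linalg.norm(r))
    c1 = rng.normal(size=3); c1 /= max(1.0, np.linalg.norm(c1))
    c2 = rng.normal(size=3); c2 /= max(1.0, np.linalg.norm(c2))
    f = lambda c: (r @ c) ** 2
    s = (2.0 * (r @ c1) * r) @ (c1 - c2)       # <grad f(c1), c1 - c2>
    assert f(c1) - f(c2) <= s - 0.5 * gamma * s ** 2 + 1e-9
```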
  61. MetaGrad

    Tim van Erven, Wouter M. Koolen, and Dirk van der Hoeven (NeurIPS 2016, JMLR 2021).
  62. ONS Requires Prior Knowledge of α

    ONS sets γ = (1/2) min{α, 1/G} and ε = 1/γ², so it needs α in advance. We may not know α; even worse, we may encounter α = 0 at some round. ONS fails in such uncertain situations.
    Remedy: adapt to the uncertainty with multiple ONS instances!
  63. MetaGrad: High-Level Idea

    • Loss functions f_t are α-exp-concave, but α is unknown, possibly α = 0.
    • Keep experts with different learning rates η > 0, called η-experts.
    • Each η-expert runs ONS on its own exp-concave surrogate loss f_t^η.
    • MetaGrad aggregates the experts' outputs into a single output.
  64. MetaGrad: High-Level Idea

    Theorem (van Erven et al. 2016, 2021). MetaGrad simultaneously enjoys
    • ∑_{t=1}^T (f_t(w_t) − f_t(u)) ≤ O(n (G + 1/α) log T), and
    • ∑_{t=1}^T (f_t(w_t) − f_t(u)) ≤ O(G √(T log log T)).
  65. Notation

    • For each t: the learner's loss f_t, the learner's output w_t, and g_t ∈ ∂f_t(w_t).
    • 𝒢 := {η_i = 2^{−i}/(5G) : i = 0, 1, 2, …, ⌈(1/2) log T⌉} is the set of η values; |𝒢| = Θ(log T).
    • For each t and η: the η-expert's loss f_t^η and output w_t^η, where
      f_t^η(w) := −η ⟨w_t − w, g_t⟩ + η² ⟨w_t − w, g_t⟩² for all w ∈ Θ.
  66. Notation

    (Progressive build: the round-t protocol, completed on slide 69.)
  67. Notation

    (Progressive build of the protocol.)
  68. Notation

    (Progressive build of the protocol.)
  69. Notation

    At round t:
    1. The learner plays w_t.
    2. It incurs f_t(w_t) and observes g_t ∈ ∂f_t(w_t).
    3. It sends w_t and g_t to the η-experts.
    4. Each η-expert computes w_{t+1}^η via ONS applied to f_t^η.
    5. The learner aggregates the w_{t+1}^η to compute w_{t+1}.
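    A sketch of the grid of learning rates and the surrogate losses in this notation (log base 2 is assumed for the grid size; w, w_t, g_t are numpy arrays):

```python
import math

def eta_grid(T, G):
    """{ 2^{-i}/(5G) : i = 0, ..., ceil((1/2) log2 T) }; size Theta(log T)."""
    return [2.0 ** (-i) / (5.0 * G) for i in range(math.ceil(0.5 * math.log2(T)) + 1)]

def surrogate(eta, w_t, g_t):
    """f_t^eta(w) = -eta <w_t - w, g_t> + eta^2 <w_t - w, g_t>^2."""
    def f(w):
        s = (w_t - w) @ g_t
        return -eta * s + (eta * s) ** 2
    return f
```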
  70. Regret Decomposition

    Since every f_t is convex, for any comparator u ∈ Θ, we have
    ∑_{t=1}^T (f_t(w_t) − f_t(u)) ≤ ∑_{t=1}^T ⟨w_t − u, g_t⟩ =: R̄_T^u.
  71. Regret Decomposition

    Recall f_t^η(w) = −η ⟨w_t − w, g_t⟩ + η² ⟨w_t − w, g_t⟩². Decomposition of R̄_T^u:
    R̄_T^u = ∑_{t=1}^T ⟨w_t − u, g_t⟩ = −(1/η) ∑_{t=1}^T f_t^η(u) + η ∑_{t=1}^T ⟨w_t − u, g_t⟩² = −(1/η) ∑_{t=1}^T f_t^η(u) + η V_T^u,
    where V_T^u := ∑_{t=1}^T ⟨w_t − u, g_t⟩².
  72. Regret Decomposition

    Using f_t^η(w_t) = 0, for all η ∈ 𝒢 simultaneously,
    R̄_T^u = (1/η) [∑_{t=1}^T (f_t^η(w_t) − f_t^η(w_t^η)) + ∑_{t=1}^T (f_t^η(w_t^η) − f_t^η(u))] + η V_T^u,
    where the first sum is the learner's regret against w_t^η and the second is the η-expert's regret against u.
  73. Bounding Each Component

    (Restates the decomposition above, labeling component 1 as the learner's regret against w_t^η and component 2 as the η-expert's regret against u.)
  74. Bounding Each Component

    If the w_t^η are aggregated by exponentially weighted averaging, component 1 (the learner's regret against w_t^η) is O(log log T).
  75. Bounding Each Component

    Since w_t^η is computed by ONS applied to f_t^η, component 2 (the η-expert's regret against u) is O(n log T). (By elementary calculation, f_t^η is Ω(1)-exp-concave and ‖∇f_t^η(w_t^η)‖ = O(1) for every η ∈ 𝒢 ⊆ (0, 1/(5G)].)
  76. Bounding Each Component

    Therefore, for all η ∈ 𝒢 simultaneously, R̄_T^u = O(n log T / η + η V_T^u).
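    A schematic of the master's aggregation step, in the spirit of the "tilted" exponentially weighted averaging of van Erven et al.; the prior and exact normalization are simplified here:

```python
import numpy as np

def aggregate(pis, etas, expert_ws):
    """w_t = sum_eta pi(eta)*eta*w_t^eta / sum_eta pi(eta)*eta (tilted average)."""
    num = sum(p * e * np.asarray(w) for p, e, w in zip(pis, etas, expert_ws))
    den = sum(p * e for p, e in zip(pis, etas))
    return num / den

def update_weights(pis, surrogate_losses):
    """pi(eta) <- pi(eta) * exp(-f_t^eta(w_t^eta)), then renormalize."""
    new = np.asarray(pis) * np.exp(-np.asarray(surrogate_losses))
    return new / new.sum()
```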
  77. Infeasible Ideal Tuning

    If V_T^u = ∑_{t=1}^T ⟨w_t − u, g_t⟩² were known a priori, then using only η = η* ≃ √(n log T / V_T^u) would give
    R̄_T^u = O(n log T / η + η V_T^u) ≃ O(√(V_T^u n log T)).
  78. Infeasible Ideal Tuning

    If it turns out that all f_t are α-exp-concave, then (informally)
    ∑_{t=1}^T (f_t(w_t) − f_t(u)) ≤ ∑_{t=1}^T ⟨w_t − u, g_t⟩ − (α/2) ∑_{t=1}^T ⟨w_t − u, g_t⟩² = R̄_T^u − (α/2) V_T^u.
  79. Infeasible Ideal Tuning

    By self-bounding, regardless of the value of V_T^u,
    R̄_T^u − (α/2) V_T^u ≲ √(V_T^u n log T) − α V_T^u ≲ (n/α) log T,
    achieving the same bound as ONS without using α.
  80. Infeasible Ideal Tuning

    However, V_T^u is unknown. Instead, use the fact that R̄_T^u = O(n log T / η + η V_T^u) holds for all η ∈ 𝒢!
  81. Exploiting Multiple Learning Rates

    Let η* ≃ √(n log T / V_T^u) ≥ 1/(5G√T) be the unknown best learning rate. Recall 𝒢 = {η_i = 2^{−i}/(5G) : i = 0, 1, 2, …, ⌈(1/2) log T⌉} ⊆ (0, 1/(5G)] and that R̄_T^u ≲ n log T / η + η V_T^u holds for all η ∈ 𝒢.
  82. Exploiting Multiple Learning Rates

    If η* ∈ [1/(5G√T), 1/(5G)], there exists η ∈ 𝒢 such that η* ∈ [η/2, η], hence
    R̄_T^u ≲ n log T / η + η V_T^u ≤ n log T / η* + 2η* V_T^u ≃ √(V_T^u n log T).
  83. Exploiting Multiple Learning Rates

    If η* ≃ √(n log T / V_T^u) ≥ 1/(5G), we have V_T^u ≲ G² n log T; thus, for η = η_0 = 1/(5G),
    R̄_T^u ≲ n log T / η_0 + η_0 V_T^u ≲ n G log T.
    In any case, R̄_T^u = O(√(V_T^u n log T) + n G log T), implying ∑_{t=1}^T (f_t(w_t) − f_t(u)) ≤ R̄_T^u ≲ n (1/α + G) log T.
  84. Learning with Suboptimal Actions

    For t = 1, …, T:
    • The learner makes a prediction ĉ_t ∈ Θ of c*.
    • The agent faces X_t and takes a possibly suboptimal action x_t ∈ X_t.
    • The learner observes (X_t, x_t) and updates ĉ_t to ĉ_{t+1}.
    Define:
    • R̄_T^{c*} := ∑_{t=1}^T ⟨ĉ_t − c*, x̂_t − x_t⟩.
    • V_T^{c*} := ∑_{t=1}^T ⟨ĉ_t − c*, x̂_t − x_t⟩².
    • Δ_T := ∑_{t=1}^T (max{⟨c*, x⟩ : x ∈ X_t} − ⟨c*, x_t⟩), the cumulative suboptimality.
  85. Learning with Suboptimal Actions

    Theorem. For ĉ_1, …, ĉ_T computed by MetaGrad with feedback subgradients x̂_t − x_t, it holds that
    R_T^{c*} ≤ R̄_T^{c*} = O(n log T + √(n Δ_T log T)).
    ✓ Sublinear in Δ_T (cf. corruption-robustness in bandits).
    ✓ Recovers R̄_T^{c*} = O(n log T) when Δ_T = 0.
  86. Learning with Suboptimal Actions

    Proof sketch. By the same argument as in the regret analysis of MetaGrad,
    R̄_T^{c*} = ∑_{t=1}^T ⟨ĉ_t − c*, x̂_t − x_t⟩ ≲ O(√(V_T^{c*} n log T) + n log T).
    Also, V_T^{c*} = ∑_{t=1}^T ⟨ĉ_t − c*, x̂_t − x_t⟩² ≤ R̄_T^{c*} + 2Δ_T holds (cf. V_T^{c*} ≤ R̄_T^{c*} if every x_t is optimal). The claim follows from the sub-additivity of x ↦ √x and self-bounding.
  87. Ω(n) Lower Bound

    Focus on the regret R_T^{c*} = ∑_{t=1}^T ⟨c*, x_t − x̂_t⟩.
    Theorem. For any, possibly randomized, learner, there is an instance such that R_T^{c*} = Ω(n).
  88. Ω(n) Lower Bound

    Intuition: since c* ∈ ℝ^n is unknown, if the elements of c* are drawn at random and X_1, …, X_n are restricted to line segments, any deterministic learner makes mistakes Ω(n) times in expectation. By Yao's minimax principle, for any randomized learner there is a worst-case instance on which the Ω(n) regret is inevitable.
  89. Ω(n) Lower Bound

    Can the log T in the upper bound be removed?
  90. Revisiting the Cone-Based Approach

    Assume:
    • Θ = {θ ∈ ℝ^n : ‖θ‖ = 1} = S^{n−1} and diam(X_t) ≤ 1.
    • x_t and x̂_t are optimal for c* and ĉ_t, respectively.
  91. Revisiting the Cone-Based Approach

    Lemma (Besbes et al. 2023). Let θ(c*, ĉ_t) be the angle between c*, ĉ_t ∈ S^{n−1}. If θ(c*, ĉ_t) ≤ π/2, we have
    ⟨c*, x_t − x̂_t⟩ ≤ cos θ(c*, x_t − x̂_t) ≤ sin θ(c*, ĉ_t).
  92. Revisiting the Cone-Based Approach

    Proof idea: by the optimality assumptions, ⟨c*, x_t − x̂_t⟩ ≥ 0 and ⟨ĉ_t, x̂_t − x_t⟩ ≥ 0 must hold, so x_t − x̂_t lies in the wedge between the hyperplanes orthogonal to c* and ĉ_t (illustrated by a figure on the slide). Therefore, the angle between c* and x_t − x̂_t is at least π/2 − θ(c*, ĉ_t), and
    cos θ(c*, x_t − x̂_t) ≤ cos(π/2 − θ(c*, ĉ_t)) = sin θ(c*, ĉ_t).
  93. An O(1)-Regret Algorithm for n = 2

    Theorem. Algorithm 1 achieves E[R_T^{c*}] ≤ 2π, independent of T (but extending to n > 2 seems challenging, as discussed later).
    Here N_t := {c ∈ S¹ : ⟨c, x_t − x⟩ ≥ 0 for all x ∈ X_t} is the normal cone of X_t at x_t, and C_t is the region in which c* does not contradict "x_s ∈ argmax_{x ∈ X_s} ⟨c*, x⟩ for s = 1, …, t − 1."
  94. Proof

    Focus on round t. If ĉ_t ∈ int(N_t), then x̂_t = x_t and hence ⟨c*, x_t − x̂_t⟩ = 0. Taking the expectation over the draw of ĉ_t from C_t,
    E[⟨c*, x_t − x̂_t⟩] = Pr(ĉ_t ∈ C_t ∖ int N_t) E[⟨c*, x_t − x̂_t⟩ | ĉ_t ∈ C_t ∖ int N_t] = (A(C_t ∖ int N_t) / A(C_t)) E[⟨c*, x_t − x̂_t⟩ | ĉ_t ∈ C_t ∖ int N_t],
    where A(⋅) denotes arc length (= central angle).
  95. Proof

    If A(C_t) ≥ π/2: E[⟨c*, x_t − x̂_t⟩] ≤ (A(C_t ∖ int N_t) / A(C_t)) ⋅ 1 ≤ (2/π) ⋅ 1 ⋅ A(C_t ∖ int N_t) ≤ A(C_t ∖ int N_t).
    If A(C_t) < π/2: by the lemma, E[⟨c*, x_t − x̂_t⟩] ≤ (A(C_t ∖ int N_t) / A(C_t)) sin θ(c*, ĉ_t) ≤ (A(C_t ∖ int N_t) / A(C_t)) sin A(C_t) ≤ A(C_t ∖ int N_t).
    Hence E[⟨c*, x_t − x̂_t⟩] ≤ A(C_t ∖ int N_t) in any case. Since C_{t+1} ← C_t ∩ N_t, the removed arcs C_t ∖ int N_t are disjoint, so
    E[R_T^{c*}] = ∑_{t=1}^T E[⟨c*, x_t − x̂_t⟩] ≤ ∑_{t=1}^T A(C_t ∖ int N_t) ≤ 2π.
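    A sketch of one round of this n = 2 algorithm, representing directions on S¹ by angles and C_t by an angle interval; wrap-around at ±π is ignored for simplicity, so this is illustrative only:

```python
import numpy as np

def angle(v):
    return np.arctan2(v[1], v[0])

def normal_cone_arc(X_t, x_t):
    """Angles phi with <(cos phi, sin phi), x_t - x> >= 0 for all x in X_t.
    Each x contributes the half-circle centered at angle(x_t - x)."""
    lo, hi = -np.pi, np.pi
    for x in X_t:
        d = np.asarray(x_t, dtype=float) - np.asarray(x, dtype=float)
        if np.linalg.norm(d) < 1e-12:
            continue
        a = angle(d)
        lo, hi = max(lo, a - np.pi / 2), min(hi, a + np.pi / 2)
    return lo, hi

def step(C, X_t, x_t, rng):
    lo, hi = C
    phi = rng.uniform(lo, hi)                 # draw c_hat_t uniformly on C_t
    c_hat = np.array([np.cos(phi), np.sin(phi)])
    n_lo, n_hi = normal_cone_arc(X_t, x_t)
    C_next = (max(lo, n_lo), min(hi, n_hi))   # C_{t+1} = C_t intersect N_t
    return c_hat, C_next
```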
  96. Conclusion

    • R_T^{c*} ≤ R̄_T^{c*} = O(n log T) by ONS.
    • R̄_T^{c*} = O(n log T + √(n Δ_T log T)) by MetaGrad in the possibly suboptimal case.
    • R_T^{c*} = Ω(n).
    • R_T^{c*} = O(1) for n = 2.
    Future work:
    • Tight analysis for general n. Difficulty: sin θ(c*, ĉ_t) ≤ sin A(C_t) no longer holds.
    • Exploring other online-learning ideas useful for inverse optimization.