Population Dynamics Meet Network Optimization

Florian Dörfler
June 04, 2025

Presented at NecSys 2025

Transcript

  1. Population Dynamics Meet Network Optimization: A Decision-Dependent Stochastic Formulation. Florian Dörfler, IFAC Conference on Network Systems 2025. Joint work with Zhiyu He, Saverio Bolognani, & Michael Muehlebach.
  2. Decision-making in probability spaces. The state / decision variable is a probability measure when
     • using first-principle physics (e.g., quantum or statistical mechanics)
     • taking a probabilistic view of the world (e.g., Bayesian modeling)
     • robustifying against distribution shifts (distributionally robust optimization, DRO)
     • today: modeling large populations of agents, e.g., in mean-field settings
  3. Preview example: affinity maximization in a polarized population. An opportunistic political party promotes an ideology to a large-scale population of individuals.
  4. Polarized population model [Gaitonde et al., '21; Dean et al., '22; Hazła et al., '24]: an individual at position $p_k$, exposed to an ideology $u$ (with $\|p_k\| = \|u\| = 1$), updates as
     $$p_{k+1} = \mathrm{normalize}\big(\lambda \cdot p_k + (1-\lambda) \cdot p_0 + \sigma \cdot (p_k^\top u)\, u\big),$$
     where $p_0$ is the initial position, $p_k^\top u$ is the utility, and the last term models biased assimilation. Given a fixed ideological position $u$, the steady state is $p_{\mathrm{ss}} = h(p_0, u)$. Geometrically: an acute angle between $p_0$ and $u$ → influenced; a perpendicular angle → not influenced; an obtuse angle → anti-influenced.
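
A minimal simulation sketch of this update in Python (the values λ = 0.9, σ = 0.4 and the 2-D setting are illustrative assumptions, not from the talk):

```python
import numpy as np

def step(p, p0, u, lam=0.9, sigma=0.4):
    """One biased-assimilation update of an individual's position."""
    q = lam * p + (1 - lam) * p0 + sigma * (p @ u) * u
    return q / np.linalg.norm(q)

rng = np.random.default_rng(0)
u = np.array([1.0, 0.0])                          # ideology, ||u|| = 1
p0 = rng.normal(size=2)
p0 /= np.linalg.norm(p0)                          # initial position on the sphere
p = p0.copy()
for _ in range(200):                              # iterate to a numerical steady state
    p = step(p, p0, u)
print("steady-state affinity p_ss^T u =", p @ u)  # p_ss = h(p0, u)
```
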
  5. Modeling in probability space: distributional modeling for a large-scale population without identifiers. Each individual's initial position is sampled from an initial distribution, $p_0 \sim \mu_0$, and evolves under the model above. Given a fixed ideological position $u$, the steady-state distribution is the pushforward of $\mu_0$ under the steady-state map:
     $$\mu_{\mathrm{ss}}(u) = h(u, \cdot)_\# \mu_0.$$
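
In a sampling-based view, the pushforward can be approximated by pushing each sample of $\mu_0$ through the steady-state map; a sketch under the same illustrative parameters as above:

```python
import numpy as np

def h(p0, u, lam=0.9, sigma=0.4, iters=200):
    """Approximate the steady-state map h(u, .) by iterating the update."""
    p = p0.copy()
    for _ in range(iters):
        q = lam * p + (1 - lam) * p0 + sigma * (p @ u) * u
        p = q / np.linalg.norm(q)
    return p

rng = np.random.default_rng(1)
u = np.array([1.0, 0.0])
P0 = rng.normal(size=(1000, 2))
P0 /= np.linalg.norm(P0, axis=1, keepdims=True)  # samples p_0^i ~ mu_0
mu_ss = np.array([h(p0, u) for p0 in P0])        # empirical pushforward h(u,.)# mu_0
print("population-wide affinity:", (mu_ss @ u).mean())
```
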
  6. Affinity maximization. Goal: population-wide affinity optimization over the distributional steady state of the large-scale population:
     $$\max_{\|u\| \le 1} \; \mathbb{E}_{p \sim \mu_{\mathrm{ss}}(u)}[p^\top u] \quad \text{s.t.} \quad \mu_{\mathrm{ss}}(u) = h(u, \cdot)_\# \mu_0.$$
     Challenges:
     • nonlinear / ∞-dimensional map $h$
     • unknown map $h$ & distribution $\mu_0$
     • the system is not at the steady state $\mu_{\mathrm{ss}}$
     • decision-dependence of $\mathbb{E}_{p \sim \mu_{\mathrm{ss}}(u)}$
     The decision-maker can at best sample from the currently observed $\mu_k$.
  7. Feedback loop: the decision-maker observes the random variable $p$ with distribution $\mu_k$ (approximated by samples) and makes a decision; the population dynamics produce $\mu_{k+1}$, which in turn depends on the decision (decision dependence). A greedy decision rule is
     $$u_{k+1} = \arg\max_{\|u\| \le 1} \; \mathbb{E}_{p \sim \mu_k}[p^\top u].$$
     The distributions are non-stationary and the dynamics unknown & nonlinear; the questions are convergence & optimality.
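
For a fixed distribution, this greedy inner problem has a closed-form solution (a standard fact, spelled out here for concreteness): by Cauchy-Schwarz, a linear objective over the unit ball is maximized by the normalized mean,
$$\arg\max_{\|u\| \le 1} \mathbb{E}_{p \sim \mu_k}[p^\top u] = \frac{\mathbb{E}_{p \sim \mu_k}[p]}{\big\|\mathbb{E}_{p \sim \mu_k}[p]\big\|},$$
assuming the mean is nonzero. The difficulty is precisely that $\mu_{k+1}$ then shifts in response to this choice.
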
  8. Contents & take-home messages:
     ① distribution dynamics: take into account & control
     ② online stochastic algorithm: adapt to & shape the distributions $\mu_k \to \mu_{k+1}$
     ③ finite-sample generalization: even for a fixed sample distribution
     ④ case studies: affinity maximization & recommender system
  9. Broader picture: randomness is common, and so is feedback. The decision-maker repeatedly (re)optimizes a stochastic objective, deciding on $u$, while the population (re)samples the random variable $p \sim \mu_k$:
     $$\min_u \; \mathbb{E}_{p \sim \mu_k}[\Phi(u; p)].$$
     Examples of decision variable / random variable pairs with a performance objective: ideology & positions, price & demands. The uncertainty is endogenous: $\mu_k$ depends on $u$.
  10. Problem formulation: a decision-dependent problem over the steady-state distribution,
     $$\min_u \; \mathbb{E}_{p \sim \mu_{\mathrm{ss}}(u)}[\Phi(u, p)] \quad \text{s.t.} \quad \mu_{\mathrm{ss}}(u) = h(u, \cdot)_\# \mu_d,$$
     with utility $\Phi$ (differentiable) and steady-state map $p = h(u, d)$ (differentiable in $u$ & Lipschitz in $d$). The underlying distribution dynamics are
     $$p_k = f(p_{k-1}, u_k, d), \qquad p_0 \sim \mu_0, \quad d \sim \mu_d,$$
     with initial state $p_0$, decision $u_k$, and exogenous disturbance $d$; here $f$ is contracting in $p$ & Lipschitz in $(d, u)$, and the disturbance distribution has finite absolute moments. Each decision $u_k$ induces a distribution shift $\mu_{k-1} \to \mu_k$.
  11. Ingredient: online feedback optimization. For linear time-invariant dynamics
     $$p_{k+1} = A p_k + B u_k + E d$$
     with a constant disturbance $d$, the steady-state map is
     $$p_{\mathrm{ss}} = (I - A)^{-1} B\, u + (I - A)^{-1} E\, d = H_u u + H_d d.$$
     To solve the optimization problem $\min_{u, p} \Phi(u, p)$ s.t. $p = H_u u + H_d d$, run gradient descent requiring only the steady-state sensitivity $H_u$:
     $$u_{k+1} = u_k - \eta \big( \nabla_u \Phi(u_k, p_k) + H_u^\top \nabla_p \Phi(u_k, p_k) \big),$$
     where the gradient is evaluated at the measurement $p_k$, in closed loop with the plant.
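
A minimal closed-loop sketch of this recipe; the plant matrices, the tracking objective, and the step size below are illustrative placeholders:

```python
import numpy as np

# Illustrative stable LTI plant p_{k+1} = A p + B u + E d (placeholder values)
A = np.array([[0.5, 0.1], [0.0, 0.6]])
B = np.array([[1.0], [0.5]])
E = np.array([[0.3], [0.2]])
d = np.array([1.0])                        # constant disturbance
Hu = np.linalg.solve(np.eye(2) - A, B)     # steady-state sensitivity (I - A)^{-1} B

p_ref = np.array([1.0, -1.0])              # illustrative objective: track p_ref
def grad_u(u, p): return 0.1 * u           # ∇_u Φ: small input penalty
def grad_p(u, p): return p - p_ref         # ∇_p Φ: tracking error

u, p, eta = np.zeros(1), np.zeros(2), 0.1
for k in range(500):
    p = A @ p + B @ u + E @ d              # plant step; p is the measurement p_k
    u = u - eta * (grad_u(u, p) + Hu.T @ grad_p(u, p))  # feedback-optimization update
print("u =", u, "p =", p)
```
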
  12. Can we copy this recipe here? Define $\tilde\Phi(u) \triangleq \mathbb{E}_{p \sim \mu_{\mathrm{ss}}(u)}[\Phi(u, p)] = \mathbb{E}_{d \sim \mu_d}[\Phi(u, h(u, d))]$. Ideally we would evaluate its exact gradient:
     $$\begin{aligned}
     \nabla \tilde\Phi(u_k) &= \mathbb{E}_{d \sim \mu_d}\big[\nabla \Phi(u_k, h(u_k, d))\big] \\
     &= \mathbb{E}_{d \sim \mu_d}\big[\nabla_u \Phi(u_k, h(u_k, d)) + \nabla_u h(u_k, d)\, \nabla_p \Phi(u_k, p)\big|_{p = h(u_k, d)}\big] \\
     &= \mathbb{E}_{(p,d) \sim \gamma_{\mathrm{ss}}(u_k)}\big[\nabla_u \Phi(u_k, p) + \nabla_u h(u_k, d)\, \nabla_p \Phi(u_k, p)\big],
     \end{aligned}$$
     by the chain rule & the law of the total derivative (conditions on $\Phi$ allow swapping $\nabla$ & $\mathbb{E}$), where $\gamma_{\mathrm{ss}}(u)$ is the steady-state joint distribution induced by $d \sim \mu_d$ & $p \sim h(u, \cdot)_\# \mu_d$. Challenges: the expectation (an integral) is hard to evaluate, and we have no access to the steady state. Online decision-making: use current samples from $\mu_k$ instead.
  13. Online stochastic algorithm: replace the steady-state gradient
     $$\nabla \tilde\Phi(u_k) = \mathbb{E}_{(p,d) \sim \gamma_{\mathrm{ss}}(u_k)}\big[\nabla_u \Phi(u_k, p) + \nabla_u h(u_k, d)\, \nabla_p \Phi(u_k, p)\big]$$
     by its counterpart on the current joint distribution $\gamma_k$ (adaptation),
     $$\hat\nabla_k \tilde\Phi(u_k) = \mathbb{E}_{(p,d) \sim \gamma_k}\big[\nabla_u \Phi(u_k, p) + \nabla_u h(u_k, d)\, \nabla_p \Phi(u_k, p)\big],$$
     which is in turn approximated by finite samples in a mini-batch composite stochastic gradient (anticipation & steering of the distribution dynamics):
     $$u_{k+1} = u_k - \eta \cdot \frac{1}{n_{\mathrm{mb}}} \sum_{i=1}^{n_{\mathrm{mb}}} \Big[\nabla_u \Phi(u_k, p_k^i) + \nabla_u h(u_k, d^i)\, \nabla_p \Phi(u_k, p_k^i)\Big].$$
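
A schematic of one such update step; the gradient callables and the toy usage are illustrative placeholders (in the talk's setting, $\nabla_u h$ would come from sensitivity estimation):

```python
import numpy as np

def composite_sgd_step(u, batch, grad_u_Phi, grad_p_Phi, grad_u_h, eta=0.05):
    """One mini-batch composite stochastic gradient step.

    batch: list of (p_i, d_i) pairs sampled from the current distribution mu_k.
    grad_u_h(u, d): sensitivity of the steady-state map, shape (dim u, dim p).
    """
    g = np.zeros_like(u)
    for p_i, d_i in batch:
        g += grad_u_Phi(u, p_i) + grad_u_h(u, d_i) @ grad_p_Phi(u, p_i)
    return u - eta * g / len(batch)

# Toy usage: Phi(u, p) = -p^T u, with a known constant sensitivity S (illustrative)
S = np.array([[0.5, 0.0], [0.0, 0.5]])
rng = np.random.default_rng(2)
batch = [(rng.normal(size=2), rng.normal(size=2)) for _ in range(32)]
u = composite_sgd_step(np.zeros(2), batch,
                       grad_u_Phi=lambda u, p: -p,
                       grad_p_Phi=lambda u, p: -u,
                       grad_u_h=lambda u, d: S)
```
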
  14. Ingredient: Wasserstein distance, a metric measuring the discrepancy between distributions $\mu$ & $\nu$:
     $$W_1(\mu, \nu) = \inf_{\gamma \in \Gamma(\mu, \nu)} \int_{\mathcal{X} \times \mathcal{Y}} c(x, y)\, \mathrm{d}\gamma(x, y),$$
     the minimum cost of transporting $\mu$ onto $\nu$, where $c(x, y)$ is the price (cost of moving mass) from $x$ to $y$, and the transport plan $\gamma(x, y)$ is a joint distribution with marginals $\mu(x)$ & $\nu(y)$. Why Wasserstein? It handles general distributions with disjoint supports + comes with a tractable theory.
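
In one dimension, $W_1$ between equal-size empirical distributions has a closed form: the optimal plan matches order statistics, so it equals the mean absolute difference of sorted samples. A sketch:

```python
import numpy as np

def w1_empirical_1d(x, y):
    """W1 between two equal-size 1-D empirical distributions.

    The optimal transport plan matches order statistics, so
    W1 = mean |x_(i) - y_(i)| over sorted samples.
    """
    return np.abs(np.sort(x) - np.sort(y)).mean()

rng = np.random.default_rng(3)
mu = rng.normal(0.0, 1.0, size=1000)
nu = rng.normal(0.5, 1.0, size=1000)
print(w1_empirical_1d(mu, nu))   # ~ 0.5, the mean shift
```
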
  15. Contraction of distribution shifts under decisions. The Wasserstein distance between the current and the steady-state distribution is bounded,
     $$W_1\big(\mu_k, \mu_{\mathrm{ss}}(u_k)\big) \le \int \big\| p_k(p_0, u_{0:k}, d) - h(u_k, d) \big\| \, \mathrm{d}\mu_0\, \mathrm{d}\mu_d \triangleq V_k,$$
     and, under the assumption of Lipschitz dynamics, $V_k$ is contracting:
     $$V_k \le L_f^p V_{k-1} + \mathrm{const} \cdot \|u_k - u_{k-1}\|, \qquad L_f^p \in (0, 1),$$
     where $\|u_k - u_{k-1}\|$ is the drift of decisions. Implications: $\mu_k$ is close to $\mu_{\mathrm{ss}}(u_k)$ when $u_k$ is close to $u_{k-1}$, and $\mu_k$ weakly converges to $\mu_{\mathrm{ss}}(u_k)$; moreover, the gradient estimate is close to the true gradient,
     $$\big\| \hat\nabla_k \tilde\Phi(u_k) - \nabla \tilde\Phi(u_k) \big\| \le \mathrm{const} \cdot W_1\big(\mu_k, \mu_{\mathrm{ss}}(u_k)\big).$$
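
Unrolling this recursion (a standard step, spelled out here) makes the implication explicit:
$$V_k \;\le\; (L_f^p)^k V_0 + \mathrm{const} \cdot \sum_{j=1}^{k} (L_f^p)^{k-j}\, \|u_j - u_{j-1}\|,$$
so the initial mismatch is forgotten geometrically and only the recent decision increments contribute; if $\|u_j - u_{j-1}\| \to 0$, then $V_k \to 0$.
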
  16. Optimality in expectation. For any window of length $T$, with step size $\eta = \mathcal{O}(1/\sqrt{T})$, the convergence measure (average expected second moments of the gradients, w.r.t. the randomness in $u_k$) satisfies
     $$\frac{1}{T} \sum_{k=0}^{T-1} \mathbb{E}\big[\|\nabla \tilde\Phi(u_k)\|^2\big] \;\le\; \frac{8\big(\tilde\Phi(u_0) - \tilde\Phi^*\big)}{\eta T} + \mathrm{const} \cdot \frac{\eta}{n_{\mathrm{mb}}} + \mathcal{O}\Big(\frac{1}{T}\Big) \;=\; \mathcal{O}\Big(\frac{1}{\sqrt{T}}\Big),$$
     i.e., despite the dynamics & the composite structure, the same (best) $\mathcal{O}(1/\sqrt{T})$ rate as SGD for static, non-convex problems. For a naïve vanilla gradient (no anticipation), the rate is slower & convergence holds only up to a non-vanishing bias.
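
Plugging $\eta = c/\sqrt{T}$ into the bound makes the rate explicit:
$$\frac{8\big(\tilde\Phi(u_0) - \tilde\Phi^*\big)}{\eta T} = \frac{8\big(\tilde\Phi(u_0) - \tilde\Phi^*\big)}{c\sqrt{T}}, \qquad \mathrm{const} \cdot \frac{\eta}{n_{\mathrm{mb}}} = \frac{\mathrm{const} \cdot c}{n_{\mathrm{mb}} \sqrt{T}},$$
so both leading terms are $\mathcal{O}(1/\sqrt{T})$ and dominate the $\mathcal{O}(1/T)$ remainder.
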
  17. Population convergence. For any window of length $T$, with step size $\eta = \mathcal{O}(1/\sqrt{T})$,
     $$\frac{1}{T} \sum_{k=0}^{T-1} \mathbb{E}\big[W_1(\mu_k, \mu_{\mathrm{ss}}(u_k))\big] \;\le\; \mathrm{const} \cdot \sqrt{\frac{\tilde\Phi(u_0) - \tilde\Phi^*}{\eta T} + \mathrm{const} \cdot \frac{\eta}{n_{\mathrm{mb}}} + \mathcal{O}\Big(\frac{1}{T}\Big)} \;=\; \mathcal{O}\big(T^{-1/4}\big),$$
     so $\mu_k$ weakly converges to $\mu_{\mathrm{ss}}(u_k)$ in the long run. Why this rate? The coupling of the distribution dynamics & the algorithm,
     $$V_k \le L_f^p V_{k-1} + L_f^p L_h^u \|u_k - u_{k-1}\|,$$
     ties the Wasserstein distance to the rate of the norm of the gradients (w.r.t. the randomness in $u_k$).
  18. High-probability guarantees. For any window of length $T$, with $\eta = \mathcal{O}(1/\sqrt{T})$ and a fixed failure probability $\tau \in (0, 1)$, the following holds with probability at least $1 - \tau$:
     $$\frac{1}{T} \sum_{k=0}^{T-1} \|\nabla \tilde\Phi(u_k)\|^2 = \mathcal{O}\Big(\frac{1}{\sqrt{T}} \Big(1 + \lg \frac{1}{\tau}\Big)\Big) \qquad \text{(second moment of gradient)},$$
     $$\frac{1}{T} \sum_{k=0}^{T-1} W_1\big(\mu_k, \mu_{\mathrm{ss}}(u_k)\big) = \mathcal{O}\Big(\frac{1}{T^{1/4}} \Big(1 + \lg \frac{1}{\tau}\Big)\Big) \qquad \text{(Wasserstein distance)},$$
     i.e., the same dependence on $T$ as before & benign scaling with $\tau$: e.g., $\tau = 10^{-4} \implies \lg \frac{1}{\tau} = 4$.
  19. Generalization from a fixed sample distribution. Question: how will solutions found based on $\tilde\Phi_N$ (with the empirical distribution $\mu_k^N$) generalize to $\tilde\Phi$ (with the true distribution $\mu_k$)? Why this matters: seeding trials, restricted sampling (privacy, resource constraints). Here the empirical distribution is built from a fixed sample $p_0^1, \ldots, p_0^N$:
     $$\mu_k^N = \frac{1}{N} \sum_{i=1}^{N} \delta_{p_k^i}.$$
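
A small sketch of an empirical objective built from a fixed sample (the utility Φ below is an illustrative placeholder):

```python
import numpy as np

def empirical_objective(u, P, Phi):
    """Phi_N(u) = (1/N) sum_i Phi(u, p_i) over a fixed sample P of shape (N, dim)."""
    return np.mean([Phi(u, p) for p in P])

rng = np.random.default_rng(4)
P = rng.normal(size=(50, 2))          # fixed sample p^1, ..., p^N
Phi = lambda u, p: -(p @ u)           # illustrative utility (negated affinity)
print(empirical_objective(np.array([1.0, 0.0]), P, Phi))
```
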
  20. Generalization with high probability. We can only access $N$ samples from $\mu_k^N$ & optimize the empirical objective $\tilde\Phi_N(u) = \frac{1}{N} \sum_{i=1}^{N} \Phi(u, p_k^i)$; with high probability, the solutions generalize well to the true objective $\tilde\Phi$. For any window of length $T$ and any fixed $\tau \in (0, 1)$, when $N \ge \frac{1}{c_2} \lg \frac{2 c_1}{\tau}$ & $\eta = \mathcal{O}(1/\sqrt{T})$, the following holds with probability at least $1 - \tau$:
     $$\frac{1}{T} \sum_{k=0}^{T-1} \|\nabla \tilde\Phi(u_k^N)\|^2 = \mathcal{O}\Big(\frac{1}{N^{2/r}} \Big(\lg \frac{1}{\tau}\Big)^{2/r}\Big) + \mathcal{O}\Big(\frac{1}{\sqrt{T}} \Big(1 + \lg \frac{1}{\tau}\Big)\Big),$$
     where $r$ relates to the size of the exogenous disturbances; the second term matches the rate from before, now measured on the true objective.
  21. Recall: polarized population model & affinity maximization. Goal: population-wide affinity optimization over the distributional steady state of the large-scale population. Model: $p_0$ sampled from the initial distribution $\mu_0$, with biased-assimilation dynamics
     $$p_{k+1} = \mathrm{normalize}\big(\lambda \cdot p_k + (1-\lambda) \cdot p_0 + \sigma \cdot (p_k^\top u)\, u\big),$$
     and the decision-dependent problem
     $$\max_{\|u\| \le 1} \; \mathbb{E}_{p \sim \mu_{\mathrm{ss}}(u)}[p^\top u] \quad \text{s.t.} \quad \mu_{\mathrm{ss}}(u) = h(u, \cdot)_\# \mu_0.$$
  22. Optimality. [Figure: optimality over iterations, compared to $u^*$. Our composite algorithm, with anticipation & sensitivity estimation, anticipates & shapes the distribution dynamics and yields a significant improvement in optimality over vanilla stochastic gradient descent without anticipation.]
  23. Histograms of position-decision angles. [Figure: histograms of angles between the steady-state positions $p_{\mathrm{ss}}$ and the decision $u$ for the ground-truth optimum (via IPOPT), the vanilla gradient, and our composite algorithm. The composite algorithm yields more acute angles, i.e., higher population-wide affinities, with a histogram close to that of the optimal solution; the vanilla gradient leaves many samples in [95°, 110°], i.e., lower affinity.]
  24. Dynamics of preference distribution & recommender system. Model for users' preference $p$ (a choice distribution) over a set of $m$ items (e.g., books read by the user, recommended to the user, or similar books):
     $$p_{k+1} = \lambda_1 \cdot p_k + \lambda_2 \cdot \mathrm{softmax}(-\epsilon u) + (1 - \lambda_1 - \lambda_2) \cdot p_0,$$
     where $p_0$ is the initial distribution of the user population and $\mathrm{softmax}(-\epsilon u)$ is a smoothened price response concentrating mass on the cheapest items ($-\epsilon u \approx$ smoothed $\min(u)$). Goal of the recommender system: maximize gains & preserve diversity,
     $$\max_u \; p^\top u + \rho \sum_{i=1}^{m} p_i \log p_i \quad \text{s.t.} \quad p = h(u, p_0), \quad 0 \le u_i \le \bar{u}, \quad \mathbf{1}^\top u = u_{\mathrm{total}},$$
     with the gain $p^\top u$ from selling items aligned with preference, diversification by entropy regularization, the steady-state choice distribution $p = h(u, p_0)$, and budget constraints.
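
A minimal simulation sketch of these preference dynamics (the parameter values λ₁ = 0.6, λ₂ = 0.2, ε = 5 are illustrative assumptions):

```python
import numpy as np

def softmax(z):
    z = z - z.max()                  # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

def step(p, p0, u, lam1=0.6, lam2=0.2, eps=5.0):
    """One round of the preference-distribution update."""
    return lam1 * p + lam2 * softmax(-eps * u) + (1 - lam1 - lam2) * p0

m = 5
p0 = np.full(m, 1.0 / m)             # uniform initial preference over m items
u = np.linspace(0.1, 1.0, m)         # illustrative prices; mass flows to cheap items
p = p0.copy()
for _ in range(100):                 # iterate to the steady state p = h(u, p0)
    p = step(p, p0, u)
print("steady-state choice distribution:", np.round(p, 3))
```
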
  25. Optimality & distributional convergence. [Figure: optimality gap to $u^*$ over iterations. Our composite algorithm, with anticipation & sensitivity estimation, anticipates & shapes the distribution dynamics and significantly improves optimality over vanilla stochastic gradient descent without anticipation and over a derivative-free gradient in feedback à la RL.]
  26. Conclusions: go with the flow!
     • distribution perspective: distribution shift is amenable to control
     • online stochastic algorithm: adapt to & shape distributions
     • convergence & optimality: in expectation + with high probability
     • generalization for fixed samples
     Future work: stochastic constraints, game setups, & model-free methods.
  27. Joint work with Z. He, S. Bolognani, & M. Muehlebach. Paper on arXiv, code on GitHub. [Figure: feedback loop between decision-making and the population dynamics, $\mu_k \to \mu_{k+1}$, for $\min_u \mathbb{E}_{p \sim \mu}[\Phi(u; p)]$.]