Population Dynamics Meet Network Optimization

Population Dynamics Meet Network Optimization Florian Dörfler A Decision-Dependent Stochastic
Formulation IFAC Conference on Network Systems 2025 Joint work with Zhiyu He, Saverio Bolognani, & Michael Muehlebach

Decision-making in probability spaces The state / decision-variable is a
probability measure when • using first-principle physics (e.g., quantum or statistical mechanics) • taking a probabilistic view of the world (e.g., Bayesian modeling) • distributional robustification (DRO) accounting for distribution shifts • today: for large populations of agents, e.g., in mean-field settings 2

Preview example: Affinity Maximization in a Polarized Population 3 opportunistic
political party promoting an ideology large-scale population of individuals

𝑝𝑘+1 = normalize 𝜆 ⋅ 𝑝𝑘 + 1 − 𝜆
⋅ 𝑝0 + 𝜎 ⋅ 𝑝𝑘 ⊤𝑢 𝑢 Polarized population model 4 utility individual position ideology Model [Gaitonde et al., ’21; Dean et al., ’22; Hazła et al., ‘24] initial position Steady state given a fixed ideological position 𝑢 𝑝ss = ℎ(𝑝0 , 𝑢) perpendicular angle → not influenced 𝑝0 𝑝ss 𝑢 acute angle → influenced 𝑢 𝑝0 𝑝ss obtuse angle → anti-influenced 𝑢 𝑝0 𝑝ss 𝑢 𝑝𝑘 biased assimilation 𝑝𝑘 = 𝑢 = 1 𝑝𝑘 T𝑢

Modeling in probability space 5 Distributional modeling for large-scale population
without identifiers Model 𝜇ss 𝜇ss (𝑢) = ℎ(𝑢,⋅)# 𝜇0 Steady state given a fixed ideological position 𝑢 𝑝ss = ℎ(𝑝0 , 𝑢) pushforward 𝑝𝑘+1 = normalize 𝜆 ⋅ 𝑝𝑘 + 1 − 𝜆 ⋅ 𝑝0 + 𝜎 ⋅ 𝑝𝑘 ⊤𝑢 𝑢 utility individual position ideology initial position biased assimilation 𝑝0 ∼ 𝜇0 𝑝0 sampled from initial distribution 𝜇0 𝜇0

s. t. 𝜇ss (𝑢) = ℎ(𝑢,⋅)# 𝜇0 max ‖𝑢‖≤1 𝔼𝑝∼𝜇ss(𝑢)
[𝑝⊤𝑢] Affinity maximization 6 Goal: population-wide affinity optimization Distributional steady state for large-scale population Challenges: • nonlinear / ∞-dimensional map ℎ • unknown map ℎ & distribution 𝜇0 • system is not in steady state 𝜇ss • decision-dependence 𝔼𝑝∼𝜇ss(𝑢) decision-maker can at best sample from the currently observed 𝜇𝑘 𝜇ss (𝑢) = ℎ(𝑢,⋅)# 𝜇0

7 Feedback loop non-stationary distributions unknown & nonlinear dynamics convergence
optimality random variable 𝑝 decision-making 𝜇𝑘 decision dependence population dynamics 𝜇𝑘+1 with distribution 𝜇𝑘 (approximated by samples) 𝑢𝑘+1 = arg max ‖𝑢‖≤1 𝔼𝑝∼𝜇𝑘 [𝑝⊤𝑢]

Contents & take-home messages 8 ① distribution dynamics ③ finite-sample
generalization ② online stochastic algorithm decision take into account & control even for fixed sample distribution adapt to & shape distributions 𝜇𝑘 𝜇𝑘+1 ④ case studies affinity maximization & recommender system

9 (re)optimize stochastic decision-making deciding on 𝑢 𝜇𝑘 𝜇𝑘+1 Broader
picture: randomness is common (re)sample & random variable 𝑝 follows 𝜇𝑘 decision variable random variable performance objective positions ideology demands price optimal decision & so is feedback endogeneous uncertainty: 𝜇𝑘 depends on 𝑢 min 𝑢 𝔼𝑝∼𝜇𝑘 [Φ(𝑢; 𝑝)]

s.t. 𝜇ss (𝑢) = ℎ(𝑢,⋅)# 𝜇𝑑 min 𝑢 𝔼𝑝∼𝜇ss 𝑢
Φ 𝑢, 𝑝 Problem formulation 10 decision distribution shift steady-state distribution steady-state map utility decision-dependent problem (differentiable in 𝑢 & Lipschitz in 𝑑) (differentiable) 𝜇𝑘 𝜇𝑘−1 𝑢𝑘 → 𝜇ss (𝑢) = ℎ(𝑢,⋅)# 𝜇𝑑 𝑝 = ℎ(𝑢, 𝑑) 𝑝𝑘 = 𝑓 𝑝𝑘−1 , 𝑢𝑘 , 𝑑 , 𝑝0 ∼ 𝜇0 , 𝑑 ∼ 𝜇𝑑 distribution dynamics initial state decision exogenous disturbance (contracting in 𝑝 & Lipschitz in (𝑑, 𝑢)) (finite absolute moments)

Ingredient: online feedback optimization 11 optimization problem s.t. 𝑝 =
𝐻𝑢 𝑢 + 𝐻𝑑 𝑑 min 𝑢, 𝑝 Φ 𝑢, 𝑝 gradient descent requiring only steady-state sensitivity 𝐻𝑢 𝑢𝑘+1 = 𝑢𝑘 − 𝜂 ⋅ 𝐼 𝐻𝑢 T · ∇Φ 𝑢𝑘 , 𝒑𝒌 linear time-invariant dynamics 𝑝𝑘+1 = 𝐴𝑝𝑘 + 𝐵𝑢𝑘 + 𝐸𝑑 const. disturbance 𝑑 & steady-state map 𝑝ss = 𝐼 − 𝐴 −1𝐵𝑢 + 𝐼 − 𝐴 −1𝐸𝑑 𝐻𝑢 𝐻𝑑 𝐵 𝐸 𝐴 𝑧−1 𝜂 ∇𝑢 𝜙 𝐻𝑢 T ∇𝑦 𝜙 𝑧−1 𝑢𝑘 + + + 𝑑 + - - 𝑝𝑘 𝒑𝒌 evaluating measurement

12 Can we copy this recipe here ? Ideally evaluate
exact gradients of 𝔼𝑝∼𝜇ss 𝑢 [Φ(𝑢, 𝑝)] ∇෩ Φ 𝑢𝑘 = 𝔼𝑑∼𝜇𝑑 [∇Φ(𝑢𝑘 , ℎ(𝑢𝑘 , 𝑑))] = 𝔼𝑑∼𝜇𝑑 [∇𝑢 Φ(𝑢𝑘 , ℎ(𝑢𝑘 , 𝑑)) + ∇𝑢 ℎ 𝑢𝑘 , 𝑑 𝛻𝑝 Φ(𝑢𝑘 , 𝑝)|𝑝=ℎ(𝑢𝑘,𝑑) ] = 𝔼(𝑝,𝑑)∼𝛾ss(𝑢𝑘) [∇𝑢 Φ(𝑢𝑘 , 𝑝) + ∇𝑢 ℎ(𝑢𝑘 , 𝑑) ∇𝑝 Φ(𝑢𝑘 , 𝑝)] steady state induced by 𝑑 ∼ 𝜇𝑑 & 𝑝 ∼ ℎ(𝑢,⋅)# 𝜇𝑑 chain rule & law of total derivative conditions on Φ allow swapping ∇ & 𝔼 Challenges hard to evaluate 𝔼 (integral) no access to the steady state online decision-making! use current samples from 𝜇𝑘 = 𝔼𝑑∼𝜇𝑑 Φ 𝑢, ℎ 𝑢, 𝑑 ≜ ෩ Φ(𝑢)

∇෩ Φ(𝑢𝑘 ) = 𝔼(𝑝,𝑑)∼𝛾ss(𝑢𝑘) [∇𝑢 Φ(𝑢𝑘 , 𝑝) +
∇𝑢 ℎ(𝑢𝑘 , 𝑑)∇𝑝 Φ(𝑢𝑘 , 𝑝)] ෡ ∇𝑘 ෩ Φ(𝑢𝑘 ) = 𝔼(𝑝,𝑑)∼𝛾𝑘 [∇𝑢 Φ(𝑢𝑘 , 𝑝) + ∇𝑢 ℎ(𝑢𝑘 , 𝑑)∇𝑝 Φ(𝑢𝑘 , 𝑝)] Online stochastic algorithm 13 adaption anticipation & steering current steady-state … a finite-sample approximation of 𝜇ss (𝑢𝑘 ) 𝜇𝑘 𝜇𝑘−1 𝑢𝑘+1 = 𝑢𝑘 − 𝜂 ⋅ 1 𝑛mb ෍ 𝑖=1 𝑛mb ∇𝑢 Φ 𝑢𝑘 , 𝑝𝑘 𝑖 + ∇𝑢 ℎ 𝑢𝑘 , 𝑑𝑖 ∇𝑝 Φ 𝑢𝑘 , 𝑝𝑘 𝑖 … which again approximates the true gradient ෡ ∇mb 𝑘 ෩ Φ (𝑢𝑘 ) mini batch composite stochastic gradient

𝑊1 𝜇, 𝜈 = inf 𝛾∈Γ 𝜇,𝜈 න 𝒳×𝒴 𝑐
𝑥, 𝑦 d𝛾 𝑥, 𝑦 Ingredient: Wasserstein distance 14 𝜇(𝑥) Metric measuring the discrepancy between distributions 𝜇 & 𝜈 cost of moving mass 𝜈(𝑦) transport plan 𝛾(𝑥, 𝑦) minimum cost of transporting 𝜇 onto 𝜈 Why Wasserstein? general distributions with disjoint supports + tractable theory price 𝑐(𝑥, 𝑦) joint distribution with marginals 𝜇(𝑥) & 𝜈(𝑦) 𝒳 𝒴

Contraction of distribution shifts under decisions 15 → contracting under
the assumption of Lipschitz dynamics: 𝑊1 𝜇𝑘 , 𝜇ss 𝑢𝑘 ≤ න 𝑝 𝑘 𝑝0 , 𝑢0:𝑘 , 𝑑 − ℎ 𝑢𝑘 , 𝑑 d𝜇0 d𝜇𝑑 closeness between 𝜇𝑘 & 𝜇ss (𝑢𝑘 ) when 𝑢𝑘 is close to 𝑢𝑘−1 𝜇𝑘 weakly converges to 𝜇ss (𝑢𝑘 ) bounded Wasserstein distance 𝑉𝑘 ≤ 𝐿 𝑓 𝑝 𝑉𝑘−1 + 𝑐𝑜𝑛𝑠𝑡. ⋅ ‖𝑢𝑘 − 𝑢𝑘−1 ‖ ∈ (0, 1) drift of decisions implication ෡ ∇𝑘 ෩ Φ 𝑢𝑘 close to ∇ ෩ Φ(𝑢𝑘 ) ෡ ∇𝑘 ෩ Φ 𝑢𝑘 − ∇෩ Φ(𝑢𝑘 ) ≤ 𝑐𝑜𝑛𝑠𝑡. ⋅ 𝑊1 𝜇𝑘 , 𝜇ss 𝑢𝑘 𝜇𝑘 𝜇ss (𝑢𝑘 ) involves ≜ 𝑉𝑘

Optimality in expectation 16 For any window of length 𝑇,
with the step size 𝜂 = 𝒪(1/ 𝑇) convergence measure average expected second moments of gradients dynamics & composite structure same (best) 𝒪 1/ 𝑇 rate as SGD for static & non-convex problem = 𝒪 1 𝑇 ≤ 8 ෩ Φ 𝑢0 − ෩ Φ∗ 𝜂𝑇 + 𝑐𝑜𝑛𝑠𝑡.⋅ 𝜂 𝑛mb + 𝒪 1 𝑇 1 𝑇 ෍ 𝑘=0 𝑇−1 𝔼 ‖∇෩ Φ(𝑢𝑘 )‖2 naïve vanilla gradient (no anticipation) rate is slower & convergence only up to non-vanishing bias w.r.t. rand. in 𝑢𝑘 non- convex

Population convergence 17 For any window of length 𝑇, with
the step size 𝜂 = 𝒪(1/ 𝑇) 1 𝑇 ෍ 𝑘=0 𝑇−1 𝔼[𝑊1 (𝜇𝑘 , 𝜇ss(𝑢𝑘 ))] ≤ 𝑐𝑜𝑛𝑠𝑡.⋅ ෩ Φ 𝑢0 − ෩ Φ∗ 𝜂𝑇 + 𝑐𝑜𝑛𝑠𝑡.⋅ 𝜂 𝑛mb + 𝒪 1 𝑇 = 𝒪 𝑇− Τ 1 4 𝜇𝑘 weakly converges to 𝜇ss (𝑢𝑘 ) in the long run why this rate? Wasserstein distance coupling of distribution dynamics & algorithm rate of norm of gradients w.r.t. rand. in 𝑢𝑘 𝑉𝑘 ≤ 𝐿 𝑓 𝑝 𝑉𝑘−1 + 𝐿 𝑓 𝑝𝐿ℎ 𝑢 ‖𝑢𝑘 − 𝑢𝑘−1 ‖

High-probability guarantees 18 For any window of length 𝑇, with
𝜂 = 𝒪(1/ 𝑇), for fixed 𝜏 ∈ (0,1) holds with probability at least 1 − 𝜏 same dependence on 𝑇 & benign scaling with 𝜏 , e.g. 𝜏 = 10−4 ⟹ lg 1 𝜏 = 4 second moment of gradient 1 𝑇 ෍ 𝑘=0 𝑇−1 ‖∇෩ Φ(𝑢𝑘 )‖2 = 𝒪 1 𝑇 1 + lg 1 𝜏 1 𝑇 ෍ 𝑘=0 𝑇−1 𝑊1 (𝜇𝑘 , 𝜇ss (𝑢𝑘 )) = 𝒪 1 𝑇1/4 1 + lg 1 𝜏 Wasserstein distance failure probability

Φ𝑁 (with 𝜇𝑘 𝑁) generalize to Φ (with 𝜇𝑘 )?
how will solutions found based on Generalization from a fixed sample distribution 19 true distribution empirical distribution Why? seeding trials restricted sampling (privacy, resource constraints) decision decision Question V.S. 𝜇𝑘 𝜇𝑘 𝑁 = 1 𝑁 ෍ 𝑖=1 𝑁 𝛿 𝑝𝑘 𝑖 𝑝0 1, … 𝑝0 𝑁: fixed sample distribution

Generalization with high probability 20 Can only access samples 𝑁
& optimize the empirical objective ෩ Φ𝑁 = σ 𝑖=1 𝑁 Φ(𝑢, 𝑝k 𝑖 )/𝑁 with a high probability solutions generalize well to the true objective ෩ Φ For any window of length 𝑇, any fixed 𝜏 ∈ (0,1) when 𝑁 ≥ 1 𝑐2 lg 2𝑐1 𝜏 & 𝜂 = 𝒪(1/ 𝑇) holds with probability at least 1 − 𝜏 size of exogeneous disturbances from 𝜇𝑘 𝑁 1 𝑇 ෍ 𝑘=0 𝑇−1 ‖∇෩ Φ(𝑢𝑘 𝑁)‖2 = 𝒪 1 𝑁 2 𝑟 lg 1 𝜏 2 𝑟 + 𝒪 1 𝑇 1 + lg 1 𝜏 true objective same rate as before

Affinity Maximization in a Polarized Population 21

Recall: polarized population model & affinity maximization 22 Goal: population-wide
affinity optimization Distributional steady state for large- scale population 𝜇ss (𝑢) = ℎ(𝑢,⋅)# 𝜇0 Model 𝑝𝑘+1 = normalize 𝜆 ⋅ 𝑝𝑘 + 1 − 𝜆 ⋅ 𝑝0 + 𝜎 ⋅ 𝑝𝑘 ⊤𝑢 𝑢 utility individual position ideology initial position biased assimilation 𝑝0 ∼ 𝜇0 𝑝0 sampled from initial distribution 𝜇0 𝜇ss 𝜇0 s. t. 𝜇ss (𝑢) = ℎ(𝑢,⋅)# 𝜇0 max ‖𝑢‖≤1 𝔼𝑝∼𝜇ss(𝑢) [𝑝⊤𝑢]

Optimality 23 anticipate & shape distribution dynamics significant improvement in
optimality vanilla stochastic gradient descent without anticipation our composite algorithm with anticipation & sensitivity estimation 𝑢∗

Histograms of position-decision angles 24 ground-truth optimum (via IPOPT) vanilla
gradient our composite algorithm higher population-wide affinities more acute angles close to histogram of the optimal solution many samples in [95°, 110°] lower affinity 𝑢 𝑝0 𝑝ss

Performance Optimization with Discrete Choice Distributions 25

Dynamics of preference distribution & recommender system 26 choice distribution
smoothened price with softmax −𝜖𝑢 ≈ min(𝑢) Model for users’ preference 𝑝 for a set of items (e.g., books) initial distribution of user population Goal of recommender system: maximize gains & preserve diversity gain from selling items aligned with preference diversification by entropy regularization 𝑝𝑘+1 = 𝜆1 ⋅ 𝑝𝑘 + 𝜆2 ⋅ softmax(−𝜖𝑢) + (1 − 𝜆1 − 𝜆2 ) ⋅ 𝑝0 max 𝑢 𝑝⊤𝑢 + 𝜌 ෍ 𝑖=1 𝑚 𝑝𝑖 log 𝑝𝑖 𝑝 = ℎ(𝑢, 𝑝0 ) 0 ≤ 𝑢𝑖 ≤ 𝑢 s. t. 1⊤𝑢 = 𝑢𝑡𝑜𝑡𝑎𝑙 = + steady-state choice distribution budget constraints read by user recommended to user similar books

Optimality & distributional convergence 27 ((𝑢∗)) anticipate & shape distribution
dynamics significant improvement in optimality vanilla stochastic gradient descent without anticipation our composite algorithm with anticipation & sensitivity estimation derivative-free gradient in feedback à la RL

Conclusions distribution shift is amenable to control go with the
flow ! online stochastic algorithm distribution perspective convergence optimality: expectation + high-prob. generalization for fixed samples 28 adapt to & shape distributions Future work: stochastic constraints, game setup, & model-free

Z. He 29 S. Bolognani M. Muehlebach arXiv GitHub random
variable 𝑝 decision-making 𝜇𝑘 decision dependence dynamics 𝜇𝑘+1 min 𝑢 𝔼𝑝∼𝜇 [Φ(𝑢; 𝑝)]

Population Dynamics Meet Network Optimization

Population Dynamics Meet Network Optimization

Florian Dörfler

More Decks by Florian Dörfler

Featured

Transcript

Population Dynamics Meet Network Optimization Florian Dörfler A Decision-Dependent Stochastic

Decision-making in probability spaces The state / decision-variable is a

Preview example: Affinity Maximization in a Polarized Population 3 opportunistic

𝑝𝑘+1 = normalize 𝜆 ⋅ 𝑝𝑘 + 1 − 𝜆

Modeling in probability space 5 Distributional modeling for large-scale population

s. t. 𝜇ss (𝑢) = ℎ(𝑢,⋅)# 𝜇0 max ‖𝑢‖≤1 𝔼𝑝∼𝜇ss(𝑢)

7 Feedback loop non-stationary distributions unknown & nonlinear dynamics convergence

Contents & take-home messages 8 ① distribution dynamics ③ finite-sample

9 (re)optimize stochastic decision-making deciding on 𝑢 𝜇𝑘 𝜇𝑘+1 Broader

s.t. 𝜇ss (𝑢) = ℎ(𝑢,⋅)# 𝜇𝑑 min 𝑢 𝔼𝑝∼𝜇ss 𝑢

Ingredient: online feedback optimization 11 optimization problem s.t. 𝑝 =

12 Can we copy this recipe here ? Ideally evaluate

∇෩ Φ(𝑢𝑘 ) = 𝔼(𝑝,𝑑)∼𝛾ss(𝑢𝑘) [∇𝑢 Φ(𝑢𝑘 , 𝑝) +

𝑊1 𝜇, 𝜈 = inf 𝛾∈Γ 𝜇,𝜈 න 𝒳×𝒴 𝑐

Contraction of distribution shifts under decisions 15 → contracting under

Optimality in expectation 16 For any window of length 𝑇,

Population convergence 17 For any window of length 𝑇, with

High-probability guarantees 18 For any window of length 𝑇, with

Φ𝑁 (with 𝜇𝑘 𝑁) generalize to Φ (with 𝜇𝑘 )?

Generalization with high probability 20 Can only access samples 𝑁

Affinity Maximization in a Polarized Population 21

Recall: polarized population model & affinity maximization 22 Goal: population-wide

Optimality 23 anticipate & shape distribution dynamics significant improvement in

Histograms of position-decision angles 24 ground-truth optimum (via IPOPT) vanilla

Performance Optimization with Discrete Choice Distributions 25

Dynamics of preference distribution & recommender system 26 choice distribution

Optimality & distributional convergence 27 ((𝑢∗)) anticipate & shape distribution

Conclusions distribution shift is amenable to control go with the

Z. He 29 S. Bolognani M. Muehlebach arXiv GitHub random