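Exact gradients of $\mathbb{E}_{p\sim\mu^{ss}(u)}[\Phi(u,p)]$

Steady-state objective:
$$\tilde\Phi(u) \triangleq \mathbb{E}_{p\sim\mu^{ss}(u)}[\Phi(u,p)] = \mathbb{E}_{d\sim\mu_d}\big[\Phi\big(u,h(u,d)\big)\big]$$

Gradient at the current decision $u_k$:
$$
\begin{aligned}
\nabla\tilde\Phi(u_k)
&= \mathbb{E}_{d\sim\mu_d}\big[\nabla\Phi\big(u_k,h(u_k,d)\big)\big] \\
&= \mathbb{E}_{d\sim\mu_d}\big[\nabla_u\Phi\big(u_k,h(u_k,d)\big) + \nabla_u h(u_k,d)\,\nabla_p\Phi(u_k,p)\big|_{p=h(u_k,d)}\big] \\
&= \mathbb{E}_{(p,d)\sim\gamma^{ss}(u_k)}\big[\nabla_u\Phi(u_k,p) + \nabla_u h(u_k,d)\,\nabla_p\Phi(u_k,p)\big]
\end{aligned}
$$

- 1st equality: conditions on $\Phi$ allow swapping $\nabla$ and $\mathbb{E}$; here $\nabla\Phi(u_k,h(u_k,d))$ is the total gradient of $u\mapsto\Phi(u,h(u,d))$ at $u_k$.
- 2nd equality: chain rule (law of total derivative).
- 3rd equality: $\gamma^{ss}(u_k)$ is the joint steady state of $(p,d)$ induced by $d\sim\mu_d$ and $p = h(u_k,d)$, i.e. $p\sim h(u_k,\cdot)_{\#}\mu_d = \mu^{ss}(u_k)$.

Challenges:
- the expectation is an integral and hard to evaluate;
- no access to the steady state (online decision-making!).

→ Use current samples from $\mu_k$ instead.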
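As a numerical sanity check of the last line of the derivation, the sketch below approximates the expectation over $\gamma^{ss}(u_k)$ by a sample average. Everything concrete here is a hypothetical choice for illustration, not the slide's setting: the steady-state map $h(u,d)=\tanh(Au)+d$, the disturbance $\mu_d=\mathcal{N}(0,\sigma^2 I)$, the quadratic $\Phi$, and all function names. It also assumes we can draw $(p,d)$ directly from the steady state, which is exactly what the challenges above say is unavailable online. $\nabla_u h$ is taken as the $n\times m$ transposed Jacobian so that $\nabla_u h\,\nabla_p\Phi\in\mathbb{R}^n$.

```python
import numpy as np

# Hypothetical toy instance (NOT from the slide):
#   steady-state map  h(u, d) = tanh(A u) + d,  with d ~ mu_d = N(0, sigma^2 I)
#   objective         Phi(u, p) = 0.5*||u||^2 + 0.5*||p - p_ref||^2
rng = np.random.default_rng(0)
n, m = 3, 2                        # dim of decision u, dim of output p
A = rng.standard_normal((m, n))
p_ref = np.array([0.5, -0.2])
sigma = 0.3


def h(u, d):
    """Steady-state output p = h(u, d)."""
    return np.tanh(A @ u) + d


def grad_u_h(u, d):
    """∇_u h(u, d) as an (n, m) matrix: the transposed Jacobian of h w.r.t. u."""
    s = 1.0 - np.tanh(A @ u) ** 2  # tanh'(A u), shape (m,); d does not enter here
    return (s[:, None] * A).T      # (diag(s) A)^T, shape (n, m)


def Phi(u, p):
    return 0.5 * u @ u + 0.5 * (p - p_ref) @ (p - p_ref)


def grad_u_Phi(u, p):
    return u


def grad_p_Phi(u, p):
    return p - p_ref


def mc_gradient(u, ds):
    """Sample average of ∇_u Φ(u,p) + ∇_u h(u,d) ∇_p Φ(u,p) over pairs (p, d), p = h(u, d)."""
    terms = []
    for d in ds:
        p = h(u, d)                # (p, d) is one draw from the joint steady state
        terms.append(grad_u_Phi(u, p) + grad_u_h(u, d) @ grad_p_Phi(u, p))
    return np.mean(terms, axis=0)


def sample_objective(u, ds):
    """Empirical version of tilde-Phi(u) = E_d[Phi(u, h(u, d))]."""
    return np.mean([Phi(u, h(u, d)) for d in ds])


def exact_gradient(u):
    """Closed form for this toy case only: d is additive and zero-mean, so E[p] = tanh(A u)."""
    return u + grad_u_h(u, np.zeros(m)) @ (np.tanh(A @ u) - p_ref)


u_k = np.array([0.2, -0.1, 0.4])
ds = rng.normal(0.0, sigma, size=(20_000, m))   # samples d ~ mu_d
g_mc = mc_gradient(u_k, ds)
print("Monte Carlo:", g_mc)
print("exact:      ", exact_gradient(u_k))
```

Because the sample average is the exact gradient of the empirical objective `sample_objective`, it matches central finite differences of that objective up to rounding error. Against the true $\nabla\tilde\Phi(u_k)$ it agrees only up to Monte Carlo error, which illustrates the "hard to evaluate $\mathbb{E}$" challenge.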