Jia-Jie Zhu
March 16, 2024

Bernhard Schmitzer (Georg-August-Universität Göttingen, Germany): Entropic Transfer Operators for Data-driven Analysis of Dynamical Systems

WORKSHOP ON OPTIMAL TRANSPORT
FROM THEORY TO APPLICATIONS
INTERFACING DYNAMICAL SYSTEMS, OPTIMIZATION, AND MACHINE LEARNING
Venue: Humboldt University of Berlin, Dorotheenstraße 24

Berlin, Germany. March 11th - 15th, 2024

Transcript

  1. Entropic transfer operators for data-driven analysis of dynamical systems

    Bernhard Schmitzer, joint work with Oliver Junge and Daniel Matthes
  2. Analyzing dynamical systems

    Challenge: systems are high-dimensional, stochastic, chaotic, ...

    Relevant questions:
    - separate slow from spurious dynamics
    - find low-dimensional effective coordinates
    - identify (almost) invariant subsystems

    Objective: inference from data / simulations

    Images: wikipedia/Lorenz_system; wikipedia/Alanine; Stephan Weiss, MPI DS, Göttingen
  3. Overview

    1. Dynamical systems and transfer operator
    2. Optimal transport
    3. Entropic transfer operators
    4. Examples and discussion
  4. Overview

    1. Dynamical systems and transfer operator
    2. Optimal transport
    3. Entropic transfer operators
    4. Examples and discussion
  5. Dynamical systems

    Time-discrete deterministic dynamical system:
    - state space $X$, compact metric space
    - update map $F : X \to X$; for simplicity, $F$ continuous
    - $x_{t+1} = F(x_t)$

    Remarks:
    - time-continuous systems can be treated by integrating the flow
    - can be extended to stochastic dynamics, $x_{t+1} \sim \kappa_{x_t} \in \mathcal{P}(X)$

    Challenge:
    - systems of interest are often high-dimensional, stochastic, chaotic ⇒ little insight is gained from studying individual trajectories
    - seek a simplified, coarse-grained, effective description: cyclic behaviour, almost-invariant regions, fast and slow coordinates
  6. Transfer operator

    Evolution of point ensembles:
    - assume that at time $t$ the points are distributed according to $x_t \sim \mu_t \in \mathcal{P}(X)$
    - distribution at time $t+1$: $x_{t+1} \sim \mu_{t+1} = F_\# \mu_t$
    - $\mathbb{E}_{\mu_t}[\phi] = \int_X \phi(x) \, d\mu_t(x)$
    - $\mathbb{E}_{\mu_{t+1}}[\phi] = \int_X \phi(F(x)) \, d\mu_t(x) = \int_X \phi \, dF_\# \mu_t$

    Transfer operator $T : \mathcal{P}(X) \to \mathcal{P}(X)$, $\mu \mapsto F_\# \mu$:
    - linear operator, represents the dynamical system at the level of distributions
    - adjoint: Koopman operator, $\phi \mapsto \phi \circ F$
    - often interested in invariant measures: $T\mu = \mu$
    - restriction to densities: $T : L^p(\mu) \to L^p(F_\# \mu)$
    - less complex spaces ⇒ spectral analysis, recover dominant dynamics
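  7. Simple examples: discrete Markov chains

    $\mu_{t+1} = T\mu_t$, $\mu_0 = \sum_k \alpha_k \cdot \varphi_k$, $\mu_t = \sum_k \alpha_k \cdot \lambda_k^t \cdot \varphi_k$

    Three-state chain (states 1, 2, 3; edge labels in the diagram: 0.9, 0.94, 0.94, 0.05, 0.05, 0.01):
    - $\varphi_k = (1, 1, 1),\ (0, -1, 1),\ (-2, 1, 1)$
    - $\lambda_k = 1,\ 0.93,\ 0.85$

    Four-state chain (states 1, 2, 3, 4; edge labels in the diagram: 0.74, 0.74, 0.74, 0.74, 0.25, 0.25, 0.01, 0.01):
    - $\varphi_k = (1, 1, 1, 1),\ (-1, -1, 1, 1),\ (-1, 1, -1, 1),\ (1, -1, -1, 1)$
    - $\lambda_k = 1,\ 0.98,\ 0.5,\ 0.48$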
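The spectral expansion $\mu_t = \sum_k \alpha_k \lambda_k^t \varphi_k$ can be checked numerically. The following is a minimal sketch, assuming that the three-state diagram corresponds to the symmetric transition matrix `P` below (self-loops 0.9, 0.94, 0.94; couplings 0.05, 0.05, 0.01). The matrix is inferred from the edge labels and is not stated explicitly on the slide; the helper `evolve` is hypothetical.

```python
import numpy as np

# Transition matrix inferred from the edge labels of the three-state diagram
# (an assumption; the slide only shows the labelled graph).
P = np.array([
    [0.90, 0.05, 0.05],
    [0.05, 0.94, 0.01],
    [0.05, 0.01, 0.94],
])

# P is symmetric, so eigh returns real eigenvalues and orthonormal eigenvectors.
eigvals, eigvecs = np.linalg.eigh(P)
order = np.argsort(eigvals)[::-1]          # sort eigenvalues in descending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]


def evolve(mu0, t):
    """Return mu_t = sum_k alpha_k * lambda_k**t * phi_k for the initial vector mu0."""
    alpha = eigvecs.T @ mu0                # coefficients in the orthonormal eigenbasis
    return eigvecs @ (alpha * eigvals**t)


mu0 = np.array([1.0, 0.0, 0.0])            # all mass starts in state 1
mu10_spectral = evolve(mu0, 10)
mu10_direct = np.linalg.matrix_power(P, 10) @ mu0

print("eigenvalues:", eigvals)
print("mu_10 (spectral):", mu10_spectral)
print("mu_10 (direct):  ", mu10_direct)
```

Under this assumed matrix the eigenvalues agree with the values 1, 0.93, 0.85 listed on the slide, and the slowly decaying modes ($\lambda_k$ close to 1) are the ones that dominate $\mu_t$ for large $t$.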
  8. Estimation from data

    Ulam's method:
    - reference measure $m \in \mathcal{M}_+(X)$, partition $X = \bigcup_i X_i$ ($m$-essentially disjoint), reduced space $\mathcal{X} := \{X_1, \ldots, X_N\}$
    - Markov matrix $P$ over $\mathcal{X}$: $P_{i,j} := \frac{m(X_j \cap F^{-1}(X_i))}{m(X_j)}$, estimate by sampling
    - slow convergence if the support of $m$ is high-dimensional

    Modern variants:
    - Markov state models
    - reaction coordinates, transition manifolds, ...

    Estimate adjoint Koopman operator $K$:
    - basis functions $(\psi_1, \ldots, \psi_M) : X \to \mathbb{R}$; estimate $K$ in the subspace spanned by $(\psi_a)_a$, based on samples $(x_i, y_i = F(x_i))_{i=1}^N$
    - $\psi_a(y_i) = (K\psi_a)(x_i) \approx \sum_b K_{a,b} \psi_b(x_i)$
    - least squares approximation for the coefficients $K_{a,b}$: $\min_K \sum_{i,a} \left( \psi_a(y_i) - \sum_b K_{a,b} \psi_b(x_i) \right)^2$
    - wide variety of choices for $(\psi_a)_a$, dictionary learning, kernel methods, ... ⇒ Koopmanism
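The least squares problem for the coefficients $K_{a,b}$ can be illustrated on a toy system. This is a hedged sketch, not the speaker's code: the circle rotation $F(x) = x + \theta \bmod 1$, the three-function Fourier dictionary, and the names `dictionary` and `estimate_koopman` are all assumptions chosen so that the exact answer is known.

```python
import numpy as np


def dictionary(x):
    """Hypothetical dictionary (psi_1, psi_2, psi_3) = (1, cos 2*pi*x, sin 2*pi*x)."""
    return np.stack(
        [np.ones_like(x), np.cos(2 * np.pi * x), np.sin(2 * np.pi * x)], axis=1
    )


def estimate_koopman(x, y, dictionary):
    """Solve min_K sum_{i,a} (psi_a(y_i) - sum_b K[a, b] psi_b(x_i))**2."""
    psi_x = dictionary(x)                  # shape (N, M), entries psi_b(x_i)
    psi_y = dictionary(y)                  # shape (N, M), entries psi_a(y_i)
    # lstsq solves psi_x @ B ~ psi_y, i.e. B[b, a] = K[a, b]
    B, *_ = np.linalg.lstsq(psi_x, psi_y, rcond=None)
    return B.T


theta = 0.1                                # rotation angle (illustrative value)
rng = np.random.default_rng(0)
x = rng.random(200)                        # samples x_i on the circle R/Z
y = (x + theta) % 1.0                      # y_i = F(x_i)

K = estimate_koopman(x, y, dictionary)
koopman_eigvals = np.linalg.eigvals(K)
print(np.round(K, 4))
print(np.abs(koopman_eigvals))
```

For this dictionary the span is invariant under the rotation, so the estimate reproduces the exact rotation block with entries $\cos(2\pi\theta)$ and $\pm\sin(2\pi\theta)$. With a dictionary that is not invariant, the result is only a projection, which is where the choice of $(\psi_a)_a$ matters.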
  9. Overview

    1. Dynamical systems and transfer operator
    2. Optimal transport
    3. Entropic transfer operators
    4. Examples and discussion
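  10. Kantorovich formulation of optimal transport

    [Figure: measures $\mu$, $\nu$ and a transport plan $\pi$]

    Transport plans:
    - $\Gamma(\mu, \nu) := \{ \gamma \in \mathcal{M}_+(X \times X) : P_{1\#} \gamma = \mu,\ P_{2\#} \gamma = \nu \}$
    - marginals: $P_{1\#} \gamma(A) := \gamma(A \times X)$, $P_{2\#} \gamma(B) := \gamma(X \times B)$

    Optimal transport:
    - $C(\mu, \nu) := \inf \left\{ \int_{X \times X} c(x, y) \, d\gamma(x, y) \,\middle|\, \gamma \in \Gamma(\mu, \nu) \right\} = \sup \left\{ \int_X f \, d\mu + \int_X g \, d\nu \,\middle|\, f \in C(X),\ g \in C(X),\ f \oplus g \le c \right\}$
    - cost function $c \in C(X \times X)$ for moving unit mass from $x$ to $y$

    Wasserstein distance on probability measures $\mathcal{P}(X)$:
    - $W_p(\mu, \nu) := (C(\mu, \nu))^{1/p}$ for $c(x, y) := d(x, y)^p$, $p \in [1, \infty)$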
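  11. Entropic regularization

    $C_\varepsilon(\mu, \nu) := \inf \left\{ \int_{X \times X} c(x, y) \, d\gamma(x, y) + \varepsilon \cdot \mathrm{KL}(\gamma | \mu \otimes \nu) \,\middle|\, \gamma \in \Gamma(\mu, \nu) \right\}$
    $= \sup \left\{ \int_X f \, d\mu + \int_X g \, d\nu - \varepsilon \int_{X \times X} \left[ \exp\left( \frac{f \oplus g - c}{\varepsilon} \right) - 1 \right] d(\mu \otimes \nu) \,\middle|\, f \in C(X),\ g \in C(X) \right\}$

    Unique minimizer:
    - $\gamma_\varepsilon = g_\varepsilon \cdot (\mu \otimes \nu)$, $g_\varepsilon(x, y) = \exp([f(x) + g(y) - c(x, y)] / \varepsilon)$
    - $f(x) = -\varepsilon \log \int_X \exp((g - c(x, \cdot)) / \varepsilon) \, d\nu$
    - optimal dual variables: $f$, $g$ inherit the modulus of continuity of $c$
    - $H^s$ regularity for $X \subset \mathbb{R}^d$: $\| f \|_{H^s} = O(1 + 1/\varepsilon^{s-1})$ ⇒ more reliable estimation from empirical measures

    Numerical solution with the Sinkhorn algorithm:
    - extremely simple (some caveats), efficient implementations, generalizes to unbalanced transport

    [Figure: plans $\pi_\varepsilon$ for $\varepsilon = 10^{-1}, 10^{-2}, 10^{-3}, 10^{-4}$]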
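The fixed-point formula $f(x) = -\varepsilon \log \int_X \exp((g - c(x, \cdot))/\varepsilon) \, d\nu$, alternated with its counterpart for $g$, is the Sinkhorn iteration. Below is a minimal log-domain sketch for discrete measures, written with NumPy only. The function names, the fixed iteration count, and the toy data are assumptions for illustration; this is not one of the efficient implementations the slide alludes to.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def sinkhorn(mu, nu, c, eps, n_iter=1000):
    """Entropic OT between discrete measures mu (n,) and nu (m,) with cost c (n, m).

    Returns the dual variables f, g and the plan
    gamma[i, j] = exp((f[i] + g[j] - c[i, j]) / eps) * mu[i] * nu[j].
    """
    log_mu, log_nu = np.log(mu), np.log(nu)
    f, g = np.zeros_like(mu), np.zeros_like(nu)
    for _ in range(n_iter):
        # f(x) = -eps * log int exp((g - c(x, .)) / eps) dnu
        f = -eps * logsumexp((g[None, :] - c) / eps + log_nu[None, :], axis=1)
        # g(y) = -eps * log int exp((f - c(., y)) / eps) dmu
        g = -eps * logsumexp((f[:, None] - c) / eps + log_mu[:, None], axis=0)
    density = np.exp((f[:, None] + g[None, :] - c) / eps)
    return f, g, density * mu[:, None] * nu[None, :]


# Toy data: uniform measures on two small point clouds in [0, 1], squared distance cost.
x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.0, 1.0, 7)
mu = np.full(5, 1 / 5)
nu = np.full(7, 1 / 7)
c = (x[:, None] - y[None, :]) ** 2

f, g, gamma = sinkhorn(mu, nu, c, eps=0.25)
print("row marginal error:", np.abs(gamma.sum(axis=1) - mu).max())
print("col marginal error:", np.abs(gamma.sum(axis=0) - nu).max())
```

Working with log-sum-exp rather than with the kernel $\exp(-c/\varepsilon)$ directly avoids underflow for small $\varepsilon$, which is one of the caveats of the plain matrix-scaling form.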
  12. Overview

    1. Dynamical systems and transfer operator
    2. Optimal transport
    3. Entropic transfer operators
    4. Examples and discussion
  13. Entropic transfer operators I

    Problem statement:
    - input: observed pairs $(x_i, y_i = F(x_i))_{i=1}^N$, $x_i \sim \mu$, $\mu$: invariant measure
    - $\mu_N := \frac{1}{N} \sum_{i=1}^N \delta_{x_i}$, $\nu_N := \frac{1}{N} \sum_{i=1}^N \delta_{y_i}$, $\mu_N, \nu_N \overset{*}{\rightharpoonup} \mu$ as $N \to \infty$
    - goal: estimate (approximate) the transfer operator $T : L^2(\mu) \to L^2(\mu)$

    Naive first proposal: $T_N : L^2(\mu_N) \to L^2(\nu_N)$, $T_N 1_{x_i} = 1_{y_i}$
    - usually $\mu_N \ne \nu_N$, so $T_N$ is not an endomorphism
    - $T_N$ is the identity matrix in the canonical bases $\{1_{x_i}\}_i$, $\{1_{y_i}\}_i$ ⇒ no useful information, need to map back from $L^2(\nu_N)$ to $L^2(\mu_N)$
  14. Entropic transfer operators II

    Operators induced by transport plans:
    - $\gamma \in \Gamma(\nu_N, \mu_N)$ induces an operator $G : L^2(\nu_N) \to L^2(\mu_N)$ via $\langle \phi, G\psi \rangle_{L^2(\mu_N)} = \int_X \phi \, (G\psi) \, d\mu_N := \int_{X \times X} \phi(y) \, \psi(x) \, d\gamma(x, y)$
    - discrete case: the matrix representation of $G$ is given by that of $\gamma$

    'Closing' $T_N : L^2(\mu_N) \to L^2(\nu_N)$:
    - let $\gamma_N$ be the optimal $W_2$ plan from $\nu_N$ to $\mu_N$, with induced operator $G_N$
    - composition $G_N \circ T_N : L^2(\mu_N) \to L^2(\mu_N)$
    - product of two permutation matrices, spectrum dominated by combinatorial artefacts
    - when $T$ is non-compact, do not expect convergence $G_N \circ T_N \to T$
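  15. Entropic transfer operators III

    Remedy: use entropic transport
    - $\gamma_{N,\varepsilon} = \operatorname{argmin}_{\gamma \in \Gamma(\nu_N, \mu_N)} \int_{X \times X} d^2 \, d\gamma + \varepsilon \cdot \mathrm{KL}(\gamma | \nu_N \otimes \mu_N)$
    - $\langle \phi, G_{N,\varepsilon} \psi \rangle_{L^2(\mu_N)} = \int_X \phi \, (G_{N,\varepsilon} \psi) \, d\mu_N := \int_{X \times X} \phi(y) \, \psi(x) \, d\gamma_{N,\varepsilon}(x, y)$
    - $\gamma_{N,\varepsilon} = g_{N,\varepsilon} \cdot (\nu_N \otimes \mu_N)$, $g_{N,\varepsilon} = \exp((f \oplus g - d^2) / \varepsilon)$
    - $T_{N,\varepsilon} := G_{N,\varepsilon} \circ T_N : L^2(\mu_N) \to L^2(\mu_N)$
    - intuition: blur / stochastic perturbation at length scale $\sqrt{\varepsilon}$
    - the construction also works on the population measure $\mu$: $G_\varepsilon : L^2(\mu) \to L^2(\mu)$, $T_\varepsilon := G_\varepsilon \circ T$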
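The construction $T_{N,\varepsilon} = G_{N,\varepsilon} \circ T_N$ can be written as an $N \times N$ matrix. The sketch below rests on one reading of the definitions above, namely $(T_{N,\varepsilon} u)(x_j) = \frac{1}{N} \sum_i g_{N,\varepsilon}(y_i, x_j) \, u(x_i)$ in the basis of point values at the $x_i$. That index convention, the function names, and the circle-rotation test data are assumptions, not taken from the slides.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def circle_sq_dist(a, b):
    """Squared geodesic distance on the circle R/Z."""
    d = np.abs(a - b) % 1.0
    return np.minimum(d, 1.0 - d) ** 2


def entropic_transfer_matrix(x, y, eps, sq_dist, n_iter=200):
    """Matrix M with (T_{N,eps} u)(x_j) = sum_i M[j, i] * u(x_i)."""
    n = len(x)
    cost = sq_dist(y[:, None], x[None, :])     # cost[i, j] = d^2(y_i, x_j)
    log_w = -np.log(n)                         # uniform weights 1/N on both sides
    f, g = np.zeros(n), np.zeros(n)
    for _ in range(n_iter):                    # log-domain Sinkhorn iterations
        f = -eps * logsumexp((g[None, :] - cost) / eps + log_w, axis=1)
        g = -eps * logsumexp((f[:, None] - cost) / eps + log_w, axis=0)
    density = np.exp((f[:, None] + g[None, :] - cost) / eps)   # g_{N,eps}(y_i, x_j)
    return density.T / n


# Circle rotation by theta = 1/3 on a regular lattice (n divisible by 3, so nu_N = mu_N).
n, theta, eps = 90, 1.0 / 3.0, 1e-2
x = np.arange(n) / n
y = (x + theta) % 1.0

M = entropic_transfer_matrix(x, y, eps, circle_sq_dist)
moduli = np.sort(np.abs(np.linalg.eigvals(M)))[::-1]
print("row sums in [%.6f, %.6f]" % (M.sum(axis=1).min(), M.sum(axis=1).max()))
print("largest eigenvalue moduli:", moduli[:5])
```

Because both marginals of $\gamma_{N,\varepsilon}$ are uniform, the resulting matrix is bistochastic, so it can be read as a Markov matrix on the sample points: each point $x_i$ is moved to $y_i$ and then blurred over nearby $x_j$ at length scale $\sqrt{\varepsilon}$.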
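  16. Entropic transfer operators IV

    Recall: entropic smoothing of the empirical operator
    - $T_{N,\varepsilon} := G_{N,\varepsilon} \circ T_N : L^2(\mu_N) \to L^2(\mu_N)$

    Analogous smoothing of $T$:
    - $G_\varepsilon : L^2(\mu) \to L^2(\mu)$ by the optimal entropic plan between $\mu$ and itself
    - $T_\varepsilon := G_\varepsilon \circ T$, Hilbert–Schmidt operator (compact)

    Embedding $L^2(\mu_N)$ into $L^2(\mu)$:
    - induced by an (optimal) transport plan $\gamma_N \in \Gamma(\mu_N, \mu)$

    Main result:
    - $\hat{T}_{N,\varepsilon} \to T_\varepsilon$ in Hilbert–Schmidt norm as $\mu_N \overset{*}{\rightharpoonup} \mu$
    - main step: $g_{N,\varepsilon} \to g_\varepsilon$ uniformly in $C(X \times X)$, where $\gamma_{N,\varepsilon} = g_{N,\varepsilon} \cdot (\nu_N \otimes \mu_N)$ and $\gamma_\varepsilon = g_\varepsilon \cdot (\mu \otimes \mu)$
    - Corollary: convergence of eigenvalues and eigenvectors

    [Figure: diagrams relating $\mu$, $F_\# \mu$, $\mu_N$, $F_\# \mu_N$ through $T$, $G_\varepsilon$, $T_N$, $G_{N,\varepsilon}$ and the plans $\gamma_\varepsilon$, $\gamma_{N,\varepsilon}$, $\gamma_N$]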
  17. Overview

    1. Dynamical systems and transfer operator
    2. Optimal transport
    3. Entropic transfer operators
    4. Examples and discussion
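  18. Analytical example: d-torus

    Original operator $T$:
    - $X = \mathbb{R}^d / \mathbb{Z}^d$, $d$-torus; $F : x \mapsto x + \theta$, $\theta \in \mathbb{R}^d$
    - eigenbasis of $T$: for $k \in \mathbb{Z}^d$, vector $\varphi_k(x) = \exp(2\pi i k^\top x)$, value $\lambda_k = \exp(-2\pi i k^\top \theta)$

    Smoothed operator $T_\varepsilon = G_\varepsilon \circ T$:
    - vector $\varphi_k^\varepsilon = \varphi_k$, value $\lambda_k^\varepsilon \approx \exp(-\pi^2 \varepsilon \|k\|^2) \cdot \lambda_k$ for small $\varepsilon$
    - $G_\varepsilon$ acts approximately like a diffusion kernel, time step $\Delta t \propto \varepsilon$
    - ⇒ the spectrum of $T_\varepsilon$ is a good approximation of that of $T$ for eigenvectors with length scale $1/\|k\|$ above the blur scale $\sqrt{\varepsilon}$: $\sqrt{\varepsilon} \ll 1/\|k\| \Leftrightarrow \varepsilon \|k\|^2 \ll 1 \Rightarrow \lambda_k^\varepsilon \approx \lambda_k$

    Discretization:
    - $N = n^d$, $(x_i)_{i=1}^N$: uniform Cartesian lattice, $n$ points along each axis
    - $T_{N,\varepsilon} = G_{N,\varepsilon} \circ T_N$: vector $\varphi_k^{N,\varepsilon} = \varphi_k$, value $\lambda_k^{N,\varepsilon} \approx \lambda_k^\varepsilon$ if $\frac{1}{n} \ll \sqrt{\varepsilon} \ll \frac{1}{\|k\|}$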
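The damping factor $\exp(-\pi^2 \varepsilon \|k\|^2)$ follows from a Gaussian Fourier transform. The computation below is a sketch under two assumptions that the slide does not spell out: $\mu$ is the uniform measure on the torus (so that, by translation invariance, the dual potentials are constant and $g_\varepsilon$ depends only on $x - y$), and $\varepsilon$ is small enough that periodic images of the kernel can be neglected.

```latex
% Entropic kernel of the uniform measure with itself, for small \varepsilon:
g_\varepsilon(x, y) \approx (\pi \varepsilon)^{-d/2} \exp\!\left( -\|x - y\|^2 / \varepsilon \right)

% Action on a Fourier mode, substituting x = y + z:
(G_\varepsilon \varphi_k)(y)
  = \int_X g_\varepsilon(x, y) \, \varphi_k(x) \, dx
  \approx \varphi_k(y) \cdot (\pi \varepsilon)^{-d/2}
      \int_{\mathbb{R}^d} e^{-\|z\|^2 / \varepsilon} \, e^{2 \pi i k^\top z} \, dz
  = e^{-\pi^2 \varepsilon \|k\|^2} \, \varphi_k(y)

% Composition with T, using T \varphi_k = \lambda_k \varphi_k:
T_\varepsilon \varphi_k = G_\varepsilon T \varphi_k
  \approx e^{-\pi^2 \varepsilon \|k\|^2} \, \lambda_k \, \varphi_k

% Comparison with the heat semigroup for \partial_t u = \Delta u:
e^{t \Delta} \varphi_k = e^{-4 \pi^2 \|k\|^2 t} \, \varphi_k
  \quad \Longrightarrow \quad \Delta t = \varepsilon / 4
```

With the convention $\partial_t u = \Delta u$, the blur therefore matches a heat flow over time $\varepsilon / 4$, consistent with $\Delta t \propto \varepsilon$.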
  19. Numerical examples: unit circle

    Regular discretization, $\theta = 1/3$:
    - $X = \mathbb{R}/\mathbb{Z}$, $F : x \mapsto x + 1/3$
    - discretized with $N = 1000$ regular points
    - spectra of $T_{N,\varepsilon}$ for $\varepsilon \in \{10^{-6}, 10^{-5}, 10^{-4}, 10^{-3}, 10^{-2}\}$

    [Figure: five spectrum plots, one per value of $\varepsilon$]

    Random discretization:
    - histograms for spectra, random points, 100 realizations
  20. Numerical examples: unit circle II

    Regular discretization, $\theta = 1/\pi$:
    - $X = \mathbb{R}/\mathbb{Z}$, $F : x \mapsto x + 1/\pi$
    - discretized with $N = 1000$ regular points
    - spectra of $T_{N,\varepsilon}$ for $\varepsilon \in \{10^{-5}, 10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}\}$

    [Figure: five spectrum plots, one per value of $\varepsilon$]

    Random discretization:
    - histograms for spectra, random points, 100 realizations
  21. Numerical examples: Lorenz system

    Spectra: $N = 2000$ and $N = 4000$

    [Figure: two plots of eigenvalues (range 0.80 to 1.00) over a logarithmic axis from $10^{-3}$ to $10^{1}$]

    Signs of the two 'largest' eigenfunctions: $N = 4000$, $\varepsilon = 1$
  22. Numerical examples: mini Markov example, length scales

    [Figure: three-state Markov chain (states 1, 2, 3; edge labels 0.9, 0.94, 0.94, 0.05, 0.05, 0.01) and a plot of eigenvalues (range 0.5 to 1.1) over a logarithmic axis]
  23. Numerical examples: alanine dipeptide

    Alanine dipeptide:
    - small biomolecule, 10 atoms, $X = \mathbb{R}^{30}$, metastable conformal states
    - below: (real) spectra of $T_{N,\varepsilon}$ for $N = 5{,}000$ and $N = 10{,}000$ point pairs

    [Figure: two plots of eigenvalues (range 0.80 to 1.00) over a logarithmic axis from $10^{-6}$ to $10^{-1}$]

    Spectral clustering recovers conformal states:
    - spectral clustering at $\varepsilon = 10^{-2}$, against (known) conformal angles
  24. Comparison: nothing new under the sun?

    Ulam's method:
    - partition $X = \bigcup_i X_i$, discrete transition rates $P_{i,j} := \frac{\mu(X_j \cap F^{-1}(X_i))}{\mu(X_j)}$
    - finding appropriate $X_i$ in high dimensions is difficult
    - the entropic transfer operator is mesh-free, non-parametric, complexity controlled by $\varepsilon$ (work in progress)

    Gaussian perturbations:
    - $T^\varepsilon_{\mathrm{Gauss}} := G^\varepsilon_{\mathrm{Gauss}} \circ T$, $G^\varepsilon_{\mathrm{Gauss}}$: Gaussian blur at scale $\sqrt{\varepsilon}$
    - perturbs the invariant measure of $T$, full support
    - restricting the Gaussian to $\operatorname{spt} \mu$ still perturbs the invariant measure

    Diffusion maps, graph Laplacians:
    - bi-stochastic normalization [Marshall and Coifman, 2019]

    RKHS embedding:
    - embed $x_i$, $y_i = F(x_i)$ into an RKHS, $k(x, y) = \langle \Phi(x), \Phi(y) \rangle = \exp(-c(x, y)/\varepsilon)$
    - (regularized) least squares regression problem for a linear operator on the span of $\{\Phi(x_i)\}_i$, $\{\Phi(y_i)\}_i$
    - $T_{\mathrm{RKHS}}$ is not a Markov operator, role of $\varepsilon$ much less clear
  25. Entropic transfer operators: summary

    Main ingredients:
    - transfer operator for dynamical system analysis
    - optimal transport

    Entropic transfer operators, first impression:
    - new method for estimating the transfer operator
    - fully data driven, only parameter: blur scale $\sqrt{\varepsilon}$
    - trade-off: number of samples ⇔ resolution of analysis
    - mesh free, seems to work in high dimensions (high ≈ 30)
    - OT theory provides a framework for analysis

    Future work:
    - extension to stochastic systems
    - out-of-sample embedding
    - quantitative convergence analysis
    - interpretation of $G_\varepsilon$ as (approximate) diffusion
    - relation to other dimensionality reduction methods
    - applications
  26. References I

    Nicholas F. Marshall and Ronald R. Coifman. Manifold learning with bi-stochastic kernels. IMA Journal of Applied Mathematics, 2019. doi: 10.1093/imamat/hxy065.