
Interpolating between Optimal Transport and KL regularized Optimal Transport using Rényi divergences

Regularized optimal transport (OT) has received much attention in recent years, starting with Cuturi's paper on Kullback-Leibler (KL) divergence regularized OT. In this paper, we propose to regularize the OT problem using the family of α-Rényi divergences for α ∈ (0, 1). Rényi divergences are neither f-divergences nor Bregman distances, but they recover the KL divergence in the limit α ↗ 1. The advantage of introducing the additional parameter α is that for α ↘ 0 we obtain convergence to the unregularized OT problem. For the KL regularized OT problem, this was previously achieved by letting the regularization parameter tend to zero, which causes numerical instabilities. We present two different ways to obtain premetrics on probability measures, namely by Rényi divergence constraints and by penalization. The latter premetric interpolates between the unregularized and the KL regularized OT problem, with weak convergence of the minimizer, generalizing the interpolating property of KL regularized OT. We use a nested mirror descent algorithm to solve the primal formulation. On both real and synthetic data sets, Rényi regularized OT plans outperform their KL and Tsallis counterparts: they are closer to the unregularized transport plans and recover the ground truth better in inference tasks.

This is joint work with Jonas Bresch (TU Berlin) available at https://arxiv.org/abs/2404.18834.

Viktor Stein

September 12, 2024

Transcript

  1. Interpolating between Optimal Transport & KL regularized Optimal Transport with Rényi Divergences. Joint work with Jonas Bresch (TU Berlin). Graduate Colloquium, University of South Carolina, Columbia, 12.09.2024 (Alec Helm, Jonah Klein).
  2. Motivation - Deficiencies of KL regularized OT. Optimal transport (OT) ⇝ distance on probability measures via a transport plan. Problem: O(N³) cost for N samples. Solution: entropic OT (Cuturi, NeurIPS '13): add ε times a KL regularizer to the OT problem, for ε > 0. Sinkhorn algorithm ⇝ O(N^{1+1/d} ln(N)). Problem in practice: ε must be very small to get an accurate plan, but small ε ⇝ numerical instabilities. Our solution: instead add ε times a different (α-Rényi) regularizer and let α ↘ 0 instead of ε ↘ 0.
  3. Outline: 1. Tsallis divergence and α-Rényi divergence. 2. Optimal transport and its regularization. 3. Rényi regularized OT. 4. Interpolation properties. 5. Solving Rényi regularized OT with mirror descent. 6. Numerical results.
    [Section photos: https://upload.wikimedia.org/wikipedia/commons/thumb/e/e2/Constantino_Tsallis_February_2010.jpg/800px-Constantino_Tsallis_February_2010.jpg, https://repository.aip.org/islandora/object/nbla:310720; illustration from dsweber2.github.io/Optimal-Transport-Information-Geometry/. Thumbnail heat maps of KL-, Rényi-, Tsallis-regularized and EMD plans omitted.]
  4. q-Tsallis divergence and α-Rényi divergence.
    Definition (α-Rényi divergence). The α-Rényi divergence of order α ∈ (0, 1) is
    R_α : P(X) × P(X) → [0, ∞], (µ | ν) ↦ 1/(α − 1) · ln( ∫_X (ρ_µ(x) / ρ_ν(x))^α dν(x) ),
    where for σ ∈ P(X), ρ_σ denotes the density of σ w.r.t. ½(µ + ν), and ln(0) := −∞.
    Muzellec et al. (AAAI 2017) examine Tsallis-regularized OT.
    Definition (q-Tsallis divergence). The q-Tsallis divergence of order q > 0, q ≠ 1, is
    T_q = 1/(q − 1) [exp((q − 1) R_q) − 1] : P(X) × P(X) → [0, ∞], (µ | ν) ↦ 1/(q − 1) ( ∫_X (ρ_µ(x) / ρ_ν(x))^q dν(x) − 1 ).
    Tsallis is the first-order approximation of Rényi, since ln(y) ≈ y − 1 (first-order Taylor expansion at y = 1).
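
For discrete measures on a common finite support, both divergences reduce to finite sums over the points where both weights are positive. A minimal NumPy sketch of the two definitions above (function names are illustrative, not from the paper):

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """Discrete alpha-Renyi divergence R_alpha(p | q) for alpha in (0, 1).

    On the common support, (rho_p / rho_q)^alpha dq reduces to
    p_i^alpha * q_i^(1 - alpha); the convention ln(0) = -inf is handled
    by restricting the sum to entries where both weights are positive.
    """
    mask = (p > 0) & (q > 0)
    s = np.sum(p[mask] ** alpha * q[mask] ** (1.0 - alpha))
    return np.log(s) / (alpha - 1.0)

def tsallis_divergence(p, q, q_order):
    """Discrete q-Tsallis divergence T_q(p | q) for q_order > 0, != 1,
    via T_q = (exp((q - 1) R_q) - 1) / (q - 1)."""
    mask = (p > 0) & (q > 0)
    s = np.sum(p[mask] ** q_order * q[mask] ** (1.0 - q_order))
    return (s - 1.0) / (q_order - 1.0)

p = np.array([0.25, 0.75])
q = np.array([0.5, 0.5])
print(renyi_divergence(p, q, alpha=0.5))      # ~0.069, nonnegative
print(tsallis_divergence(p, q, q_order=0.5))  # ~0.068, first-order close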
  5. Rényi versus Tsallis. [Figure: the Rényi divergence and the Tsallis divergence from (p, 1 − p) to (0.25, 0.75), plotted as functions of p ∈ [0, 1] for several orders.]
    Theorem (Properties of the Rényi divergence).
    • Divergence property: R_α(µ | ν) ≥ 0, and R_α(µ | ν) = 0 if and only if µ = ν.
    • R_α is nondecreasing and continuous in α ∈ [0, 1], with lim_{α↗1} R_α = KL pointwise.
    • R_α is jointly convex and jointly weakly lower semicontinuous for α ∈ (0, 1].
  6. [Outline slide repeated as a section divider; contents as on slide 3.]
  7. Wasserstein-p metric space. Let (X, d) be a metric space with d lower semicontinuous, let p ∈ [1, ∞), and let P(X) denote the set of probability measures on X. Define
    P_p(X) := { µ ∈ P(X) : ∫_X d(x, x₀)^p dµ(x) < ∞ }, x₀ ∈ X.
    On P_p(X), the Wasserstein-p metric is given by
    OT(µ, ν)^p = min_{π ∈ Π(µ,ν)} ∫_{X×X} d(x, y)^p dπ(x, y), µ, ν ∈ P_p(X),
    where the transport polytope is
    Π(µ, ν) := { π ∈ P(X × X) : π(A × X) = µ(A), π(X × A) = ν(A) for all measurable A }.
    The product measure µ ⊗ ν lies in Π(µ, ν). Notation: ⟨f, µ⟩ := ∫_X f(x) dµ(x), so we can write OT(µ, ν)^p = min{ ⟨d^p, π⟩ : π ∈ Π(µ, ν) }.
    [Illustration: https://dsweber2.github.io/Optimal-Transport-Information-Geometry/]
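
For finitely supported marginals, this minimum is a linear program over the transport polytope. A self-contained SciPy sketch for small N, using the marginal vectors r, c and cost matrix M introduced on the discretization slide below (the helper name discrete_ot is an assumption, not from the talk):

```python
import numpy as np
from scipy.optimize import linprog

def discrete_ot(r, c, M):
    """Solve min_P <M, P> over Pi(r, c) as a linear program.

    r, c: marginal weights in the simplex; M: (N x N) cost matrix,
    e.g. M[i, j] = d(x_i, x_j)**p. Returns the optimal plan P.
    Builds dense constraint matrices, so only suitable for small N.
    """
    N = len(r)
    # Marginal constraints P @ 1 = r and P.T @ 1 = c, with P flattened row-major.
    A_rows = np.kron(np.eye(N), np.ones(N))   # row-sum constraints
    A_cols = np.kron(np.ones(N), np.eye(N))   # column-sum constraints
    A_eq = np.vstack([A_rows, A_cols])[:-1]   # drop one redundant constraint
    b_eq = np.concatenate([r, c])[:-1]
    res = linprog(M.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.x.reshape(N, N)
```

For larger problems, a dedicated network-flow solver (such as the EMD solver in the POT library) is far faster than a generic LP; this cubic-type cost is exactly the motivation for entropic regularization on the next slide.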
  8. Cuturi's Entropic Optimal Transport. Regularizer: the Kullback-Leibler divergence
    KL(· | µ ⊗ ν) : Π(µ, ν) → [0, ∞), π ↦ ∫_{X×X} ln( dπ / d(µ ⊗ ν) (x, y) ) dπ(x, y).
    KL-regularized OT:
    OT_ε(µ, ν) := min_{π ∈ Π(µ,ν)} ⟨d^p, π⟩ + ε KL(π | µ ⊗ ν) = max_{f,g ∈ C(X)} ⟨ f ⊕ g − ε exp( (f ⊕ g − d^p) / ε ), µ ⊗ ν ⟩,
    where (f ⊕ g)(x, y) := f(x) + g(y) for f, g ∈ C(X). Primal-dual relation: π̂_ε = exp( (f̂ ⊕ ĝ − d^p) / ε ) · (µ ⊗ ν). Here c = d^p. [Illustration © G. Peyré, M. Cuturi, 2019.]
    Limits: as ε ↘ 0, π̂_ε → argmin{ KL(π | µ ⊗ ν) : ⟨d^p, π⟩ = OT(µ, ν) } and OT_ε(µ, ν) → OT(µ, ν); as ε → ∞, π̂_ε → µ ⊗ ν and OT_ε(µ, ν) → ⟨d^p, µ ⊗ ν⟩.
  9. Discretization and Sinkhorn algorithm. Discretize X ≈ (x_i)_{i=1}^N, so that µ, ν ∈ P(X) become vectors r := (µ(x_i))_{i=1}^N, c := (ν(x_i))_{i=1}^N ∈ Σ_N, where Σ_N := { x ∈ [0, 1]^N : Σ_{i=1}^N x_i = 1 }.
    Cost matrix: M := (d(x_i, x_j)^p)_{i,j=1}^N. Transport polytope: Π(r, c) := { P ∈ Σ_{N×N} : P 1_N = r, P^T 1_N = c }. [Illustration © T. Vayer.]
    The optimal transport plan is the KL projection of the Gibbs kernel:
    P̂_ε = argmin_{P ∈ Π(r,c)} KL( P | exp(−M/ε) ).
    The Sinkhorn algorithm finds this projection via matrix scaling.
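
A minimal NumPy sketch of this matrix-scaling iteration (illustrative, not the paper's code; in practice a log-domain implementation is preferred when ε is small):

```python
import numpy as np

def sinkhorn(r, c, M, eps, n_iter=1000, tol=1e-9):
    """KL-project the Gibbs kernel exp(-M/eps) onto Pi(r, c) by matrix scaling.

    Returns P = diag(u) @ K @ diag(v) with P @ 1 = r and P.T @ 1 = c.
    """
    K = np.exp(-M / eps)          # Gibbs kernel
    u = np.ones_like(r)
    for _ in range(n_iter):
        v = c / (K.T @ u)         # rescale to match column marginals
        u_new = r / (K @ v)       # rescale to match row marginals
        if np.max(np.abs(u_new - u)) < tol:
            u = u_new
            break
        u = u_new
    return u[:, None] * K * v[None, :]
```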
  10. [Outline slide repeated as a section divider; contents as on slide 3.]
  11. Rényi-regularized OT.
    Definition (Rényi-regularized OT [Bresch, S. '24]). The Rényi-regularized OT problem is
    OT_{ε,α} : P_p(X) × P_p(X) → [0, ∞), (µ, ν) ↦ min_{π ∈ Π(µ,ν)} ⟨c, π⟩ + ε R_α(π | µ ⊗ ν).
    Theorem (OT_{ε,α} is a pre-metric [Bresch, S. '24]). P_p(X)² ∋ (µ, ν) ↦ 1_{[µ≠ν]} OT_{ε,α}(µ, ν) is a metric for α ∈ (0, 1), ε ∈ [0, ∞).
    Lemma (Monotonicity of Rényi regularized OT [Bresch, S. '24]). Let µ, ν ∈ P_p(X), α, α′ ∈ (0, 1) and ε, ε′ ≥ 0 with α > α′ and ε < ε′. Then
    OT_{ε′,α}(µ, ν) ≥ OT_{ε,α}(µ, ν) ≥ OT_{ε,α′}(µ, ν).
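
In the discrete setting of slide 9, the objective being minimized is short to write down; a sketch reusing the divergence conventions above (the helper name is an assumption):

```python
import numpy as np

def renyi_ot_objective(P, r, c, M, eps, alpha):
    """Discrete Renyi-regularized OT objective <M, P> + eps * R_alpha(P | r c^T),
    assuming the plan P has all entries positive."""
    K = np.outer(r, c)                        # product plan r c^T
    S = np.sum(P**alpha * K**(1.0 - alpha))   # sum_ij P_ij^a (r c^T)_ij^(1-a)
    return np.sum(M * P) + eps * np.log(S) / (alpha - 1.0)
```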
  12. Theorem (Interpolation properties). Values:
    • OT_{ε,α}(µ, ν) → OT(µ, ν) as α ↘ 0 or ε ↘ 0,
    • OT_{ε,α}(µ, ν) → ⟨d, µ ⊗ ν⟩ as ε → ∞,
    • OT_{ε,α}(µ, ν) → OT_ε(µ, ν) as α ↗ 1.
    Minimizers:
    • π_{ε,α}(µ, ν) → some π ∈ Π(µ, ν) with ⟨d, π⟩ = OT(µ, ν) as α ↘ 0,
    • π_{ε,α}(µ, ν) → argmin{ R_α(π | µ ⊗ ν) : π ∈ Π(µ, ν), ⟨d, π⟩ = OT(µ, ν) } as ε ↘ 0,
    • π_{ε,α}(µ, ν) → µ ⊗ ν as ε → ∞,
    • π_{ε,α}(µ, ν) → π_ε as α ↗ 1.
  13. [Outline slide repeated as a section divider; contents as on slide 3.]
  14. Mirror descent. Solve min_{x ∈ K} f(x), where K ⊂ R^n is compact, via the updates
    x^(k+1) = argmin_{y ∈ K} D_h( y, (∇h)^{−1}( ∇h(x^(k)) − η_k ∇f(x^(k)) ) ), x^(0) ∈ K, η_k > 0, (1)
    for a convex function h : R^n → R with special properties. [Illustration: https://doi.org/10.48550/arXiv.2108.09489]
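
With K = Σ_N and h the negative Shannon entropy (as on the next slide), D_h is the KL divergence and update (1) becomes an exponentiated-gradient step followed by a normalization. A minimal sketch, with a constant step size for simplicity (names illustrative):

```python
import numpy as np

def entropic_mirror_descent(grad_f, x0, n_steps=500, eta=0.1):
    """Mirror descent on the probability simplex with h = negative Shannon
    entropy, so D_h = KL and (1) is an exponentiated-gradient step."""
    x = x0.copy()
    for _ in range(n_steps):
        x = x * np.exp(-eta * grad_f(x))   # dual step: grad h(x) - eta grad f(x)
        x = x / x.sum()                    # KL projection back onto the simplex
    return x

# Example: minimize f(x) = <c, x> over the simplex; the iterates
# concentrate on the coordinate with the smallest cost.
c = np.array([0.3, 0.1, 0.5])
x = entropic_mirror_descent(lambda x: c, np.ones(3) / 3)
```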
  15. Numerical experiments - better transport plans. Choose K = Σ_N (the probability simplex) and −h = Shannon entropy, so that D_h = KL. The Rényi-regularized OT objective
    Π(r, c) → [0, ∞), P ↦ ⟨M, P⟩ + ε R_α(P | r c^T)
    is not Lipschitz continuous, but it is locally Lipschitz on
    { P ∈ Π(r, c) : P|_{supp(r c^T)} > 0 } = Π(r, c) ∩ R^{N×N}_{>0},
    which suffices for convergence of mirror descent with a special step size (η_k)_{k∈N} (You, Li, 2022). In each iteration, one KL projection onto Π(r, c) (using the Sinkhorn algorithm) is performed:
    P^(k) ← Sinkhorn( P^(k−1) ⊙ exp( −η_k M − η_k λ · (α/(α − 1)) · (r c^T ⊘ P^(k−1))^{1−α} / ⟨(P^(k−1))^α, (r c^T)^{1−α}⟩ ); r, c ), k ∈ N.
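
A compact sketch of this nested scheme, combining the exponentiated-gradient step with a Sinkhorn-style KL projection onto Π(r, c). A constant step size and fixed iteration counts stand in for the schedule of You & Li; function names are assumptions:

```python
import numpy as np

def kl_projection(Q, r, c, n_iter=200):
    """KL-project a positive matrix Q onto Pi(r, c) by Sinkhorn scaling."""
    u = np.ones_like(r)
    for _ in range(n_iter):
        v = c / (Q.T @ u)
        u = r / (Q @ v)
    return u[:, None] * Q * v[None, :]

def renyi_ot_mirror_descent(r, c, M, lam, alpha, n_steps=200, eta=0.5):
    """Nested mirror descent sketch for Renyi-regularized OT:
    exponentiated-gradient step on <M, P> + lam * R_alpha(P | r c^T),
    then a KL projection back onto the transport polytope."""
    K = r[:, None] * c[None, :]            # product plan r c^T
    P = K.copy()                           # feasible, strictly positive start
    for _ in range(n_steps):
        S = np.sum(P**alpha * K**(1 - alpha))
        grad_R = alpha / (alpha - 1) * (K / P)**(1 - alpha) / S
        Q = P * np.exp(-eta * (M + lam * grad_R))   # mirror (dual) step
        P = kl_projection(Q, r, c)                  # project onto Pi(r, c)
    return P
```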
  16. Rényi regularization yields more accurate plans. [Figure: regularized OT plans for Gaussian (top) and Poisson (bottom) marginals, with regularization parameter λ = 10, Rényi order α = 0.01, Tsallis order q = 2.]
  17. Numerical experiments - predicting voter migration (regularization parameter ε = 1).

    regularizer     abs error ± std             KL error     mean squared error
    KL              2.4221×10¹ ± 2.848×10¹      8.422×10²    9.008×10⁴
    Tsallis         9.409 ± 1.529×10¹           3.173×10²    2.063×10⁴
    OT              1.845×10¹ ± 2.358×10¹       7.655×10²    5.738×10⁴
    (3/10)-Rényi    6.611 ± 7.868               2.128×10²    6.759×10³
  18. Conclusion.
    − Contribution. Regularize the optimal transport problem using the α-Rényi divergences R_α for α ∈ (0, 1). Prove a dual formulation and interpolation properties.
    − Prior work. Regularization with KL = lim_{α↗1} R_α and with the q-Tsallis divergence.
    − Method. Solve the primal problem with mirror descent and the dual problem with subgradient descent.
    − Result. Rényi-regularized OT plans outperform KL- and Tsallis-regularized OT plans on real and synthetic data.
    − Novelty. R_α is neither an f-divergence nor a Bregman divergence, and R_α is not "separable" due to the logarithm.
  19. Thank you for your attention! I am happy to take any questions.
    Paper: https://arxiv.org/abs/2404.18834. My website: https://viktorajstein.github.io.
    [PC19, Cut13, MNPN17, vEH14, BT03, NY83, NS21, Rén61, Tsa88]
  20. References I
    [BT03] Amir Beck and Marc Teboulle, Mirror descent and nonlinear projected subgradient methods for convex optimization, Oper. Res. Lett. 31 (2003), no. 3, 167–175.
    [Cut13] Marco Cuturi, Sinkhorn distances: lightspeed computation of optimal transport, Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2 (Red Hook, NY, USA), NIPS'13, Curran Associates Inc., 2013, pp. 2292–2300.
    [MNPN17] Boris Muzellec, Richard Nock, Giorgio Patrini, and Frank Nielsen, Tsallis regularized optimal transport and ecological inference, Proceedings of the AAAI Conference on Artificial Intelligence (San Francisco, California, USA), vol. 31, 2017.
    [NS21] Sebastian Neumayer and Gabriele Steidl, From optimal transport to discrepancy, Handbook of Mathematical Models and Algorithms in Computer Vision and Imaging: Mathematical Imaging and Vision (2021), 1–36.
    [NY83] Arkadij Semenovič Nemirovskij and David Borisovich Yudin, Problem complexity and method efficiency in optimization, Wiley, New York, 1983.
  21. References II
    [PC19] Gabriel Peyré and Marco Cuturi, Computational optimal transport, Found. Trends Mach. Learn. 11 (2019), no. 5-6, 355–607.
    [Rén61] Alfréd Rényi, On measures of entropy and information, Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics (Berkeley, California, USA), vol. 4, University of California Press, 1961, pp. 547–562.
    [Tsa88] Constantino Tsallis, Possible generalization of Boltzmann-Gibbs statistics, Journal of Statistical Physics 52 (1988), 479–487.
    [vEH14] Tim van Erven and Peter Harremos, Rényi divergence and Kullback-Leibler divergence, IEEE Trans. Inf. Theory 60 (2014), no. 7, 3797–3820.
  22. Work in progress - Rényi-Sinkhorn divergence. Since OT_{ε,α}(µ, µ) ≠ 0, we debias to obtain a valid, differentiable distance:
    D_{ε,α}(µ, ν) := OT_{ε,α}(µ, ν) − ½ OT_{ε,α}(µ, µ) − ½ OT_{ε,α}(ν, ν).
    This can be used for gradient flows. [Plot omitted.]
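
Given any solver for OT_{ε,α} (such as the mirror-descent sketch above), this debiasing is a three-call wrapper; a sketch with an assumed solver interface:

```python
def renyi_sinkhorn_divergence(ot_solver, mu, nu):
    """Debiased divergence: subtract the self-terms so that D(mu, mu) = 0.
    `ot_solver(a, b)` is assumed to return the value OT_{eps,alpha}(a, b)."""
    return (ot_solver(mu, nu)
            - 0.5 * ot_solver(mu, mu)
            - 0.5 * ot_solver(nu, nu))
```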
  23. First: a different way of regularizing. For a regularization parameter γ ∈ [0, ∞] and α ∈ (0, 1), the restricted transport polytope
    Π^α_γ(µ, ν) := { π ∈ Π(µ, ν) : R_α(π | µ ⊗ ν) ≤ γ }
    is weakly compact, since R_α(· | µ ⊗ ν) is weakly lower semicontinuous and Π(µ, ν) is weakly compact.
    Definition (Rényi-Sinkhorn distance). The Rényi-Sinkhorn distance between µ, ν ∈ P_p(X) is
    d_{γ,α} : P_p(X) × P_p(X) → R, (µ, ν) ↦ min{ ⟨d^p, π⟩^{1/p} : π ∈ Π^α_γ(µ, ν) }. (2)
    Theorem (Bresch, S. '24).
    • For (µ, ν) ∈ P_p(X)², the optimization problem (2) is convex and has a unique minimizer.
    • P_p(X)² ∋ (µ, ν) ↦ 1_{[µ≠ν]} d_{γ,α}(µ, ν) is a metric for α ∈ (0, 1), γ ∈ [0, ∞].
  24. [Diagram: the transport polytope Π(µ, ν) inside M(X × X), the restricted transport polytope Π^α_γ(µ, ν), the product measure µ ⊗ ν, and the optimal plans π_c and π^{α,γ}_c, for c = d^p. Plot inspired by (Cuturi, 2013).]
  25. The dual point of view - penalizing the constraint. Instead of restricting the problem's domain, penalize the Rényi divergence constraint in (2).
    Definition (dual Rényi-divergence-Sinkhorn distance). The dual Rényi-divergence-Sinkhorn distance for α ∈ (0, 1), ε ∈ [0, ∞) is
    d^{α,ε} : P_p(X) × P_p(X) → R, (µ, ν) ↦ ⟨d^p, π^{α,ε}(µ, ν)⟩^{1/p},
    where π^{α,ε}(µ, ν) ∈ argmin{ ⟨d^p, π⟩ + ε R_α(π | µ ⊗ ν) : π ∈ Π(µ, ν) }. (3)
    Theorem (Lagrangian point of view and pre-metric [Bresch, S. '24]). Let (µ, ν) ∈ P_p(X)².
    • The optimization problem (3) is convex and has a unique minimizer.
    • The Rényi-Sinkhorn distance d_{γ,α}(µ, ν) and the dual Rényi-Sinkhorn distance d^{α,ε}(µ, ν) are equivalent: for every γ > 0 there exists ε ∈ [0, ∞] such that ⟨d^p, π^{α,ε}(µ, ν)⟩ = d_{γ,α}(µ, ν)^p.
  26. Dual formulation, representation of π^{α,ε}. From now on: X compact. By the Riesz representation theorem, M(X), the space of all finite signed Borel measures on X, is the dual of C(X), the space of real-valued continuous functions on X. Recall (f ⊕ g)(x, y) := f(x) + g(y).
    Theorem (Dual problem, dual representation [Bresch, S. '24]). We have the strong duality
    OT_{ε,α}(µ, ν) = max_{f,g ∈ C(X), f⊕g ≤ d} ⟨f ⊕ g, µ ⊗ ν⟩ − ε ln⟨ (d − f ⊕ g)^{α/(α−1)}, µ ⊗ ν ⟩ + C_{α,ε}. (4)
    The optimal dual potentials f̂, ĝ ∈ C(X) from (4) are unique supp(µ ⊗ ν)-a.e. up to additive constants, and the unique optimal plan is
    π^{α,ε} ∝ (d − f̂ ⊕ ĝ)^{1/(α−1)} · (µ ⊗ ν).
    Proof idea: use the Fenchel-Rockafellar theorem, extending the objective to M(X) × M(X) by +∞.