Laetitia Chapel (Université de Bretagne-Sud & Institut Agro, France) Fast Optimal Transport through Sliced Generalized Wasserstein Geodesics
WORKSHOP ON OPTIMAL TRANSPORT
FROM THEORY TO APPLICATIONS
INTERFACING DYNAMICAL SYSTEMS, OPTIMIZATION, AND MACHINE LEARNING
Venue: Humboldt University of Berlin, Dorotheenstraße 24
with Guillaume Mahey, Gilles Gasso, Clément Bonet and Nicolas Courty NeurIPS 2023 [5] Laetitia Chapel [email protected] IRISA, Rennes, France Institut Agro Rennes-Angers Workshop on Optimal Transport: from theory to applications, Berlin 2024
Wasserstein distance Transport map and Wasserstein Geodesics Curvature of the Wasserstein space Wasserstein Generalized Geodesic Computational Optimal Transport Sliced Wasserstein Generalized Geodesic SWGG with a PWD-like formulation SWGG with a Generalized Geodesic formulation Experimental results Computational aspects Gradient flows Pan sharpening / image colorization Point cloud matchings Optimal transport dataset distances Conclusion Bibliography L. Chapel ·Fast OT through SWGG ·Workshop on Optimal Transport: from theory to applications, Berlin 2024 2 / 24
Wasserstein distance Optimal transport and Wasserstein distance OT (µ1 , µ2 ) ≜ inf γ∈Γ(µ1,µ2) X×Y c(x, y) dγ(x, y) where Γ(µ1 , µ2 ) def = {γ ∈ M+ (X × Y) s.t. (πx)# γ = µ1 and (πy)# γ = µ2 } with πx : X × Y → X. Linear loss Marginal constraints L. Chapel ·Fast OT through SWGG ·Workshop on Optimal Transport: from theory to applications, Berlin 2024 3 / 24
Wasserstein distance Optimal transport and Wasserstein distance OT (µ1 , µ2 ) ≜ inf γ∈Γ(µ1,µ2) X×Y c(x, y) dγ(x, y) where Γ(µ1 , µ2 ) def = {γ ∈ M+ (X × Y) s.t. (πx)# γ = µ1 and (πy)# γ = µ2 } with πx : X × Y → X. Linear loss Marginal constraints µ1 µ2 γi,j > 0 and (πy)# γ = µ2 with (πx)# γ = µ1 The transport plan γ(x, y) specifies for each pair (x, y) how many particles go from x to y Wasserstein distance when c(x, y) = |x − y|p Wp(µ1 , µ2 ) ≜ inf γ∈Γ(µ1,µ2) X×Y c(x, y) dγ(x, y) 1/p L. Chapel ·Fast OT through SWGG ·Workshop on Optimal Transport: from theory to applications, Berlin 2024 3 / 24
Wasserstein Geodesics In some cases, the optimal plan γ∗ is a Monge map of the form (Id, T)#µ1 , e.g. for p = 2 Wp p (µ1 , µ2 ) ≜ inf T ∥x − T(x)∥2 2 dµ1 (x) where T is a transport map and T# µ1 = µ2 µ1 µ2 L. Chapel ·Fast OT through SWGG ·Workshop on Optimal Transport: from theory to applications, Berlin 2024 4 / 24
Wasserstein Geodesics In some cases, the optimal plan γ∗ is a Monge map of the form (Id, T)#µ1 , e.g. for p = 2 Wp p (µ1 , µ2 ) ≜ inf T ∥x − T(x)∥2 2 dµ1 (x) where T is a transport map and T# µ1 = µ2 x T(x) µ1 T(x) = µ2 Defines for each particle located at x what is its destination T(x) L. Chapel ·Fast OT through SWGG ·Workshop on Optimal Transport: from theory to applications, Berlin 2024 4 / 24
Wasserstein Geodesics In some cases, the optimal plan γ∗ is a Monge map of the form (Id, T)#µ1 , e.g. for p = 2 Wp p (µ1 , µ2 ) ≜ inf T ∥x − T(x)∥2 2 dµ1 (x) where T is a transport map and T# µ1 = µ2 Wasserstein geodesics µ1→2(t) ≜ (tT1→2 + (1 − t)Id)#µ1 with T1→2 the optimal map For short, we denote µ1→2 for t = 0.5 L. Chapel ·Fast OT through SWGG ·Workshop on Optimal Transport: from theory to applications, Berlin 2024 4 / 24 µ1 µ2
Geodesics Has been introduced by Ambrosio et al. [1] Wasserstein Geodesic: µ1→2(t) ≜ (t T1→2 +(1 − t)Id)#µ1 Wasserstein Generalized Geodesic: µ1→2 g (t) ≜ (t Tν→µ2 +(1 − t) Tν→µ1 )#ν for ν a pivot measure. L. Chapel ·Fast OT through SWGG ·Workshop on Optimal Transport: from theory to applications, Berlin 2024 6 / 24
For µ1 = n i=1 hi δxi and µ2 = m j=1 gj δyj and a quadratic cost, we solve W2 2 (µ1 , µ2 ) ≜ minγ∈Γ(µ1,µ2) i,j c(xi , yj )γi,j → linear solvers with O(n3 log(n)) complexity L. Chapel ·Fast OT through SWGG ·Workshop on Optimal Transport: from theory to applications, Berlin 2024 7 / 24
For µ1 = n i=1 hi δxi and µ2 = m j=1 gj δyj and a quadratic cost, we solve W2 2 (µ1 , µ2 ) ≜ minγ∈Γ(µ1,µ2) i,j c(xi , yj )γi,j → linear solvers with O(n3 log(n)) complexity When µ1 and µ2 are 1D distributions and n = m with uniform masses, the solution is given by W2 2 (µ1 , µ2 ) ≜ 1 n n i=1 (xσ(i) − yτ(i) )2 → the optimal transport plan respects the ordering of the elements xσ(i−1) ≤ xσ(i) and yτ(i−1) ≤ yτ(i) , complexity O(n log(n)) and O(n + n log(n)) for computing the distance xσ(1) yτ(1) xσ(2) yτ(2) xσ(3) yτ(3) xσ(4) yτ(4) xσ(5) yτ(5) xσ(6) yτ(6) L. Chapel ·Fast OT through SWGG ·Workshop on Optimal Transport: from theory to applications, Berlin 2024 7 / 24
1D, the middle of the geodesic can be easily computed (xσ(i) + yτ(i) )/2 And when we take the pivot measure ν to be the middle of the geodesic µ1→2, we have W2 2 (µ1 , µ2 ) = W2 ν (µ1 , µ2 ) = 2W2 2 (µ1 , ν) + 2W2 2 (ν, µ2 ) xσ(1) +yτ(1) 2 xσ(2) +yτ(2) 2 xσ(3) +yτ(3) 2 xσ(4) +yτ(4) 2 xσ(5) +yτ(5) 2 xσ(6) +yτ(6) 2 L. Chapel ·Fast OT through SWGG ·Workshop on Optimal Transport: from theory to applications, Berlin 2024 8 / 24
1. Slice the distribution along lines θ ∈ Sd−1 2. Project µ1 and µ2 onto θ: Pθ # µ, with Pθ : Rd → R, x → ⟨x, θ⟩ 3. Compute 1d Wasserstein onto the projected samples in 1d 4. Average all the distances SW2 2 (µ1 , µ2 ) ≜ Sd−1 W2 2 (Pθ # µ1 , Pθ # µ2 )dω(θ), with ω uniform distribution on Sd−1. µ1 µ2 Pθ # µ1 = x, θ Pθ # µ2 = y, θ θ ∈ Sd−1 → provides a lower bound of W2 2 (µ1 , µ2 ) with complexity O(Ln + Ln log(n)), L number of lines L. Chapel ·Fast OT through SWGG ·Workshop on Optimal Transport: from theory to applications, Berlin 2024 9 / 24
Rd 1. Slice the distribution along lines θ ∈ Sd−1 2. Project µ1 and µ2 onto θ: Pθ # µ, with Pθ : Rd → R, x → ⟨x, θ⟩ 3. Compute Rd Wasserstein onto the permutations obtained by sorting the projections 4. Average all the distances (mettre un theta en indice dans les sigma) PWD2 2 (µ1 , µ2 ) ≜ Sd−1 1 n n i=1 xσθ(i) − yτθ(i) 2 2 dω(θ), with ω uniform distribution on Sd−1. µ1 µ2 σ τ θ ∈ Sd−1 → provides an upper bound of W2 2 (µ1 , µ2 ) with complexity O(Ln d +Ln log(n)), L number of lines L. Chapel ·Fast OT through SWGG ·Workshop on Optimal Transport: from theory to applications, Berlin 2024 10 / 24
SWGG with a PWD-like formulation 1. Slice the distribution along lines θ ∈ Sd−1 2. Project µ1 and µ2 onto θ: Pθ # µ, with Pθ : Rd → R, x → ⟨x, θ⟩ 3. Compute Rd Wasserstein onto the permutations obtained by sorting the projections 4. Take the minimum over all the distances SWGG2 2 (µ1 , µ2 , θ) ≜ 1 n n i=1 xσθ(i) − yτθ(i) 2 2 , min-SWGG2 2 (µ1 , µ2 ) ≜ min θ∈Sd−1 SWGG2 2 (µ1 , µ2 , θ) µ1 µ2 σ τ θ ∈ Sd−1 L. Chapel ·Fast OT through SWGG ·Workshop on Optimal Transport: from theory to applications, Berlin 2024 11 / 24
SWGG with a PWD-like formulation Properties of min-SWGG It comes with a transport map: let θ∗ be the optimal projection direction T(xi ) = y τ−1 θ∗ (σθ∗ (i)) , ∀1 ≤ i ≤ n. It is an upper bound of W and a lower bound of PWD W2 2 ≤ min-SWGG2 2 ≤ PWD2 2 and W2 2 = min-SWGG2 2 when d > 2n [2] Complexity O(Lnd + Ln log(n)) with L number of lines The Monte-Carlo search over the L lines is effective in low dimension only → how to design gradient descent techniques for finding θ∗? → further properties, such as sample complexity? L. Chapel ·Fast OT through SWGG ·Workshop on Optimal Transport: from theory to applications, Berlin 2024 12 / 24
Geodesic SWGG with a Generalized Geodesic formulation 1. Slice the distribution along lines θ ∈ Sd−1 2. Project µ1 and µ2 onto θ: Qθ # µ, with Qθ : Rd → Rd, x → θ⟨x, θ⟩ 3. Define the pivot measure ν to be the Wasserstein mean of the measure Qθ # µ1 and Qθ # µ2 ν = µ1→2 θ ≜ arg min µ W2 2 (Qθ # µ1 , µ) + W2 2 (µ, Qθ # µ2 ) 4. Take the minimum over all the following distances SWGG2 2 (µ1 , µ2 , θ) = 2W2 2 (µ1 , µ1→2 θ ) + 2W2 2 (µ1→2 θ , µ2 ) − 4W2 2 (µ1→2 g,θ , µ1→2 θ ) µ1 µ2 µ1→2 g,θ Qθ # µ1 Qθ # µ2 ν = µ1→2 θ θ ∈ Sd−1 → the two formulations are equivalent (for continuous or discrete distributions) L. Chapel ·Fast OT through SWGG ·Workshop on Optimal Transport: from theory to applications, Berlin 2024 13 / 24
Geodesic SWGG with a Generalized Geodesic formulation Why this reformulation? Define a gradient descent algorithm for optimizing over θ Rewrite the problem as an OT formulation with a restricted constraint set Define new properties for SWGG Properties of min-SWGG Weak convergence Translation invariance SWGG is equal to W when one of the distributions (µ2 ) is supported on a line of direction θ: W2 2 (µ1 , µ2 ) = W2 2 (µ1 , Qθ # µ1 ) + W2 2 (Qθ # µ1 , µ2 ) that can be computed with a closed form L. Chapel ·Fast OT through SWGG ·Workshop on Optimal Transport: from theory to applications, Berlin 2024 14 / 24
Geodesic SWGG with a Generalized Geodesic formulation OT with a restricted constraint set Discrete optimal transport, with n = m and uniform masses W2 2 (µ1 , µ2 ) = minγ∈Γ(µ1,µ2) i,j c(xi , yj )γi,j where Γ(µ1 , µ2 ) = {γ ∈ Rn×n s.t. γ1n = 1n /n, γ⊤1n = 1n /n} (Birkhoff polytope). min-SWGG min-SWGG2 2 (µ1 , µ2 ) = minγθ∈Π(µ1,µ2) i,j c(xi , yj )γθi,j where Π(µ1 , µ2 ) = {γθ ∈ Rn×n s.t. it is constructed from the permutahedron of the proj. distributions} 0 50 100 150 200 250 300 Dimension 1.025 1.050 1.075 1.100 1.125 1.150 1.175 1.200 1.225 Ratio min − SWGG W2 2 for n = 50 0 50 100 150 200 250 300 Dimension 0 200 400 600 800 1000 1200 1400 log10 number of permutations Number permutations n = 310 Number of permutation from a line n = 310 L. Chapel ·Fast OT through SWGG ·Workshop on Optimal Transport: from theory to applications, Berlin 2024 16 / 24
Geodesic SWGG with a Generalized Geodesic formulation OT with a restricted constraint set Discrete optimal transport, with n = m and uniform masses W2 2 (µ1 , µ2 ) = minγ∈Γ(µ1,µ2) i,j c(xi , yj )γi,j where Γ(µ1 , µ2 ) = {γ ∈ Rn×n s.t. γ1n = 1n /n, γ⊤1n = 1n /n} (Birkhoff polytope). min-SWGG min-SWGG2 2 (µ1 , µ2 ) = minγθ∈Π(µ1,µ2) i,j c(xi , yj )γθi,j where Π(µ1 , µ2 ) = {γθ ∈ Rn×n s.t. it is constructed from the permutahedron of the proj. distributions} Π(µ1 , µ2 ) ⊂ Γ(µ1 , µ2 ) Gives a sample complexity similar to Sinkhorn n−1/2 measures lying on smaller dimensional subspaces has a better sample complexity than between the original measures L. Chapel ·Fast OT through SWGG ·Workshop on Optimal Transport: from theory to applications, Berlin 2024 16 / 24
distributions µ1 and µ2 10 1000 0 10 20 30 40 Distance W2 2 =32.4 10 1000 Number of projections 0 100 200 300 W2 2 =346.1 maxSW (optimized) SW (Monte Carlo) SWGG Optimized 10 1000 0 1000 2000 3000 4000 W2 2 =3836.0 d = 20 d = 2 d = 200 102 103 104 105 Number of samples in each distribution 10−3 10−2 10−1 100 101 102 103 Seconds SW, L =200 min-SWGG, L =200 min-SWGG optim Factored Coupling Wasserstein Sinkhorn max-SW SRW L. Chapel ·Fast OT through SWGG ·Workshop on Optimal Transport: from theory to applications, Berlin 2024 17 / 24
sharpening / image colorization, using the map One distribution is supported on a line Construct a super-resolution multi-chromatic satellite image from a high-resolution mono-chromatic image (source) and low-resolution multi-chromatic image (target) L. Chapel ·Fast OT through SWGG ·Workshop on Optimal Transport: from theory to applications, Berlin 2024 19 / 24
using the map Iterative Closest Point iterative algorithm for aligning point clouds Based on several one-to-one correspondences between points n 500 3000 150 000 NN 3.54 (0.02) 96.9 (0.30) 23.3 (59.37) OT 0.32 (0.18) 48.4 (58.46) · min-SWGG 0.05 (0.04) 37.6 (0.90) 6.7 (105.75) (the lower the better, timings into parenthesis) L. Chapel ·Fast OT through SWGG ·Workshop on Optimal Transport: from theory to applications, Berlin 2024 20 / 24
for Wasserstein comes with an associated transport map has a O(Lnd + n log(n)) complexity has good statistical properties Not the only approximation method based on a pivot measure Factored coupling [4], where ν = arg minµ∈P(Rk) W2 2 (µ, µ1) + W2 2 (µ, µ1) Exact OT Source samples Target samples Factored OT Template samples HROT (exact) HROT (thresholded) Partial OT Subspace detours [6], where ν = arg minν∈P(Rd) W2 2 (PE # µ1, ν) + W2 2 (ν, PE # µ2) Some open questions how do the Birkhoff polytope and the considered permutahedron relate? concentration results? extension to incomparable spaces through a pivot measure? L. Chapel ·Fast OT through SWGG ·Workshop on Optimal Transport: from theory to applications, Berlin 2024 22 / 24
with Guillaume Mahey, Gilles Gasso, Clément Bonet and Nicolas Courty NeurIPS 2023 [5] Laetitia Chapel [email protected] IRISA, Rennes, France Institut Agro Rennes-Angers Workshop on Optimal Transport: from theory to applications, Berlin 2024
Savaré. Gradient flows: in metric spaces and in the space of probability measures. Springer Science & Business Media, 2005. [2] Thomas M Cover. “The number of linearly inducible orderings of points in d-space”. In: SIAM Journal on Applied Mathematics 15.2 (1967), pp. 434–439. [3] Jean Feydy. “Geometric data analysis, beyond convolutions”. PhD thesis. École Normale Supérieure de Cachan, 2020. [4] Aden Forrow et al. “Statistical optimal transport via factored couplings”. In: The 22nd International Conference on Artificial Intelligence and Statistics. PMLR. 2019, pp. 2454–2465. [5] Guillaume Mahey et al. “Fast Optimal Transport through Sliced Generalized Wasserstein Geodesics”. In: Advances in Neural Information Processing Systems 36 (2024). [6] Boris Muzellec and Marco Cuturi. “Subspace detours: Building transport plans that are optimal on subspace projections”. In: Advances in Neural Information Processing Systems 32 (2019). L. Chapel ·Fast OT through SWGG ·Workshop on Optimal Transport: from theory to applications, Berlin 2024 24 / 24