
Samuel Hurault

(ENS Paris)

Title — Convergent plug-and-play methods for image inverse problems.

Abstract — Plug-and-play methods constitute a class of iterative algorithms for imaging inverse problems where regularization is performed by an off-the-shelf Gaussian denoiser. These methods have demonstrated impressive visual results, particularly when the denoiser is parameterized by deep neural networks. However, the theoretical convergence of PnP methods has yet to be fully established. This talk begins with an overview of the literature on PnP algorithms, followed by the introduction of new convergence results for these methods when paired with specific denoisers. Our analysis shows that the resulting PnP algorithms converge to stationary points of explicit nonconvex functionals. These algorithms are then applied to various ill-posed inverse problems, including deblurring, super-resolution, and inpainting. Finally, to address inverse problems corrupted by Poisson noise, we will introduce a novel Bregman version of PnP based on the Bregman Proximal Gradient (BPG) optimization algorithm.

Bio
I am a postdoctoral researcher at ENS Paris working with Gabriel Peyré. I obtained my Ph.D. from Université de Bordeaux, co-supervised by Nicolas Papadakis and Arthur Leclaire. My research interests include image restoration, generative modeling, and nonconvex optimization. The main objective of my PhD was to develop innovative deep denoising priors along with convergent optimization schemes for solving image inverse problems. I am also one of the main developers of the Python library DeepInv, which provides a unified framework for solving inverse problems using deep neural networks.

S³ Seminar

October 18, 2024

Transcript

  1. Convergence of Plug-and-Play algorithms for image inverse problems. Samuel Hurault, Université de Bordeaux / ENS Paris. PhD supervisors: Arthur Leclaire, Nicolas Papadakis. Collaborations: Antonin Chambolle, Luca Calatroni, Charles Dossal, Ulugbek Kamilov, Thomas Moreau, Gabriel Peyré, Justin Solomon, Mathieu Terris. 1 / 38
  2. Image Inverse Problems: unknown image x ∈ R^n, observation y ∼ noise(Ax) ∈ R^m. Acquisition / degradation maps x to y; the inverse problem is to recover x from y. 6 / 38
  3. Supervised methods [Adler & Öktem '18, Bertocchi et al. '18, Zhang et al. '20]. ✓ Given access to a dataset of pairs (x_i, y_i): state-of-the-art performance. ✗ Requires retraining for each degradation A. 7 / 38
  4-11. Variational formulation. Estimate x from the observation y ∼ noise(Ax). The estimate x* must satisfy: Ax* "is close to" y, and x* "looks like" a real image: x* = argmin_{x ∈ R^n} f(x) + g(x), with a data-fidelity term, e.g. f(x) = (1/2)||Ax − y||^2, and a regularization term, e.g. Total Variation [Rudin et al. '92], wavelet sparsity [Mallat '09] or a deep prior [Bora et al. '17]. Probabilistic formulation: estimate x from the observation y ∼ p(y|x) via the Maximum A Posteriori estimator x* = argmax_{x ∈ R^n} p(x|y) = argmin_{x ∈ R^n} −log p(y|x) − log p(x). Gaussian noise gives f(x) = (1/2)||Ax − y||^2, Poisson noise gives f(x) = KL(y, Ax), and the regularization is g(x) = −log p(x). 8 / 38
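For the Gaussian-noise case this correspondence between the probabilistic and variational formulations can be made explicit; a short check, assuming y = Ax + n with n ∼ N(0, σ_n^2 Id), where σ_n is the observation noise level:

    -\log p(y \mid x) = \frac{1}{2\sigma_n^2}\,\lVert Ax - y \rVert^2 + \mathrm{const},
    \qquad
    \arg\max_{x}\, p(x \mid y)
    = \arg\min_{x}\, \underbrace{-\log p(y \mid x)}_{f(x)} \; \underbrace{-\,\log p(x)}_{g(x)},

so the quadratic data-fidelity term of the slides is exactly the negative log-likelihood up to the 1/σ_n^2 scaling and an additive constant.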
  12-14. Heuristics of Plug-and-Play methods [Venkatakrishnan et al. '13]. Estimate x from the observation y ∼ noise(Ax) by decoupling the optimization over f and g [Combettes & Pesquet '11] [Zoran & Weiss '11]. The optimization step over the regularizer g becomes a Gaussian denoising step, i.e. the easy inverse problem A = Id, for which powerful deep denoisers exist [Zhang et al. '17, '19, '21] [Song et al. '19], so that g(Denoiser(x)) < g(x) (implicit regularization). From y ∼ noise(Ax), iterate: 1. a denoising step; 2. a step that decreases the data-fidelity term f. "One denoiser to solve them all!" [Chang et al.] 9 / 38
  15. Presentation Outline. 1. Introduction to Plug-and-Play methods: how can an image denoiser be used to regularize an inverse problem? 2. Convergence of Plug-and-Play algorithms: can sufficient conditions on the denoiser be found to ensure that Plug-and-Play algorithms converge while maintaining performance? 3. Bregman generalization and inverse problems with Poisson noise: how can these schemes be applied to less regular data-fidelity terms? 4. DeepInverse library. 10 / 38
  16-18. First-order optimization. To find argmin_{x ∈ R^n} F(x): Gradient Descent (explicit): x_{k+1} = (Id − τ∇F)(x_k). Implicit Gradient Descent / Proximal Point Algorithm: x_{k+1} = x_k − τ∇F(x_{k+1}) ⇔ x_{k+1} = Prox_{τF}(x_k), where Prox_F(y) = argmin_{x ∈ R^n} (1/2)||x − y||^2 + F(x). To find argmin_{x ∈ R^n} f(x) + g(x): Gradient Descent: x_{k+1} = (Id − τ∇(f + g))(x_k); Proximal Gradient Descent (PGD): x_{k+1} = Prox_{τg} ∘ (Id − τ∇f)(x_k); Douglas-Rachford splitting / ADMM: x_{k+1} = ((1/2) Id + (1/2)(2Prox_{τg} − Id) ∘ (2Prox_{τf} − Id))(x_k). These schemes involve ∇g or Prox_{τg} but never g explicitly. 12 / 38
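A minimal NumPy sketch of the proximal gradient iteration above, for the least-squares data fidelity f(x) = (1/2)||Ax − y||^2 and an l1 regularizer whose prox is soft-thresholding; A, y, the l1 weight 0.1 and the step size are illustrative placeholders, not values from the talk.

    import numpy as np

    def pgd(grad_f, prox_g, x0, tau, n_iter=200):
        # Proximal Gradient Descent: x_{k+1} = prox_{tau g}(x_k - tau * grad f(x_k))
        x = x0.copy()
        for _ in range(n_iter):
            x = prox_g(x - tau * grad_f(x), tau)
        return x

    # toy example: f(x) = 0.5*||Ax - y||^2, g(x) = 0.1*||x||_1
    A = np.random.randn(50, 100)
    y = np.random.randn(50)
    grad_f = lambda x: A.T @ (A @ x - y)
    prox_g = lambda v, tau: np.sign(v) * np.maximum(np.abs(v) - 0.1 * tau, 0.0)
    tau = 1.0 / np.linalg.norm(A, 2) ** 2          # tau < 2/L with L = ||A||_2^2
    x_hat = pgd(grad_f, prox_g, np.zeros(100), tau)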
  19-22. Denoising: an implicit regularization. Estimate x from the observation y = x + ξ, with x ∼ p(x) and ξ ∼ N(0, σ^2 Id) Gaussian noise; write p_σ = p ∗ G_σ, with G_σ the Gaussian pdf of variance σ^2. Two theoretical denoisers: the MMSE denoiser D_σ^MMSE(y) = E_{x ∼ p(x|y)}[x] and the MAP denoiser D_σ^MAP(y) = argmax_x p(x|y). Tweedie: D_σ^MMSE = Id − σ^2 ∇(−log p_σ); Bayes: D_σ^MAP = Prox_{−σ^2 log p}. Given a real denoiser D_σ with D_σ ≈ D_σ^MMSE or D_σ ≈ D_σ^MAP, denoising is approximately an explicit or implicit descent step on g = −log p. 13 / 38
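Tweedie's identity used above follows from differentiating the smoothed density; a short derivation, assuming y = x + ξ with ξ ∼ N(0, σ^2 Id), so that p_σ(y) = ∫ p(x) G_σ(y − x) dx:

    \nabla \log p_\sigma(y)
    = \frac{\int p(x)\, \nabla_y G_\sigma(y-x)\, dx}{p_\sigma(y)}
    = \frac{\int p(x)\, \frac{x-y}{\sigma^2}\, G_\sigma(y-x)\, dx}{p_\sigma(y)}
    = \frac{\mathbb{E}[x \mid y] - y}{\sigma^2}
    \;\Longrightarrow\;
    D_\sigma^{\mathrm{MMSE}}(y) = \mathbb{E}[x \mid y] = y + \sigma^2 \nabla \log p_\sigma(y) = \big(\mathrm{Id} - \sigma^2 \nabla(-\log p_\sigma)\big)(y).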
  23. Small recap. 1. g = −log p is the ideal regularization. 2. Minimizing f + g requires access to ∇g or Prox_g. 3. A Gaussian denoiser approximates ∇(−log p) or Prox_{−log p}. 14 / 38
  24-26. Plug-and-play algorithms. Estimate argmin_{x ∈ R^n} f(x) + g(x) with f = −log p(y|·) and g = −log p. PGD: x_{k+1} = Prox_{τf} ∘ (Id − τ∇g)(x_k) or x_{k+1} = Prox_{τg} ∘ (Id − τ∇f)(x_k); the regularization g is unknown. Given a real denoiser D_σ: since the MMSE denoiser satisfies D_σ^MMSE = Id − σ^2 ∇(−log p_σ), GradPnP algorithms (RED) [Romano et al. '16] plug ∇g ← Id − D_σ, e.g. GradPnP-PGD: x_{k+1} = Prox_{τf} ∘ (τD_σ + (1 − τ) Id)(x_k); since the MAP denoiser satisfies D_σ^MAP = Prox_{−σ^2 log p}, ProxPnP algorithms [Venkatakrishnan et al. '13] plug Prox_{τg} ← D_σ, e.g. ProxPnP-PGD: x_{k+1} = D_σ ∘ (Id − τ∇f)(x_k). 15 / 38
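A schematic Python sketch of the two plug-and-play iterations just defined; `denoiser`, `grad_f` and `prox_f` are placeholders (any callables with these signatures), and the sketch only shows where the denoiser is plugged in.

    def prox_pnp_pgd(x, grad_f, denoiser, sigma, tau, n_iter=50):
        # ProxPnP-PGD: Prox_{tau g} is replaced by the denoiser D_sigma
        for _ in range(n_iter):
            x = denoiser(x - tau * grad_f(x), sigma)
        return x

    def grad_pnp_pgd(x, prox_f, denoiser, sigma, tau, n_iter=50):
        # GradPnP-PGD (RED flavour): grad g is replaced by the residual Id - D_sigma,
        # i.e. x_{k+1} = Prox_{tau f}(tau * D_sigma(x_k) + (1 - tau) * x_k)
        for _ in range(n_iter):
            x = prox_f(tau * denoiser(x, sigma) + (1.0 - tau) * x, tau)
        return x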
  27-33. Plug-and-play algorithms. Estimate x from the observation y ∼ p(y|x). MAP / first-order optimization of argmin_{x ∈ R^n} f(x) + g(x): ✓ converges, ✗ unknown regularization g. Plug-and-Play algorithms (D_σ ≈ Id − ∇g or D_σ ≈ Prox_{τg}): ✓ implicit regularization, ✓ state-of-the-art performance, ✗ no convergence guarantees, ✗ no minimized functional. Indeed, for a generic denoiser (e.g. a deep neural net), D_σ ≠ Id − ∇g and D_σ ≠ Prox_{τg}. [Figures: DPIR [Zhang et al. '21], evolution along iterations of the parameters τ, σ, of PSNR(x_k, x*), and of ||x_{k+1} − x_k|| / ||x_k||; same quantities when DPIR is run for 500 further iterations with fixed τ and σ.] Objective: find sufficient conditions on the denoiser to ensure convergence of PnP algorithms. 16 / 38
  34. Existing works: fixed-point convergence. Show that PnP converges towards x* ∈ Fix(T_PnP), with x_{k+1} = Prox_{τf} ∘ (τD_σ + (1 − τ) Id)(x_k) or x_{k+1} = D_σ ∘ (Id − τ∇f)(x_k). One shows that T_PnP is non-expansive, either with D_σ non-expansive [Sun et al. '19, Terris et al. '20, Hertrich et al. '21, Liu et al. '21, Bohra et al. '21], i.e. ||D_σ(x) − D_σ(y)|| ≤ ||x − y||, but a high-performing denoiser is not non-expansive (why? the optimal MAP and MMSE denoisers are non-expansive iff −log p is convex); or with D_σ expansive and a strongly convex data-fidelity term f [Ryu et al. '19], which excludes numerous problems. 18 / 38
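Whether a trained denoiser is non-expansive can be probed empirically; a minimal PyTorch sketch, assuming `denoiser` is any callable acting on image tensors, that lower-bounds its Lipschitz constant from random perturbed pairs (a value above 1 certifies that the denoiser is not non-expansive).

    import torch

    @torch.no_grad()
    def lipschitz_lower_bound(denoiser, x, n_pairs=64, eps=1e-2):
        # max over sampled pairs of ||D(x + delta) - D(x)|| / ||delta||
        best = 0.0
        Dx = denoiser(x)
        for _ in range(n_pairs):
            delta = eps * torch.randn_like(x)
            ratio = (denoiser(x + delta) - Dx).norm() / delta.norm()
            best = max(best, ratio.item())
        return best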
  35. PnP algorithms. Estimate x from the observation y ∼ p(y|x). MAP / first-order optimization of argmin_{x ∈ R^n} f(x) + g(x): ✓ converges, ✗ unknown regularization g. Plug-and-Play algorithms (D_σ ≈ Id − ∇g or D_σ ≈ Prox_{τg}): ✓ implicit regularization, ✓ state-of-the-art performance, ✗ no convergence guarantees, ✗ no minimized functional. Fixed-point convergence x* ∈ Fix(T_PnP), with D_σ non-expansive: ✓ convergence, ✗ sub-optimal performance, ✗ no minimized functional. 19 / 38
  36-37. Convergence via minimization. Prove that PnP algorithms converge towards x* ∈ argmin_x f(x) + g(x). For a generic denoiser (e.g. a deep neural network), D_σ ≠ Id − ∇g and D_σ ≠ Prox_g. Solution: make PnP genuine optimization algorithms by guaranteeing exactly D_σ = Id − ∇g (GradPnP algorithms) or D_σ = Prox_g (ProxPnP algorithms). 20 / 38
  38-40. Gradient-Step Denoiser [H., Leclaire, Papadakis '21]. Gradient-Step (GS) Denoiser D_σ = Id − ∇g_σ, with the nonconvex potential g_σ(x) = (1/2)||x − N_σ(x)||^2, where N_σ : R^n → R^n is a UNet with differentiable activations [Zhang et al. '21]. Explicitly, D_σ(x) = x − ∇[(1/2)||x − N_σ(x)||^2] = N_σ(x) + J_{N_σ}(x)^T (x − N_σ(x)). Trained by minimizing the L2 cost, with no degradation of denoising performance (PSNR on the CBSD68 dataset):
        σ (/255)    15      25      50
        UNet        33.90   31.25   28.00
        GS-UNet     33.90   31.26   28.01
    This provides an explicit regularization g_σ: since the MMSE denoiser is optimal for the L2 cost, D_σ = Id − ∇g_σ ≈ D_σ^MMSE = Id − ∇(−σ^2 log p_σ), hence g_σ ≈ −σ^2 log p_σ + const. 21 / 38
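A PyTorch sketch of this gradient-step construction, computing ∇g_σ by automatic differentiation; `net` stands for the underlying network N_σ (any differentiable torch module), and this is an illustrative reimplementation, not the authors' code.

    import torch

    def gradient_step_denoiser(net, x):
        # D(x) = x - grad g(x) with g(x) = 0.5 * ||x - net(x)||^2,
        # which equals net(x) + J_net(x)^T (x - net(x)) as on the slide
        x = x.detach().requires_grad_(True)
        g = 0.5 * ((x - net(x)) ** 2).sum()
        # create_graph=True keeps the graph so the denoiser remains trainable end-to-end
        (grad_g,) = torch.autograd.grad(g, x, create_graph=True)
        return x - grad_g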
  41. Nonconvex convergence of GradPnP algorithms [H., Leclaire, Papadakis '21]. With the GS denoiser, GradPnP algorithms converge towards a critical point of f + λg_σ; e.g. GradPnP-PGD: x_{k+1} = Prox_{τf} ∘ (τλD_σ + (1 − τλ) Id)(x_k) = Prox_{τf} ∘ (Id − τλ∇g_σ)(x_k). Note that g_σ is nonconvex but has a Lipschitz gradient. Assumptions: (i) f convex, proper, l.s.c.; (ii) ∇g_σ L-Lipschitz (L estimated via stepsize backtracking); (iii) f + λg_σ coercive and satisfying the Kurdyka-Lojasiewicz (KL) property (g_σ analytic). Then, for τ < 2/(λL), (x_k) converges towards a critical point of f + λg_σ [Attouch et al. '13]. 22 / 38
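A sketch of the GradPnP-PGD loop with a backtracking estimate of L, as mentioned in assumption (ii); the specific rule below (double L until the descent-lemma inequality on g_σ holds) is one common choice assumed here for illustration, not necessarily the talk's exact rule, and `prox_f`, `g_sigma`, `grad_g_sigma` are placeholders.

    def grad_pnp_pgd_backtracking(x, prox_f, g_sigma, grad_g_sigma, lam, L=1.0, n_iter=100):
        for _ in range(n_iter):
            g_x, dg_x = g_sigma(x), grad_g_sigma(x)
            while True:
                tau = 1.0 / (lam * L)                       # keeps tau * lam * L <= 1 < 2
                x_new = prox_f(x - tau * lam * dg_x, tau)
                d = x_new - x
                # descent lemma check: g(x+) <= g(x) + <grad g(x), d> + (L/2)||d||^2
                if g_sigma(x_new) <= g_x + (dg_x * d).sum() + 0.5 * L * (d * d).sum():
                    break
                L = 2.0 * L
            x = x_new
        return x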
  42. Experimental results. PSNR (dB) for deblurring and super-resolution, averaged over the CBSD68 dataset and over a variety of blur kernels:
                                     Deblurring              Super-resolution
        Input noise level            0.01    0.03    0.05    0.01    0.03    0.05
        Bicubic                      -       -       -       24.85   23.96   22.79
        EPLL [Zoran & Weiss '11]     27.34   25.16   24.04   -       -       -
        IRCNN [Zhang et al. '17]     31.42   28.01   26.40   26.97   25.86   25.45
        DPIR [Zhang et al. '21]      31.93   28.30   26.82   27.79   26.58   25.83
        GradPnP-PGD                  31.70   28.28   26.86   27.88   26.81   26.01
    Convergence + state-of-the-art performance among Plug-and-Play methods. 25 / 38
  43. Convergence of PnP algorithms via minimization. Prove that PnP algorithms converge towards x* ∈ argmin_x f(x) + g(x). For a generic denoiser (e.g. a deep neural network), D_σ ≠ Prox_{τg}. Solution: make PnP algorithms genuine optimization algorithms by guaranteeing exactly D_σ = Id − ∇g (GradPnP algorithms) or D_σ = Prox_g (ProxPnP algorithms). 26 / 38
  44. Proximal Denoiser [H., Leclaire, Papadakis '22]. D_σ = Id − ∇g_σ = ∇h_σ with h_σ(x) = (1/2)||x||^2 − g_σ(x). [Gribonval & Nikolova '20]: if ∇g_σ = Id − D_σ is L-Lipschitz with L ≤ 1, then D_σ = ∇h_σ with h_σ convex, and there exists φ_σ : R^n → R ∪ {+∞}, L/(L+1)-weakly convex, such that D_σ = Prox_{φ_σ} and, for all x ∈ Im(D_σ), φ_σ(x) = h_σ*(x) − (1/2)||x||^2. 27 / 38
  45. How to ensure Lip(∇g_σ) ≤ 1? [H., Leclaire, Papadakis '22]. Proximal Denoiser D_σ = Id − ∇g_σ = ∇h_σ = Prox_{φ_σ}. Option 1: convex parametrization of h_σ [Amos et al. '17] [Goujon et al. '23], but with suboptimal performance. Option 2: training-cost penalization [Pesquet et al. '21]: E_{x ∼ p, y ∼ N(x, σ^2 Id)} [ ||D_σ(y) − x||^2 + µ max(||∇^2 g_σ(y)||_S, 1 − ε) ].
        max_x ||∇^2 g_σ(x + ξ_σ)||_S on the CBSD68 dataset:
        σ (/255)    5       15      25
        GS-UNet     1.26    1.96    3.27
        Prox-UNet   0.92    0.99    0.96
        PSNR on the CBSD68 dataset:
        σ (/255)    5       15      25
        GS-UNet     40.26   33.90   31.26
        Prox-UNet   40.12   33.60   30.82
    28 / 38
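The penalty above needs the spectral norm ||∇^2 g_σ(y)||_S; since the Hessian is symmetric, it can be estimated with a few power iterations using only Hessian-vector products from autograd. A PyTorch sketch under that assumption (an illustrative implementation, not the training code of the talk), where `g` is a callable returning the scalar potential at y:

    import torch

    def hessian_spectral_norm(g, y, n_iter=10):
        # power iteration on Hessian-vector products of the scalar function g at y
        y = y.detach().requires_grad_(True)
        (grad_g,) = torch.autograd.grad(g(y), y, create_graph=True)
        u = torch.randn_like(y)
        u = u / u.norm()
        for _ in range(n_iter):
            (hvp,) = torch.autograd.grad(grad_g, y, grad_outputs=u, retain_graph=True)
            u = hvp / hvp.norm()
        # last product kept differentiable so the estimate can enter the training loss
        (hvp,) = torch.autograd.grad(grad_g, y, grad_outputs=u, retain_graph=True, create_graph=True)
        return hvp.norm()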
  46. Convergence of ProxPnP algorithms [H., Leclaire, Papadakis '22, '23]. With the Prox denoiser, ProxPnP algorithms converge towards a critical point of f + λφ_σ; e.g. ProxPnP-PGD: x_{k+1} = D_σ ∘ (Id − (1/λ)∇f)(x_k) = Prox_{φ_σ} ∘ (Id − (1/λ)∇f)(x_k). Caution: D_σ = Prox_{φ_σ} ≠ Prox_{τφ_σ}, which requires keeping a stepsize τ = 1. Assumptions: (i) f convex, differentiable, with L_f-Lipschitz gradient; (ii) φ_σ M-weakly convex; (iii) f + λφ_σ coercive and satisfying the Kurdyka-Lojasiewicz (KL) condition. Then, for L_f/λ + M < 2, (x_k) converges towards a critical point of f + λφ_σ. 29 / 38
  47. Plug-and-Play algorithms. Estimate x from the observation y ∼ p(y|x). MAP / first-order optimization of argmin_{x ∈ R^n} f(x) + g(x): ✓ converging algorithms, ✗ unknown regularization g. Plug-and-Play algorithms (D_σ ≈ Id − ∇g or D_σ ≈ Prox_{τg}): ✓ implicit regularization, ✓ state-of-the-art performance, ✗ no convergence guarantees, ✗ no minimized functional. Fixed-point convergence x* ∈ Fix(T_PnP) with D_σ non-expansive: ✓ convergence, ✗ no minimized functional, ✗ sub-optimal performance. GradPnP convergence x* ∈ argmin_{x ∈ R^n} f(x) + g_σ(x) with D_σ = Id − ∇g_σ, and ProxPnP convergence x* ∈ argmin_{x ∈ R^n} f(x) + φ_σ(x) with D_σ = Prox_{φ_σ}: ✓ convergence, ✓ optimization of an explicit functional, ✓ state-of-the-art performance. 30 / 38
  48. Poisson inverse problems. If the observation noise is Poisson, y ∼ P(Ax), the data-fidelity term is f(x) = −log p(y|x) = KL(y, Ax) = Σ_{i=1}^n [ y_i log(y_i / (Ax)_i) − y_i + (Ax)_i ], whose gradient ∇f is not Lipschitz. [Bauschke et al. '17]: for Burg's entropy h(x) = −Σ_i log(x_i) and L ≥ ||y||_1, the relative-smoothness condition holds: Lh − f is convex on int dom(h). This generalizes the Lipschitz-gradient assumption: for h(x) = (1/2)||x||^2 it reduces to ∇f being L-Lipschitz. The potential h defines a new geometry adapted to the data-fidelity term f: the Euclidean distance (1/2)||x − y||^2 is replaced by the Bregman divergence D_h(x, y) = h(x) − h(y) − ⟨∇h(y), x − y⟩. 32 / 38
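A NumPy sketch of these quantities: the Poisson data-fidelity term and its (non-Lipschitz) gradient, Burg's entropy, and the associated Bregman divergence; A, x, y are illustrative, entrywise-positive placeholders.

    import numpy as np

    def f_kl(x, A, y):
        # f(x) = KL(y, Ax) = sum_i y_i*log(y_i/(Ax)_i) - y_i + (Ax)_i
        Ax = A @ x
        return np.sum(y * np.log(y / Ax) - y + Ax)

    def grad_f_kl(x, A, y):
        # grad f(x) = A^T (1 - y/(Ax)); blows up near the boundary, hence not Lipschitz
        return A.T @ (1.0 - y / (A @ x))

    def burg_entropy(x):
        # h(x) = -sum_i log(x_i); L*h - f is convex for L >= ||y||_1
        return -np.sum(np.log(x))

    def bregman_div(x, z):
        # D_h(x, z) = h(x) - h(z) - <grad h(z), x - z>, with grad h(z) = -1/z
        return np.sum(-np.log(x) + np.log(z) + (x - z) / z)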
  49-56. Bregman potential [H., Kamilov, Leclaire, Papadakis '23]. Comparison between the Euclidean setting, h(x) = (1/2)||x||^2, and a general Bregman potential h : R^n → R strictly convex and C^2, built row by row across these slides:
        Distance:                  (1/2)||x − y||^2                                vs.  Bregman divergence D_h(x, y) = h(x) − h(y) − ⟨∇h(y), x − y⟩
        Smoothness:                ∇f L-Lipschitz                                  vs.  Lh − f convex
        Proximal operator:         Prox_f(y) = argmin_x f(x) + (1/2)||x − y||^2    vs.  Prox^h_f(y) = argmin_x f(x) + D_h(x, y)
        Gradient Descent (GD):     Id − τ(∇f + ∇g)                                 vs.  ∇h*(∇h − τ(∇f + ∇g))
        Prox. Grad. Descent (PGD): Prox_{τg} ∘ (Id − τ∇f)                          vs.  Prox^h_{τg} ∘ ∇h*(∇h − τ∇f)
        Noise model p(y|x):        Gaussian                                        vs.  ∝ exp(−(1/σ^2) D_h(x, y))
        MAP denoiser:              Prox_{−σ^2 log p}(y)                            vs.  Prox^h_{−σ^2 log p}(y)
        MMSE denoiser (Tweedie):   y + σ^2 ∇log p_σ(y)                             vs.  y + σ^2 ∇^2 h(y)^{-1} · ∇log p_σ(y)
        GradPnP algorithms (GD with):   ∇g ← Id − D_σ                              vs.  ∇g(y) ← ∇^2 h(y) · (Id − D_σ)(y)
        ProxPnP algorithms (PGD with):  Prox_{τg} ← D_σ                            vs.  Prox^h_{τg} ← D_σ
        Gradient-Step Denoiser:    D_σ(y) = y − ∇g_σ(y)                            vs.  D_σ(y) = y − ∇^2 h(y)^{-1} · ∇g_σ(y)
    33 / 38
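A Python sketch of the Bregman ProxPnP iteration of the last column for Burg's entropy h(x) = −Σ_i log(x_i), where ∇h(x) = −1/x and ∇h*(u) = −1/u, so the mirror step ∇h*(∇h(x) − τ∇f(x)) has the closed form x / (1 + τ x ⊙ ∇f(x)); `denoiser` and `grad_f` are placeholders acting on entrywise-positive arrays, and positivity of the iterates requires a suitably small step size τ.

    def bregman_prox_pnp_pgd(x, grad_f, denoiser, sigma, tau, n_iter=100):
        # x_{k+1} = D_sigma( grad h*( grad h(x_k) - tau * grad f(x_k) ) ) with Burg's entropy
        for _ in range(n_iter):
            z = x / (1.0 + tau * x * grad_f(x))   # mirror step, elementwise
            x = denoiser(z, sigma)
        return x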
  57. Application to Poisson inverse problems. Data-fidelity term f(x) = KL(y, Ax); Bregman potential: Burg's entropy h(x) = −Σ_{i=1}^n log(x_i); Lh − f convex for L ≥ ||y||_1 [Bauschke et al. '17]. Bregman noise model: Inverse Gamma (IG) distribution (γ = 1/σ^2): p(y|x) ∝ Π_{i=1}^n (x_i/y_i)^γ exp(−γ x_i/y_i) = Π_{i=1}^n IG(γ − 1, γx_i)(y_i). Gradient-Step Denoiser: D_σ(y) = y − y^2 ∇g_σ(y). [Figure: (a) ground truth, (b) noisy (γ = 25), (c) GS denoiser output.] 34 / 38
  58. Deblurring with Poisson noise. [Figure: (a) ground truth, (b) observation (16.21 dB), (c) GradPnP (23.01 dB), (d) ProxPnP (22.96 dB), (e) objective (λf + g_σ)(x_k) for GradPnP, (f) objective (λf + φ_σ)(x_k) for ProxPnP, (g) residual ||x_{k+1} − x_k||^2.] 35 / 38
  59. Summary of the contributions. Introduction of denoisers that allow rewriting PnP algorithms as genuine optimization algorithms. Learning of an explicit approximation of the prior on clean images. Convergence proofs towards critical points of an explicit, nonconvex objective. State-of-the-art restoration performance. Bregman extension for non-Gaussian noise. 36 / 38
  60. Thanks for your attention! https://samuelhurault.netlify.app/ [email protected]
    [H., Leclaire, Papadakis (ICLR 2022)] Gradient Step Denoiser for Convergent Plug-and-Play.
    [H., Leclaire, Papadakis (ICML 2022)] Proximal Denoiser for Convergent Plug-and-Play Optimization with Nonconvex Regularization.
    [H., Chambolle, Leclaire, Papadakis (SSVM 2023)] A Relaxed Proximal Gradient Descent Algorithm for Convergent Plug-and-Play with Proximal Denoiser.
    [H., Kamilov, Leclaire, Papadakis (NeurIPS 2023)] Convergent Bregman Plug-and-Play Image Restoration for Poisson Inverse Problems.
    [Li, H., Solomon (NeurIPS 2023)] Self-Consistent Velocity Matching of Probability Flows.
    [Dossal, H., Papadakis (2024)] Optimization with First-Order Algorithms.
    [H., Chambolle, Leclaire, Papadakis (JMIV 2024)] Convergent Plug-and-Play with Proximal Denoiser and Unconstrained Regularization Parameter.
    38 / 38