
Samuel Hurault

(ENS Paris)

Title — Convergent plug-and-play methods for image inverse problems.

Abstract — Plug-and-play methods constitute a class of iterative algorithms for imaging inverse problems where regularization is performed by an off-the-shelf Gaussian denoiser. These methods have demonstrated impressive visual results, particularly when the denoiser is parameterized by deep neural networks. However, the theoretical convergence of PnP methods has yet to be fully established. This talk begins with an overview of the literature on PnP algorithms, followed by the introduction of new convergence results for these methods when paired with specific denoisers. Our analysis shows that the resulting PnP algorithms converge to stationary points of explicit nonconvex functionals. These algorithms are then applied to various ill-posed inverse problems, including deblurring, super-resolution, and inpainting. Finally, to address inverse problems corrupted by Poisson noise, we will introduce a novel Bregman version of PnP based on the Bregman Proximal Gradient (BPG) optimization algorithm.

Bio
I am a postdoctoral researcher at ENS Paris working with Gabriel Peyré. I obtained my Ph.D. from Université de Bordeaux, co-supervised by Nicolas Papadakis and Arthur Leclaire. My research interests include image restoration, generative modeling, and nonconvex optimization. The main objective of my PhD was to develop innovative deep denoising priors along with convergent optimization schemes for solving image inverse problems. I am also one of the main developers of the Python library DeepInv, which provides a unified framework for solving inverse problems using deep neural networks.

S³ Seminar

October 18, 2024

Transcript

  1. Convergence of Plug-and-Play algorithms for image inverse problems. Samuel Hurault, Université de Bordeaux / ENS Paris. PhD supervisors: Arthur Leclaire, Nicolas Papadakis. Collaborations: Antonin Chambolle, Luca Calatroni, Charles Dossal, Ulugbek Kamilov, Thomas Moreau, Gabriel Peyré, Justin Solomon, Mathieu Terris. 1 / 38
  2. Image Inverse Problems: unknown image x ∈ R^n, observation y ∼ noise(Ax) ∈ R^m. Acquisition / degradation maps x to y; the inverse problem is to recover x from y. 6 / 38
  3. Supervised methods [Adler & Öktem '18, Bertocchi et al. '18, Zhang et al. '20]. ✓ Given access to a dataset of pairs (x_i, y_i): state-of-the-art performance. ✗ Requires retraining for each degradation A. 7 / 38
  4-11. Variational formulation. Estimate x from the observation y ∼ noise(Ax). The estimate x* must satisfy: Ax* "is close to" y, and x* "looks like" a real image: x* = argmin_{x ∈ R^n} f(x) + g(x), with a data-fidelity term, e.g. f(x) = (1/2)||Ax − y||^2, and a regularization term, e.g. Total Variation [Rudin et al. '92], wavelet sparsity [Mallat '09] or a deep prior [Bora et al. '17]. Probabilistic formulation: estimate x from the observation y ∼ p(y|x) via the Maximum A Posteriori estimator x* = argmax_{x ∈ R^n} p(x|y) = argmin_{x ∈ R^n} −log p(y|x) − log p(x). Gaussian noise gives f(x) = (1/2)||Ax − y||^2, Poisson noise gives f(x) = KL(y, Ax), and the regularization is g(x) = −log p(x). 8 / 38
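For the Gaussian-noise case this correspondence between the probabilistic and variational formulations can be made explicit; a short check, assuming y = Ax + n with n ∼ N(0, σ_n^2 Id), where σ_n is the observation noise level:

    -\log p(y \mid x) = \frac{1}{2\sigma_n^2}\,\lVert Ax - y \rVert^2 + \mathrm{const},
    \qquad
    \arg\max_{x}\, p(x \mid y)
    = \arg\min_{x}\, \underbrace{-\log p(y \mid x)}_{f(x)} \; \underbrace{-\,\log p(x)}_{g(x)},

so the quadratic data-fidelity term of the slides is exactly the negative log-likelihood up to the 1/σ_n^2 scaling and an additive constant.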
  12-14. Heuristics of Plug-and-Play methods [Venkatakrishnan et al. '13]. Estimate x from the observation y ∼ noise(Ax) by decoupling the optimization over f and g [Combettes & Pesquet '11] [Zoran & Weiss '11]. The optimization step over the regularizer g becomes a Gaussian denoising step, i.e. the easy inverse problem A = Id, for which powerful deep denoisers exist [Zhang et al. '17, '19, '21] [Song et al. '19], so that g(Denoiser(x)) < g(x) (implicit regularization). From y ∼ noise(Ax), iterate: 1. a denoising step; 2. a step that decreases the data-fidelity term f. "One denoiser to solve them all!" [Chang et al.] 9 / 38
  15. Presentation Outline. 1. Introduction to Plug-and-Play methods: how can an image denoiser be used to regularize an inverse problem? 2. Convergence of Plug-and-Play algorithms: can sufficient conditions on the denoiser be found to ensure that Plug-and-Play algorithms converge while maintaining performance? 3. Bregman generalization and inverse problems with Poisson noise: how can these schemes be applied to less regular data-fidelity terms? 4. DeepInverse library. 10 / 38
  16-18. First-order optimization. To find argmin_{x ∈ R^n} F(x): Gradient Descent (explicit): x_{k+1} = (Id − τ∇F)(x_k). Implicit Gradient Descent / Proximal Point Algorithm: x_{k+1} = x_k − τ∇F(x_{k+1}) ⇔ x_{k+1} = Prox_{τF}(x_k), where Prox_F(y) = argmin_{x ∈ R^n} (1/2)||x − y||^2 + F(x). To find argmin_{x ∈ R^n} f(x) + g(x): Gradient Descent: x_{k+1} = (Id − τ∇(f + g))(x_k); Proximal Gradient Descent (PGD): x_{k+1} = Prox_{τg} ∘ (Id − τ∇f)(x_k); Douglas-Rachford splitting / ADMM: x_{k+1} = ((1/2) Id + (1/2)(2Prox_{τg} − Id) ∘ (2Prox_{τf} − Id))(x_k). These schemes involve ∇g or Prox_{τg} but never g explicitly. 12 / 38
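A minimal NumPy sketch of the proximal gradient iteration above, for the least-squares data fidelity f(x) = (1/2)||Ax − y||^2 and an l1 regularizer whose prox is soft-thresholding; A, y, the l1 weight 0.1 and the step size are illustrative placeholders, not values from the talk.

    import numpy as np

    def pgd(grad_f, prox_g, x0, tau, n_iter=200):
        # Proximal Gradient Descent: x_{k+1} = prox_{tau g}(x_k - tau * grad f(x_k))
        x = x0.copy()
        for _ in range(n_iter):
            x = prox_g(x - tau * grad_f(x), tau)
        return x

    # toy example: f(x) = 0.5*||Ax - y||^2, g(x) = 0.1*||x||_1
    A = np.random.randn(50, 100)
    y = np.random.randn(50)
    grad_f = lambda x: A.T @ (A @ x - y)
    prox_g = lambda v, tau: np.sign(v) * np.maximum(np.abs(v) - 0.1 * tau, 0.0)
    tau = 1.0 / np.linalg.norm(A, 2) ** 2          # tau < 2/L with L = ||A||_2^2
    x_hat = pgd(grad_f, prox_g, np.zeros(100), tau)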
  19-22. Denoising: an implicit regularization. Estimate x from the observation y = x + ξ, with x ∼ p(x) and ξ ∼ N(0, σ^2 Id) Gaussian noise; write p_σ = p ∗ G_σ, with G_σ the Gaussian pdf of variance σ^2. Two theoretical denoisers: the MMSE denoiser D_σ^MMSE(y) = E_{x ∼ p(x|y)}[x] and the MAP denoiser D_σ^MAP(y) = argmax_x p(x|y). Tweedie: D_σ^MMSE = Id − σ^2 ∇(−log p_σ); Bayes: D_σ^MAP = Prox_{−σ^2 log p}. Given a real denoiser D_σ with D_σ ≈ D_σ^MMSE or D_σ ≈ D_σ^MAP, denoising is approximately an explicit or implicit descent step on g = −log p. 13 / 38
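Tweedie's identity used above follows from differentiating the smoothed density; a short derivation, assuming y = x + ξ with ξ ∼ N(0, σ^2 Id), so that p_σ(y) = ∫ p(x) G_σ(y − x) dx:

    \nabla \log p_\sigma(y)
    = \frac{\int p(x)\, \nabla_y G_\sigma(y-x)\, dx}{p_\sigma(y)}
    = \frac{\int p(x)\, \frac{x-y}{\sigma^2}\, G_\sigma(y-x)\, dx}{p_\sigma(y)}
    = \frac{\mathbb{E}[x \mid y] - y}{\sigma^2}
    \;\Longrightarrow\;
    D_\sigma^{\mathrm{MMSE}}(y) = \mathbb{E}[x \mid y] = y + \sigma^2 \nabla \log p_\sigma(y) = \big(\mathrm{Id} - \sigma^2 \nabla(-\log p_\sigma)\big)(y).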
  23. Small recap. 1. g = −log p is the ideal regularization. 2. Minimizing f + g requires access to ∇g or Prox_g. 3. A Gaussian denoiser approximates ∇(−log p) or Prox_{−log p}. 14 / 38
  24-26. Plug-and-play algorithms. Estimate argmin_{x ∈ R^n} f(x) + g(x) with f = −log p(y|·) and g = −log p. PGD: x_{k+1} = Prox_{τf} ∘ (Id − τ∇g)(x_k) or x_{k+1} = Prox_{τg} ∘ (Id − τ∇f)(x_k); the regularization g is unknown. Given a real denoiser D_σ: since the MMSE denoiser satisfies D_σ^MMSE = Id − σ^2 ∇(−log p_σ), GradPnP algorithms (RED) [Romano et al. '16] plug ∇g ← Id − D_σ, e.g. GradPnP-PGD: x_{k+1} = Prox_{τf} ∘ (τD_σ + (1 − τ) Id)(x_k); since the MAP denoiser satisfies D_σ^MAP = Prox_{−σ^2 log p}, ProxPnP algorithms [Venkatakrishnan et al. '13] plug Prox_{τg} ← D_σ, e.g. ProxPnP-PGD: x_{k+1} = D_σ ∘ (Id − τ∇f)(x_k). 15 / 38
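A schematic Python sketch of the two plug-and-play iterations just defined; `denoiser`, `grad_f` and `prox_f` are placeholders (any callables with these signatures), and the sketch only shows where the denoiser is plugged in.

    def prox_pnp_pgd(x, grad_f, denoiser, sigma, tau, n_iter=50):
        # ProxPnP-PGD: Prox_{tau g} is replaced by the denoiser D_sigma
        for _ in range(n_iter):
            x = denoiser(x - tau * grad_f(x), sigma)
        return x

    def grad_pnp_pgd(x, prox_f, denoiser, sigma, tau, n_iter=50):
        # GradPnP-PGD (RED flavour): grad g is replaced by the residual Id - D_sigma,
        # i.e. x_{k+1} = Prox_{tau f}(tau * D_sigma(x_k) + (1 - tau) * x_k)
        for _ in range(n_iter):
            x = prox_f(tau * denoiser(x, sigma) + (1.0 - tau) * x, tau)
        return x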
  27-33. Plug-and-play algorithms. Estimate x from the observation y ∼ p(y|x). MAP / first-order optimization of argmin_{x ∈ R^n} f(x) + g(x): ✓ converges, ✗ unknown regularization g. Plug-and-Play algorithms (D_σ ≈ Id − ∇g or D_σ ≈ Prox_{τg}): ✓ implicit regularization, ✓ state-of-the-art performance, ✗ no convergence guarantees, ✗ no minimized functional. Indeed, for a generic denoiser (e.g. a deep neural net), D_σ ≠ Id − ∇g and D_σ ≠ Prox_{τg}. [Figures: DPIR [Zhang et al. '21], evolution along iterations of the parameters τ, σ, of PSNR(x_k, x*), and of ||x_{k+1} − x_k|| / ||x_k||; same quantities when DPIR is run for 500 further iterations with fixed τ and σ.] Objective: find sufficient conditions on the denoiser to ensure convergence of PnP algorithms. 16 / 38
  34. Existing works: fixed-point convergence. Show that PnP converges towards x* ∈ Fix(T_PnP), with x_{k+1} = Prox_{τf} ∘ (τD_σ + (1 − τ) Id)(x_k) or x_{k+1} = D_σ ∘ (Id − τ∇f)(x_k). One shows that T_PnP is non-expansive, either with D_σ non-expansive [Sun et al. '19, Terris et al. '20, Hertrich et al. '21, Liu et al. '21, Bohra et al. '21], i.e. ||D_σ(x) − D_σ(y)|| ≤ ||x − y||, but a high-performing denoiser is not non-expansive (why? the optimal MAP and MMSE denoisers are non-expansive iff −log p is convex); or with D_σ expansive and a strongly convex data-fidelity term f [Ryu et al. '19], which excludes numerous problems. 18 / 38
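Whether a trained denoiser is non-expansive can be probed empirically; a minimal PyTorch sketch, assuming `denoiser` is any callable acting on image tensors, that lower-bounds its Lipschitz constant from random perturbed pairs (a value above 1 certifies that the denoiser is not non-expansive).

    import torch

    @torch.no_grad()
    def lipschitz_lower_bound(denoiser, x, n_pairs=64, eps=1e-2):
        # max over sampled pairs of ||D(x + delta) - D(x)|| / ||delta||
        best = 0.0
        Dx = denoiser(x)
        for _ in range(n_pairs):
            delta = eps * torch.randn_like(x)
            ratio = (denoiser(x + delta) - Dx).norm() / delta.norm()
            best = max(best, ratio.item())
        return best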
  35. PnP algorithms. Estimate x from the observation y ∼ p(y|x). MAP / first-order optimization of argmin_{x ∈ R^n} f(x) + g(x): ✓ converges, ✗ unknown regularization g. Plug-and-Play algorithms (D_σ ≈ Id − ∇g or D_σ ≈ Prox_{τg}): ✓ implicit regularization, ✓ state-of-the-art performance, ✗ no convergence guarantees, ✗ no minimized functional. Fixed-point convergence x* ∈ Fix(T_PnP), with D_σ non-expansive: ✓ convergence, ✗ sub-optimal performance, ✗ no minimized functional. 19 / 38
  36-37. Convergence via minimization. Prove that PnP algorithms converge towards x* ∈ argmin_x f(x) + g(x). For a generic denoiser (e.g. a deep neural network), D_σ ≠ Id − ∇g and D_σ ≠ Prox_g. Solution: make PnP genuine optimization algorithms by guaranteeing exactly D_σ = Id − ∇g (GradPnP algorithms) or D_σ = Prox_g (ProxPnP algorithms). 20 / 38
  38-40. Gradient-Step Denoiser [H., Leclaire, Papadakis '21]. Gradient-Step (GS) Denoiser D_σ = Id − ∇g_σ, with the nonconvex potential g_σ(x) = (1/2)||x − N_σ(x)||^2, where N_σ : R^n → R^n is a UNet with differentiable activations [Zhang et al. '21]. Explicitly, D_σ(x) = x − ∇[(1/2)||x − N_σ(x)||^2] = N_σ(x) + J_{N_σ}(x)^T (x − N_σ(x)). Trained by minimizing the L2 cost, with no degradation of denoising performance (PSNR on the CBSD68 dataset):
        σ (/255)    15      25      50
        UNet        33.90   31.25   28.00
        GS-UNet     33.90   31.26   28.01
    This provides an explicit regularization g_σ: since the MMSE denoiser is optimal for the L2 cost, D_σ = Id − ∇g_σ ≈ D_σ^MMSE = Id − ∇(−σ^2 log p_σ), hence g_σ ≈ −σ^2 log p_σ + const. 21 / 38
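A PyTorch sketch of this gradient-step construction, computing ∇g_σ by automatic differentiation; `net` stands for the underlying network N_σ (any differentiable torch module), and this is an illustrative reimplementation, not the authors' code.

    import torch

    def gradient_step_denoiser(net, x):
        # D(x) = x - grad g(x) with g(x) = 0.5 * ||x - net(x)||^2,
        # which equals net(x) + J_net(x)^T (x - net(x)) as on the slide
        x = x.detach().requires_grad_(True)
        g = 0.5 * ((x - net(x)) ** 2).sum()
        # create_graph=True keeps the graph so the denoiser remains trainable end-to-end
        (grad_g,) = torch.autograd.grad(g, x, create_graph=True)
        return x - grad_g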
  41. Nonconvex convergence of GradPnP algorithms [H., Leclaire, Papadakis '21]. With the GS denoiser, GradPnP algorithms converge towards a critical point of f + λg_σ; e.g. GradPnP-PGD: x_{k+1} = Prox_{τf} ∘ (τλD_σ + (1 − τλ) Id)(x_k) = Prox_{τf} ∘ (Id − τλ∇g_σ)(x_k). Note that g_σ is nonconvex but has a Lipschitz gradient. Assumptions: (i) f convex, proper, l.s.c.; (ii) ∇g_σ L-Lipschitz (L estimated via stepsize backtracking); (iii) f + λg_σ coercive and satisfying the Kurdyka-Lojasiewicz (KL) property (g_σ analytic). Then, for τ < 2/(λL), (x_k) converges towards a critical point of f + λg_σ [Attouch et al. '13]. 22 / 38
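A sketch of the GradPnP-PGD loop with a backtracking estimate of L, as mentioned in assumption (ii); the specific rule below (double L until the descent-lemma inequality on g_σ holds) is one common choice assumed here for illustration, not necessarily the talk's exact rule, and `prox_f`, `g_sigma`, `grad_g_sigma` are placeholders.

    def grad_pnp_pgd_backtracking(x, prox_f, g_sigma, grad_g_sigma, lam, L=1.0, n_iter=100):
        for _ in range(n_iter):
            g_x, dg_x = g_sigma(x), grad_g_sigma(x)
            while True:
                tau = 1.0 / (lam * L)                       # keeps tau * lam * L <= 1 < 2
                x_new = prox_f(x - tau * lam * dg_x, tau)
                d = x_new - x
                # descent lemma check: g(x+) <= g(x) + <grad g(x), d> + (L/2)||d||^2
                if g_sigma(x_new) <= g_x + (dg_x * d).sum() + 0.5 * L * (d * d).sum():
                    break
                L = 2.0 * L
            x = x_new
        return x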
  42. Experimental results. PSNR (dB) for deblurring and super-resolution, averaged over the CBSD68 dataset and over a variety of blur kernels:
                                     Deblurring              Super-resolution
        Input noise level            0.01    0.03    0.05    0.01    0.03    0.05
        Bicubic                      -       -       -       24.85   23.96   22.79
        EPLL [Zoran & Weiss '11]     27.34   25.16   24.04   -       -       -
        IRCNN [Zhang et al. '17]     31.42   28.01   26.40   26.97   25.86   25.45
        DPIR [Zhang et al. '21]      31.93   28.30   26.82   27.79   26.58   25.83
        GradPnP-PGD                  31.70   28.28   26.86   27.88   26.81   26.01
    Convergence + state-of-the-art performance among Plug-and-Play methods. 25 / 38
  43. Convergence of PnP algorithms via minimization. Prove that PnP algorithms converge towards x* ∈ argmin_x f(x) + g(x). For a generic denoiser (e.g. a deep neural network), D_σ ≠ Prox_{τg}. Solution: make PnP algorithms genuine optimization algorithms by guaranteeing exactly D_σ = Id − ∇g (GradPnP algorithms) or D_σ = Prox_g (ProxPnP algorithms). 26 / 38
  44. Proximal Denoiser [H., Leclaire, Papadakis '22]. D_σ = Id − ∇g_σ = ∇h_σ with h_σ(x) = (1/2)||x||^2 − g_σ(x). [Gribonval & Nikolova '20]: if ∇g_σ = Id − D_σ is L-Lipschitz with L ≤ 1, then D_σ = ∇h_σ with h_σ convex, and there exists φ_σ : R^n → R ∪ {+∞}, L/(L+1)-weakly convex, such that D_σ = Prox_{φ_σ} and, for all x ∈ Im(D_σ), φ_σ(x) = h_σ*(x) − (1/2)||x||^2. 27 / 38
  45. How to ensure Lip(∇g_σ) ≤ 1? [H., Leclaire, Papadakis '22]. Proximal Denoiser D_σ = Id − ∇g_σ = ∇h_σ = Prox_{φ_σ}. Option 1: convex parametrization of h_σ [Amos et al. '17] [Goujon et al. '23], but with suboptimal performance. Option 2: training-cost penalization [Pesquet et al. '21]: E_{x ∼ p, y ∼ N(x, σ^2 Id)} [ ||D_σ(y) − x||^2 + µ max(||∇^2 g_σ(y)||_S, 1 − ε) ].
        max_x ||∇^2 g_σ(x + ξ_σ)||_S on the CBSD68 dataset:
        σ (/255)    5       15      25
        GS-UNet     1.26    1.96    3.27
        Prox-UNet   0.92    0.99    0.96
        PSNR on the CBSD68 dataset:
        σ (/255)    5       15      25
        GS-UNet     40.26   33.90   31.26
        Prox-UNet   40.12   33.60   30.82
    28 / 38
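The penalty above needs the spectral norm ||∇^2 g_σ(y)||_S; since the Hessian is symmetric, it can be estimated with a few power iterations using only Hessian-vector products from autograd. A PyTorch sketch under that assumption (an illustrative implementation, not the training code of the talk), where `g` is a callable returning the scalar potential at y:

    import torch

    def hessian_spectral_norm(g, y, n_iter=10):
        # power iteration on Hessian-vector products of the scalar function g at y
        y = y.detach().requires_grad_(True)
        (grad_g,) = torch.autograd.grad(g(y), y, create_graph=True)
        u = torch.randn_like(y)
        u = u / u.norm()
        for _ in range(n_iter):
            (hvp,) = torch.autograd.grad(grad_g, y, grad_outputs=u, retain_graph=True)
            u = hvp / hvp.norm()
        # last product kept differentiable so the estimate can enter the training loss
        (hvp,) = torch.autograd.grad(grad_g, y, grad_outputs=u, retain_graph=True, create_graph=True)
        return hvp.norm()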
  46. Convergence of ProxPnP algorithms [H., Leclaire, Papadakis '22, '23]. With the Prox denoiser, ProxPnP algorithms converge towards a critical point of f + λφ_σ; e.g. ProxPnP-PGD: x_{k+1} = D_σ ∘ (Id − (1/λ)∇f)(x_k) = Prox_{φ_σ} ∘ (Id − (1/λ)∇f)(x_k). Caution: D_σ = Prox_{φ_σ} ≠ Prox_{τφ_σ}, which requires keeping a stepsize τ = 1. Assumptions: (i) f convex, differentiable, with L_f-Lipschitz gradient; (ii) φ_σ M-weakly convex; (iii) f + λφ_σ coercive and satisfying the Kurdyka-Lojasiewicz (KL) condition. Then, for L_f/λ + M < 2, (x_k) converges towards a critical point of f + λφ_σ. 29 / 38
  47. Plug-and-Play algorithms. Estimate x from the observation y ∼ p(y|x). MAP / first-order optimization of argmin_{x ∈ R^n} f(x) + g(x): ✓ converging algorithms, ✗ unknown regularization g. Plug-and-Play algorithms (D_σ ≈ Id − ∇g or D_σ ≈ Prox_{τg}): ✓ implicit regularization, ✓ state-of-the-art performance, ✗ no convergence guarantees, ✗ no minimized functional. Fixed-point convergence x* ∈ Fix(T_PnP) with D_σ non-expansive: ✓ convergence, ✗ no minimized functional, ✗ sub-optimal performance. GradPnP convergence x* ∈ argmin_{x ∈ R^n} f(x) + g_σ(x) with D_σ = Id − ∇g_σ, and ProxPnP convergence x* ∈ argmin_{x ∈ R^n} f(x) + φ_σ(x) with D_σ = Prox_{φ_σ}: ✓ convergence, ✓ optimization of an explicit functional, ✓ state-of-the-art performance. 30 / 38
  48. Poisson inverse problems. If the observation noise is Poisson, y ∼ P(Ax), the data-fidelity term is f(x) = −log p(y|x) = KL(y, Ax) = Σ_{i=1}^n [ y_i log(y_i / (Ax)_i) − y_i + (Ax)_i ], whose gradient ∇f is not Lipschitz. [Bauschke et al. '17]: for Burg's entropy h(x) = −Σ_i log(x_i) and L ≥ ||y||_1, the relative-smoothness condition holds: Lh − f is convex on int dom(h). This generalizes the Lipschitz-gradient assumption: for h(x) = (1/2)||x||^2 it reduces to ∇f being L-Lipschitz. The potential h defines a new geometry adapted to the data-fidelity term f: the Euclidean distance (1/2)||x − y||^2 is replaced by the Bregman divergence D_h(x, y) = h(x) − h(y) − ⟨∇h(y), x − y⟩. 32 / 38
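A NumPy sketch of these quantities: the Poisson data-fidelity term and its (non-Lipschitz) gradient, Burg's entropy, and the associated Bregman divergence; A, x, y are illustrative, entrywise-positive placeholders.

    import numpy as np

    def f_kl(x, A, y):
        # f(x) = KL(y, Ax) = sum_i y_i*log(y_i/(Ax)_i) - y_i + (Ax)_i
        Ax = A @ x
        return np.sum(y * np.log(y / Ax) - y + Ax)

    def grad_f_kl(x, A, y):
        # grad f(x) = A^T (1 - y/(Ax)); blows up near the boundary, hence not Lipschitz
        return A.T @ (1.0 - y / (A @ x))

    def burg_entropy(x):
        # h(x) = -sum_i log(x_i); L*h - f is convex for L >= ||y||_1
        return -np.sum(np.log(x))

    def bregman_div(x, z):
        # D_h(x, z) = h(x) - h(z) - <grad h(z), x - z>, with grad h(z) = -1/z
        return np.sum(-np.log(x) + np.log(z) + (x - z) / z)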
  49-56. Bregman potential [H., Kamilov, Leclaire, Papadakis '23]. Comparison between the Euclidean setting, h(x) = (1/2)||x||^2, and a general Bregman potential h : R^n → R strictly convex and C^2, built row by row across these slides:
        Distance:                  (1/2)||x − y||^2                                vs.  Bregman divergence D_h(x, y) = h(x) − h(y) − ⟨∇h(y), x − y⟩
        Smoothness:                ∇f L-Lipschitz                                  vs.  Lh − f convex
        Proximal operator:         Prox_f(y) = argmin_x f(x) + (1/2)||x − y||^2    vs.  Prox^h_f(y) = argmin_x f(x) + D_h(x, y)
        Gradient Descent (GD):     Id − τ(∇f + ∇g)                                 vs.  ∇h*(∇h − τ(∇f + ∇g))
        Prox. Grad. Descent (PGD): Prox_{τg} ∘ (Id − τ∇f)                          vs.  Prox^h_{τg} ∘ ∇h*(∇h − τ∇f)
        Noise model p(y|x):        Gaussian                                        vs.  ∝ exp(−(1/σ^2) D_h(x, y))
        MAP denoiser:              Prox_{−σ^2 log p}(y)                            vs.  Prox^h_{−σ^2 log p}(y)
        MMSE denoiser (Tweedie):   y + σ^2 ∇log p_σ(y)                             vs.  y + σ^2 ∇^2 h(y)^{-1} · ∇log p_σ(y)
        GradPnP algorithms (GD with):   ∇g ← Id − D_σ                              vs.  ∇g(y) ← ∇^2 h(y) · (Id − D_σ)(y)
        ProxPnP algorithms (PGD with):  Prox_{τg} ← D_σ                            vs.  Prox^h_{τg} ← D_σ
        Gradient-Step Denoiser:    D_σ(y) = y − ∇g_σ(y)                            vs.  D_σ(y) = y − ∇^2 h(y)^{-1} · ∇g_σ(y)
    33 / 38
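A Python sketch of the Bregman ProxPnP iteration of the last column for Burg's entropy h(x) = −Σ_i log(x_i), where ∇h(x) = −1/x and ∇h*(u) = −1/u, so the mirror step ∇h*(∇h(x) − τ∇f(x)) has the closed form x / (1 + τ x ⊙ ∇f(x)); `denoiser` and `grad_f` are placeholders acting on entrywise-positive arrays, and positivity of the iterates requires a suitably small step size τ.

    def bregman_prox_pnp_pgd(x, grad_f, denoiser, sigma, tau, n_iter=100):
        # x_{k+1} = D_sigma( grad h*( grad h(x_k) - tau * grad f(x_k) ) ) with Burg's entropy
        for _ in range(n_iter):
            z = x / (1.0 + tau * x * grad_f(x))   # mirror step, elementwise
            x = denoiser(z, sigma)
        return x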
  57. Application to Poisson inverse problems. Data-fidelity term f(x) = KL(y, Ax); Bregman potential: Burg's entropy h(x) = −Σ_{i=1}^n log(x_i); Lh − f convex for L ≥ ||y||_1 [Bauschke et al. '17]. Bregman noise model: Inverse Gamma (IG) distribution (γ = 1/σ^2): p(y|x) ∝ Π_{i=1}^n (x_i/y_i)^γ exp(−γ x_i/y_i) = Π_{i=1}^n IG(γ − 1, γx_i)(y_i). Gradient-Step Denoiser: D_σ(y) = y − y^2 ∇g_σ(y). [Figure: (a) ground truth, (b) noisy (γ = 25), (c) GS denoiser output.] 34 / 38
  58. Deblurring with Poisson noise. [Figure: (a) ground truth, (b) observation (16.21 dB), (c) GradPnP (23.01 dB), (d) ProxPnP (22.96 dB), (e) objective (λf + g_σ)(x_k) for GradPnP, (f) objective (λf + φ_σ)(x_k) for ProxPnP, (g) residual ||x_{k+1} − x_k||^2.] 35 / 38
  59. Summary of the contributions. Introduction of denoisers that allow rewriting PnP algorithms as genuine optimization algorithms. Learning of an explicit approximation of the prior on clean images. Convergence proofs towards critical points of an explicit, nonconvex objective. State-of-the-art restoration performance. Bregman extension for non-Gaussian noise. 36 / 38
  60. Thanks for your attention! https://samuelhurault.netlify.app/ [email protected]
    [H., Leclaire, Papadakis (ICLR 2022)] Gradient Step Denoiser for Convergent Plug-and-Play.
    [H., Leclaire, Papadakis (ICML 2022)] Proximal Denoiser for Convergent Plug-and-Play Optimization with Nonconvex Regularization.
    [H., Chambolle, Leclaire, Papadakis (SSVM 2023)] A Relaxed Proximal Gradient Descent Algorithm for Convergent Plug-and-Play with Proximal Denoiser.
    [H., Kamilov, Leclaire, Papadakis (NeurIPS 2023)] Convergent Bregman Plug-and-Play Image Restoration for Poisson Inverse Problems.
    [Li, H., Solomon (NeurIPS 2023)] Self-Consistent Velocity Matching of Probability Flows.
    [Dossal, H., Papadakis (2024)] Optimization with First-Order Algorithms.
    [H., Chambolle, Leclaire, Papadakis (JMIV 2024)] Convergent Plug-and-Play with Proximal Denoiser and Unconstrained Regularization Parameter.
    38 / 38