Advanced Topics of Prior-based Image Restoration: Tensors and Neural Networks (Tutorial at APSIPA-ASC 2021)

Tatsuya Yokota

July 02, 2024

Transcript

  1. 2 Organization of This Tutorial Basics & Typical methods Tensors

    Neural Networks This tutorial consists of 3 parts. Q&A and breaks are inserted between each part. Feel free to interrupt me at any time to ask questions!!
  2. 9 All candidates can be solutions without a prior! Quiz

    Candidates Signal A Signal B Signal C Signal D … … Prior information plays a role in selecting a solution from candidates - Example 1: Smoothness prior - This may select Signal A - Example 2: Periodicity prior - This may select Signal D PRIOR IS ALL YOU NEED !
  3. 10 Image restoration problems Typical image restoration problems Image denoising

    Image deblurring Image inpainting Image super-resolution
  4. 11 Vector representation of an image Basically, we represent an

    image as a vector in many cases, by stacking all pixel values into a single vector (1, 2, 3, …, N). For example, a gray-scale image of size 100 × 100 ➔ a 10000-dimensional vector.
  5. 12 Matrix / Tensor representation of an image We sometimes

    represent an image as a matrix / tensor, i.e. directly consider the image as a multi-dimensional array. Matrix representation: a 100 × 100 grayscale image is a 100 × 100 array of pixel intensities (values 0–255). Tensor representation: hyperspectral tensor, RGB image, MRI.
  6. 13 Image denoising Observation model of a noisy image: Noisy image = Clean image + Noise, where the noisy image is known, the clean image and the noise are unknown, and the clean image is what we estimate.
  7. 14 Image inpainting Observation model of an incomplete image: Incomplete image = Mask image ⊙ Clean image (Hadamard, i.e. entry-wise, product), where the incomplete image and the mask are known and the clean image is the unknown to be estimated.
  8. 15 Image deblurring Observation model of a blurred image: Blurred image = Blur kernel ∗ Clean image, where the blurred image and the blur kernel are known and the clean image is the unknown to be estimated.
  9. 16 Image super-resolution Observation model of a low-resolution image: Low-resolution image = down-sampling of the high-resolution image, where the low-resolution image (and the down-sampling operator) are known and the high-resolution image is the unknown to be estimated.
  10. 17 Unified formulation of the observation model: noise, missing entries, blur, and down-sampling can all be written in the general form observed image = (known linear operator applied to the unknown clean image) + noise.
  11. 18 Non-uniqueness of the solution without a prior. Goal of image restoration: estimate the unknown clean image from the known observation. Without a prior, the solution is not unique in many cases: the number of equations is smaller than the number of parameters, or the observation operator is not injective.
  12. 19 Motivation of prior: the equation has no unique solution, and priors make the solution space smaller or unique. In practice, we consider an optimization problem that combines a penalty function for the noise prior (data fidelity) with a penalty function for the image prior.
  13. 22 Image prior Smoothness: (Total variation) Non-negativity Graph regularization A

    class of image reconstruction methods: many image reconstruction methods can be produced from the above framework. When both penalty terms are convex, the problem can be solved by convex optimizers: the alternating direction method of multipliers (ADMM) or the primal-dual hybrid gradient (PDHG) method. Noise priors: Gaussian noise, Laplacian noise, Poisson noise. ×
  14. 23 Variants of image priors In this tutorial, I want

    to discuss image priors. An image prior can be characterized as a combination of two elements: a representation (vector, matrix, tensor, graph, function; in Fourier space, wavelet coefficient space, or another linear transform space) × a property (sparsity, group sparsity, low-rank, orthogonality, smoothness, non-negativity, self-similarity, etc.).
  15. 25 Sparsity Sparsity plays very important roles in data science

    Noise reduction in signal/image processing; variable selection in machine learning. Meaning of sparse: "there are few non-zero entries in a vector". Example: (1, 0, 0, 0, 3, 0, 0, 2) is sparse; (1, 2, 5, 4, 3, 2, 1, 2) is dense.
  16. 26 Measures of sparsity The simplest penalty measure of the

    sparsity is the number of non-zero entries, called the "L0-norm". Its convex relaxation is the "L1-norm". Both the L0-norm and the L1-norm penalties prefer sparse vectors.
  17. 27 Optimization Now, we consider a given vector to be

    sparse. The following two sparsity-inducing operators are obtained: L0-norm minimization ➔ hard thresholding; L1-norm minimization ➔ soft thresholding (see the sketch below).
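To make the two operators concrete, here is a minimal NumPy sketch (an illustration, not the tutorial's demo code); `lam` denotes the regularization weight:

```python
import numpy as np

def hard_threshold(x, lam):
    # proximal operator of lam*||x||_0: keep entries with |x_i| > sqrt(2*lam), zero the rest
    y = x.copy()
    y[np.abs(x) <= np.sqrt(2.0 * lam)] = 0.0
    return y

def soft_threshold(x, lam):
    # proximal operator of lam*||x||_1: shrink every entry toward zero by lam
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

x = np.array([1.0, 0.1, -0.3, 3.0, -0.05, 2.0])
print(soft_threshold(x, 0.2))   # small entries become exactly zero
```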
  18. 28 Sparsity in images Is the image data sparse? Intensity

    of image is not generally sparse… Histogram of intensity
  19. 29 Sparsity of images under linear transform The image itself

    is not sparse But it is known that the image is sparse under some linear transformations Block-wise DCT Wavelet Transform Differential
  20. 30 Optimization An image restoration problem with sparse penalty under

    linear transform. For example, it can be solved by ADMM (the alternating direction method of multipliers): equivalent problem, augmented Lagrangian, alternating update.
  21. 31 Optimization An image restoration problem with sparse penalty under

    linear transform. For example, it can be solved by ADMM (the alternating direction method of multipliers): equivalent problem, augmented Lagrangian, alternating update (a sketch follows below).
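As an illustration of the ADMM updates described above, here is a small NumPy sketch for the denoising case, minimizing 0.5·||y − x||² + λ·||Dx||₁ with an assumed first-difference operator D (a toy setup, not the slide's demo code):

```python
import numpy as np

def admm_sparse_denoise(y, D, lam=0.5, rho=1.0, n_iter=200):
    # ADMM for 0.5*||y - x||^2 + lam*||D x||_1 with the splitting z = D x
    n = y.size
    x = y.copy()
    z = D @ x
    u = np.zeros_like(z)
    A = np.eye(n) + rho * D.T @ D                         # system matrix of the x-update
    for _ in range(n_iter):
        x = np.linalg.solve(A, y + rho * D.T @ (z - u))   # quadratic subproblem
        v = D @ x + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)   # soft thresholding
        u = u + D @ x - z                                 # dual update for the constraint D x = z
    return x

# toy example: denoise a piecewise-constant signal with a difference operator
n = 100
clean = (np.linspace(0, 1, n) > 0.5).astype(float)
y = clean + 0.1 * np.random.randn(n)
D = np.diff(np.eye(n), axis=0)                            # first-order difference (sparsifying transform)
x_hat = admm_sparse_denoise(y, D)
```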
  22. 32 Demo Various values of the regularization parameter in sparse regularization under linear transform: Differential, Block-wise DCT, Wavelet (Haar).
  23. 33 Demo Noisy(PSNR=22.10) Missing (50%) Block-wise DCT Wavelet Transform Differential

    PSNR=27.36 PSNR=25.54 PSNR=25.81 PSNR=24.35 PSNR=23.95 PSNR=26.50
  24. 34 Demo Blur + noise Low resolution (1/4) Block-wise DCT

    Wavelet Transform Differential PSNR=22.88 PSNR=23.04 PSNR=23.23 PSNR=23.61 PSNR=22.77 PSNR=23.87
  25. 37 Smoothness for one-dimensional signals Penalty function for smoothness It

    must prefer smooth signals and penalize non-smooth ones, via a sum over differences of neighboring samples: QV (quadratic variation) uses squared differences, TV (total variation) uses absolute differences for a one-dimensional signal. Example: (1, 1, 1, 1, 1, 1, 1, 1) is smooth; (4, 3, 5, 2, 0, 5, 4, 1) is not (see the sketch below).
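A tiny NumPy example of the two penalties on the slide's sample signals (illustrative only):

```python
import numpy as np

def qv(x):  # quadratic variation: sum of squared differences of neighboring samples
    return np.sum(np.diff(x) ** 2)

def tv(x):  # total variation: sum of absolute differences of neighboring samples
    return np.sum(np.abs(np.diff(x)))

smooth = np.array([1, 1, 1, 1, 1, 1, 1, 1])
rough  = np.array([4, 3, 5, 2, 0, 5, 4, 1])
print(tv(smooth), tv(rough))   # 0 vs 17: both penalties prefer the smooth signal
```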
  26. 38 Image as multi-dimensional function (or scalar field) Smoothness is

    evaluated by a sum of length of gradient field (vector field) Smoothness for multi-dimensional signals
  27. 39 Smoothing is the minimization of gradient strength It can

    be seen as block sparsity of the image under a linear transform. Ordinary sparsity: "there are few non-zero entries in a vector"; block sparsity: "there are few non-zero vectors in a matrix" (zero vectors vs. non-zero vectors).
  28. 40 QV regularization for images Gradient vectors are obtained by

    linear transformation of an image QV regularization problem is a minimization of quadratic function
  29. 41 TV regularization for images Furthermore, we consider the convex conjugate of the l2-norm. For example, the problem can be solved by the Primal-Dual Hybrid Gradient (PDHG) method: equivalent min-max problem, alternating update.
  30. 42 TV regularization for images Furthermore, we consider the convex conjugate of the l2-norm. For example, the problem can be solved by the Primal-Dual Hybrid Gradient (PDHG) method: equivalent min-max problem, alternating update (a sketch follows below).
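The PDHG iteration for TV denoising can be sketched as follows: a minimal NumPy version of 0.5·||x − y||² + λ·TV(x), with step sizes chosen to satisfy τσ·||∇||² ≤ 1 (an illustrative sketch, not the tutorial's demo code):

```python
import numpy as np

def grad(u):
    # forward-difference image gradient (zero at the last row/column)
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    # negative adjoint of grad (discrete divergence)
    dx = np.zeros_like(px); dy = np.zeros_like(py)
    dx[0, :] = px[0, :]; dx[1:-1, :] = px[1:-1, :] - px[:-2, :]; dx[-1, :] = -px[-2, :]
    dy[:, 0] = py[:, 0]; dy[:, 1:-1] = py[:, 1:-1] - py[:, :-2]; dy[:, -1] = -py[:, -2]
    return dx + dy

def tv_denoise_pdhg(y, lam=0.1, n_iter=200, tau=0.25, sigma=0.25):
    x = y.copy(); x_bar = y.copy()
    px = np.zeros_like(y); py = np.zeros_like(y)
    for _ in range(n_iter):
        gx, gy = grad(x_bar)
        px, py = px + sigma * gx, py + sigma * gy
        norm = np.maximum(1.0, np.sqrt(px**2 + py**2) / lam)    # project dual onto the lam-ball
        px, py = px / norm, py / norm
        x_old = x
        x = (x + tau * div(px, py) + tau * y) / (1.0 + tau)     # prox of the data term
        x_bar = 2 * x - x_old                                   # over-relaxation (theta = 1)
    return x
```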
  31. 45 Demo Blur + noise Low resolution (1/4) TV QV

    PSNR=25.97 PSNR=24.30 PSNR=23.11 PSNR=23.14
  32. 47 Non-local similarity / self-similarity Non-local similarity is a property of images: "there are many similar patches (blocks) within a single image". Many images are composed of planes, lines, and textures, and this holds for images of both natural and artificial objects.
  33. 48 Typical methods Non-local means (NLM) filter, BM3D are typical

    methods for image denoising using the non-local similarity prior of images. Roughly speaking, NLM/BM3D applies the following steps to every patch: select a reference patch from the image; extract similar patches from the image by template matching; denoise the reference patch by a weighted average (or filter) of the similar patches; return the denoised patch to its original position in the image. ※ This is not a strict explanation, but it is useful for intuitive understanding. ※
  34. 49 Actually, non-local filter is very simple (it is just

    a linear transform). The weight matrix is important: w_ij represents the weight of the connection between the i-th and j-th pixels, and for a weighted mean the weights must sum to one over j. How the weights are computed determines the nature of the filter; for example, they are computed from the similarity of the i-th and j-th patches together with a normalization constant.
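A toy NumPy sketch of the non-local filter as a single linear transform, with weights computed from patch similarity and each row normalized to sum to one (an illustrative 1-D version; the patch size and bandwidth `h` are assumptions):

```python
import numpy as np

def nlm_denoise_1d(y, patch=3, h=0.5):
    # non-local means: each sample becomes a weighted mean of all samples,
    # with weights w_ij proportional to exp(-||patch_i - patch_j||^2 / h^2)
    n = y.size
    pad = np.pad(y, patch, mode='reflect')
    P = np.stack([pad[i:i + 2 * patch + 1] for i in range(n)])   # patch around every sample
    d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)          # squared patch distances
    W = np.exp(-d2 / h ** 2)
    W /= W.sum(axis=1, keepdims=True)                            # rows sum to 1 (weighted mean)
    return W @ y                                                 # the filter is just a linear transform

y = np.sin(np.linspace(0, 4 * np.pi, 200)) + 0.2 * np.random.randn(200)
x_hat = nlm_denoise_1d(y)
```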
  35. 50 For a simple example, we consider one-dimensional time-series Two

    filters are prepared: Local filter vs non-local filter Original Non-local filter constructed by noisy signal Local filter (moving average) Noisy High frequency components disappear Noise is reduced successfully
  36. 51 Graph perspective of non-local filter Weight matrix represents graphical

    structure of pixels (node: pixel; edge: similarity between pixels). In a sense, "non-local" is a re-definition of locality based on this graph: local connections correspond to isotropic diffusion, non-local connections to anisotropic diffusion.
  37. 53 Demo Comparison with TV regularization in denoising It can

    not be directly applied to other tasks because template matching is required. Noisy(PSNR=22.10) TV NLM PSNR=27.28 PSNR=27.01
  38. 54 Image restoration problem is formulated. Classical prior based methods

    are introduced: sparseness (hard / soft thresholding, sparse regularization under linear transform), smoothness (quadratic variation, total variation), and non-local similarity (the non-local filter). Summary of Part I.
  39. 56 Variants of image priors Image priors can be characterized

    as a combination of two elements. From now on, we focus on the matrix and tensor representations and their low-rank approximation. Representation (vector, matrix, tensor, graph, function; in Fourier space, wavelet coefficient space, other linear transforms) × Properties (sparsity, group sparsity, low-rank, orthogonality, independence, non-negativity, self-similarity, etc.).
  40. 57 Matrices / Tensors for Image Restoration Overview concept of

    this part There are various low-rank models for tensors Matrix factorization Nuclear norm regularization CP decomposition, Tucker decomposition TT decomposition, TR decomposition etc = Tensor representation Low-rank approximation
  41. 58 Recommended Book “Tensors for Data Processing: Theory, Methods, and

    Applications” Edited by Prof. Yipeng Liu Recommended Chapter
  42. 59 Variables having indices Generalization of scalars, vectors, and matrices

    Ex: Number of indices : “Order” Scalar : 0th order tensor Vector : 1st order tensor Matrices : 2nd order tensor … Tensors
  43. 60 Multi-way array 0th order tensors : 1st order tensors

    : 2nd order tensors : 3rd order tensors : Tensor data Vector → array of scalars Matrix → array of vectors 3rd order Tensor → array of matrices
  44. 61 Let us imagine 3rd order tensor by a “cube”

    4th order tensor: a block 3rd order tensor whose elements are vectors, a block matrix whose elements are matrices, or a block vector whose elements are 3rd order tensors. 5th order tensor: a block 4th order tensor whose elements are vectors, a block 3rd order tensor whose elements are matrices, a block matrix whose elements are 3rd order tensors, or a block vector whose elements are 4th order tensors. Nth order tensor: a block vector whose elements are (N−1)th order tensors. These are recursive representations of an Nth order tensor.
  45. 62 Psychometrics (1960s-) Linguistics (1970s-) Chemometrics (1980s-) Food industry (1990s-)

    Computer Science / Data Engineering (1990s-) Multi-media signal processing (audio, speech, image, video) Bio-medical signal processing (EEG, MEG, functional MRI, diffusion MRI) Bio-informatics (gene, protein) Remote sensing (hyperspectral image) Web/data mining (SNS analysis, recommender system) Non-linear time series analysis Wireless communications Deep learning etc. Fields of applications.
  46. 64 Matrix / Tensor representation of an image We consider

    an image as a matrix. The observation model is written in matrix form; e.g. a 100 × 100 grayscale image is a 100 × 100 array of pixel intensities (values 0–255).
  47. 65 Matrix rank Rank-1 matrix A matrix obtained by outer-product

    of any two non-zero vectors (all columns have the same direction, and all rows have the same direction). Matrix rank: the minimum number of rank-1 matrices required to represent a given matrix as their sum.
  48. 66 An example of matrix rank: two matrices constructed from the same entries can have different ranks. The matrix with rows (3, 2, 1), (6, 4, 2), (9, 6, 3) is the outer product of (1, 2, 3) and (3, 2, 1), so its rank is 1, whereas rearranging the same entries can give a matrix whose rank is 2.
  49. 67 SVD: Singular Value Decomposition SVD is a fundamental method

    for matrix analysis. The matrix rank equals the number of non-zero singular values: the decomposition consists of the left singular vectors, the right singular vectors, and a diagonal matrix of singular values.
  50. 69 Low-rank nature of images Singular values of an image

    Low-rank approximations using the truncated SVD (rank 1, 5, 10, 30, 50); a sketch follows below.
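The truncated-SVD low-rank approximation behind these figures takes a few lines of NumPy (a sketch; `img` is a placeholder array, not the image used in the slides):

```python
import numpy as np

def lowrank_approx(X, rank):
    # best rank-R approximation in the least-squares sense (Eckart-Young), via truncated SVD
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank, :]

img = np.random.rand(100, 100)   # placeholder for an actual grayscale image
for r in (1, 5, 10, 30, 50):
    err = np.linalg.norm(img - lowrank_approx(img, r)) / np.linalg.norm(img)
    print(f"rank {r}: relative error {err:.3f}")
```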
  51. 70 Low-rank nature of images (cont’d) Similarity between fibers is

    implied, while spatial continuity is ignored: matrix rank is invariant to switching (permuting) columns or rows.
  52. 71 Formulation We consider the following low-rank optimization Convex relaxation

    Equivalent formulations are often used; the relaxation is referred to as the trace norm, nuclear norm, or Schatten-1 norm.
  53. 72 We define proximal operator of nuclear norm This operation

    is known as singular value thresholding. Using the SVD of Y, it can be regarded as soft thresholding of the singular values; a sketch follows below.
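A minimal NumPy sketch of singular value thresholding, i.e. the proximal operator of λ·||X||_* (illustrative):

```python
import numpy as np

def svt(Y, lam):
    # soft-threshold the singular values of Y (proximal operator of the nuclear norm)
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt

Y = np.random.randn(6, 4)
print(np.linalg.matrix_rank(svt(Y, 1.0)))   # small singular values are removed, lowering the rank
```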
  54. 73 Low-rank image restoration Nuclear norm regularization Solution algorithm: ADMM

    equivalent problem augmented Lagrangian alternating update
  55. 74 Low-rank image restoration Nuclear norm regularization Solution algorithm: ADMM

    equivalent problem augmented Lagrangian alternating update
  56. 76 Demo Noisy(PSNR=22.10) PSNR=24.10 Missing (50%) Blur + noise Low

    resolution (1/4) PSNR=23.04 PSNR=24.12 PSNR=22.02
  57. 78 Tensor rank vs Matrix rank Unlike matrix ranks, tensor

    ranks have many variants. Tensor ranks are based on their corresponding decomposition models: CP decomposition (CPD) ➔ CP rank; Tucker decomposition (TKD) ➔ Tucker rank; tensor train decomposition (TTD) ➔ TT rank; tensor ring decomposition (TRD) ➔ TR rank.
  58. 79 Tensor decomposition is a higher order generalization of matrix

    decomposition. A general definition may be "a tensor represented by multiple tensors"; a specific decomposition is given by a parametric model f and its parameter tensors (core tensor and factor matrices).
  59. 80 Tensor based image restoration There are many studies of

    low-rank based tensor data recovery: Low-n-rank tensor recovery [Gandy+, 2011 (in Inverse Problems)]; Low rank tensor completion [Liu+, 2012 (in IEEE-TPAMI)]; Bayesian CP decomposition [Zhao+, 2015 (in IEEE-TPAMI)]; t-SVD [Zhang+, 2017 (in IEEE-TSP)]; Low-rank and smooth tensor recovery [Yokota+, 2017 (in CVPR)] [Yokota+, 2019 (in IEEE-Access)]; Tensor ring completion [Wang+, 2017 (in ICCV)]. Note that most of them focus on image inpainting (tensor completion); therefore, we focus on image inpainting methods in this part.
  60. 81 Image inpainting (revisited): estimation of missing values can also be regarded as regression. Each observed entry of the incomplete tensor gives a sample (x, y, z) ➔ v, a parametric tensor model is learned from these observed samples, and the learned model predicts the missing entries of the complete tensor.
  61. 82 Two approaches for tensor completion Rank minimization by using

    a penalty function (nuclear norms): when the penalty is convex, it can be solved by a convex optimizer such as ADMM or PDS. The other approach obtains an explicit low-rank tensor decomposition, whose variables are a set of core tensors and factor matrices, fitted to the observed entries through a projection operator onto the set of observed entries, possibly with additional penalties (orthogonality, sparseness, smoothness) on the core tensors and factor matrices.
  62. 84 Tensor ranks Two typical tensor ranks CP rank: R

    Minimum number of rank-1 tensors to construct a target tensor Multi-linear tensor rank (Tucker rank): (R1, R2, R3) Ranks of matrices obtained by individual mode unfolding of a target tensor The 1st mode rank The 2nd mode rank The 3rd mode rank 1st mode 2nd mode 3rd mode = rank-1 tensor rank-1 tensor rank-1 tensor R target tensor target tensor R1 R2 R3
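The multilinear (Tucker) rank can be checked numerically via the mode-n unfoldings; a small NumPy sketch (illustrative, using a randomly generated rank-(2, 2, 2) tensor):

```python
import numpy as np

def unfold(T, mode):
    # mode-n unfolding: the mode-n fibers of T become the columns of a matrix
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def tucker_rank(T, tol=1e-8):
    # multilinear (Tucker) rank: the matrix rank of each mode unfolding
    return tuple(np.linalg.matrix_rank(unfold(T, n), tol=tol) for n in range(T.ndim))

G = np.random.randn(2, 2, 2)                               # core tensor
A, B, C = (np.random.randn(d, 2) for d in (10, 11, 12))    # factor matrices
T = np.einsum('abc,ia,jb,kc->ijk', G, A, B, C)
print(tucker_rank(T))   # expected (2, 2, 2)
```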
  63. 85 Two definitions of tensor nuclear norms Tensor nuclear norm

    based on CP decomposition (CPD) and a tensor nuclear norm based on Tucker decomposition (TKD). Since obtaining the CPD is NP-hard, the CPD-based norm is not practical; the TKD-based norm, a sum of matrix nuclear norms of the mode-n unfoldings, is a convex function and quite useful.
  64. 86 Formulation [Gandy+, 2011 (in Inverse Problems)] [Liu+, 2012 (in

    IEEE-TPAMI)] It can be solved by ADMM LRTC: Low rank tensor completion equivalent problem augmented Lagrangian alternating update 1st mode 2nd mode 3rd mode target tensor Projection for X Singular value thresholding for Y Update for Z Indicator function
  65. 87 tSVD: tensor SVD Tensor SVD [Zhang+, 2017 (in IEEE-TSP)]

    It is a convolutional tensor decomposition model based on the t-product. It can be obtained by sequential steps: DFT along the 3rd mode, SVD of each frontal slice, inverse DFT.
  66. 88 tSVD: tensor SVD (cont’d) TNN: Tensor nuclear norm [Zhang+,

    2017 (in IEEE-TSP)]: a tensor nuclear norm based on the tSVD. The tubal nuclear norm equals the matrix nuclear norm of the corresponding block-circulant matrix (which the DFT block-diagonalizes).
  67. 89 tSVD: tensor SVD (cont’d) Singular value thresholding for tSVD

    It can be obtained by the following steps: DFT along the 3rd mode → SVD of each frontal slice → soft thresholding of the singular values → inverse DFT (a sketch follows below).
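These steps translate directly into code; a minimal NumPy sketch of the t-SVD singular value thresholding (the scaling convention for the threshold in the Fourier domain varies between papers, so treat this as illustrative):

```python
import numpy as np

def tsvd_svt(Y, lam):
    # FFT along the 3rd mode -> soft-threshold singular values of every frontal slice -> inverse FFT
    Yf = np.fft.fft(Y, axis=2)
    Xf = np.empty_like(Yf)
    for k in range(Y.shape[2]):
        U, s, Vh = np.linalg.svd(Yf[:, :, k], full_matrices=False)
        s = np.maximum(s - lam, 0.0)
        Xf[:, :, k] = (U * s) @ Vh
    return np.real(np.fft.ifft(Xf, axis=2))

Y = np.random.randn(20, 20, 3)
X = tsvd_svt(Y, lam=1.0)
```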
  68. 90 Formulation It can be solved by ADMM tSVD: tensor

    SVD (cont’d) equivalent problem augmented Lagrangian alternating update : complement of
  69. 91 TKD and tSVD based tensor nuclear norm TKD based

    nuclear norm tSVD based nuclear norm Comparison between TKD and tSVD SVD SVD
  70. 93 Demo Color image inpainting (tensor completion) tSVD based nuclear

    norm TKD based nuclear norm matrix nuclear norm (for each color frame) PSNR = 23.20 PSNR = 25.74 PSNR = 26.66
  71. 95 TD models Canonical Polyadic (CPD) Tucker (TKD) Tensor Train

    (TT) Tensor Ring (TR). Unified notation of a tensor decomposition (TD): the reconstructed tensor is parameterized by a set of core tensors and factor matrices.
  72. 96 Unified notation of a tensor decomposition In general, cost

    function is non-convex with respect to all parameters jointly, but it is convex with respect to each factor individually. Therefore, alternating least squares (ALS) can be applied to a wide range of TD models (CPD, TKD, TT, and TR); a sketch follows below.
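As an example of ALS for one TD model, here is a compact NumPy sketch of rank-R CP decomposition of a 3rd-order tensor (a generic textbook-style version, not the tutorial's implementation):

```python
import numpy as np

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(A, B):
    # column-wise Kronecker product; row index runs over (i, j) with j varying fastest
    return np.einsum('ir,jr->ijr', A, B).reshape(A.shape[0] * B.shape[0], -1)

def cp_als(T, R, n_iter=100):
    # each factor update is a (convex) linear least-squares problem
    I, J, K = T.shape
    A, B, C = (np.random.randn(d, R) for d in (I, J, K))
    for _ in range(n_iter):
        A = unfold(T, 0) @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = unfold(T, 1) @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = unfold(T, 2) @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# sanity check on an exactly rank-2 tensor
A0, B0, C0 = np.random.randn(5, 2), np.random.randn(6, 2), np.random.randn(7, 2)
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cp_als(T, R=2)
print(np.linalg.norm(T - np.einsum('ir,jr,kr->ijk', A, B, C)))   # typically close to zero
```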
  73. 97 Cost function for TD based tensor completion Majorization-Minimization (MM)

    algorithm can be applied: an auxiliary function is employed that coincides with the cost at the current estimate and upper-bounds it elsewhere, and the update minimizes this auxiliary function (ALS can be applied!!). This gives TD for tensor completion.
  74. 98 Rank increment algorithm Rank determination (size of core tensor)

    is a critical problem in TD. A rank increment strategy works well: start from R=1, optimize, set R←R+1, optimize, and repeat until convergence.
  75. 99 Demo Color image inpainting (tensor completion) CPD with rank

    increment TKD with rank increment PSNR = 23.09 PSNR = 26.02 Incomplete
  76. 101 Various priors can be combined at the same time

    Penalty function based approach / TD based approach: low-rank, smoothness, sparseness, non-negativity, etc. A typical combination is low-rank and smoothness.
  77. 102 LRTV: Low-rank and TV minimization LRTV prior [Yokota+, 2017

    (in CVPR)] [Yokota+, 2019 (in IEEE-Access)] Smoothness (TV) Low-rank (TKD based nuclear norm) Range constraint (0~255) Noise inequality Primal-Dual Splitting (PDS) Algorithm
  78. 103 SPC: Smooth CPD based tensor completion Smooth CPD [Yokota+,

    2016 (in IEEE-TSP)]: the image (height × width × rgb) is modeled by a CPD whose rank-1 components are smooth, using a QV smoothness penalty on the factor matrices and a squared-error data term; optimization uses the MM algorithm, ALS, and rank increment.
  79. 104 Demo Combinations of low-rank and smoothness priors Smooth CPD

    (PSNR=28.29) Normal CPD (PSNR=26.02) LRTV (PSNR=26.86) Only LR (PSNR=25.74) Only TV (PSNR=26.21) tSVD (PSNR=26.66) Penalty Function Tensor Decomp.
  80. 106 Limitation of direct representation Revisiting signal restoration using tensor

    decomposition: when whole slices are missing from a tensor, low-rank tensor decomposition does not work. Randomly missing pixels can be recovered, but a missing slice cannot: since a missing slice deflates the matrix/tensor rank, the deflated component cannot be recovered.
  81. 107 Tensorization for TD We consider a concept of “tensorize”

    and "decompose". The question here is how to tensorize the signals; generally, some specific knowledge about the signals is necessary to design the tensors. Here, I introduce direct folding, ket augmentation, and multi-way Hankelization.
  82. 108 Direct folding of images Image data can be naturally

    represented by a matrix, but folding has some benefits for compression: a (256, 256) image is folded into a 16th order tensor of size (2, 2, …, 2), and a low-rank approximation is taken in each representation (#param = 2048 in both examples).
  83. 109 Ket augmentation (KA) KA [Bengua+, 2017 (in IEEE-TIP)] (256,256)

    #param = 2048 #param = 2048 (2,2,2,2,2,2,2,2) (2,2,2,2,2,2,2,2) 16th order tensor 8th order tensor Folding KA (4,4,4,4,4,4,4,4) Low rank approximation Low rank approximation
  84. 110 Approach to use Hankel representation Here, we introduce the use of low-rank tensor decomposition not in the original tensor space but in an embedded (Hankel) tensor space: delay embedding / Hankelization turns the (height × width × rgb) data into a high-order incomplete tensor, low-rank tensor completion is applied there, and the inverse operation gives the completed output.
  85. 111 Here, I introduce a technique from dynamical system analysis.

    For a time series x(t), t = 1, …, N, we consider a state space of delayed vectors (x(t), x(t+1), …, x(t+τ−1)); this is called "delay embedding (DE)".
  86. 112 DE is a transformation from a vector to a

    Hankel matrix (i.e. Hankelization). A Hankel matrix is a matrix whose anti-diagonal elements are all the same. Delay embedding (DE) is Hankelization; a sketch follows below.
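A minimal NumPy sketch of delay embedding and its pseudo-inverse (duplicate-then-average) for a 1-D signal:

```python
import numpy as np

def delay_embed(x, tau):
    # map a length-N vector to a tau x (N - tau + 1) Hankel matrix: H[i, j] = x[i + j]
    N = x.size
    return np.stack([x[i:i + N - tau + 1] for i in range(tau)])

def delay_embed_pinv(H):
    # pseudo-inverse of delay embedding: average the duplicated (anti-diagonal) entries
    tau, M = H.shape
    N = tau + M - 1
    out = np.zeros(N); cnt = np.zeros(N)
    for i in range(tau):
        out[i:i + M] += H[i]
        cnt[i:i + M] += 1
    return out / cnt

x = np.arange(1.0, 8.0)          # [1, 2, ..., 7]
H = delay_embed(x, tau=3)        # 3 x 5 Hankel matrix
assert np.allclose(delay_embed_pinv(H), x)
```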
  87. 114 Inverse of delay-embedding (DE) Can we define the inverse

    of DE? There is no exact inverse of DE, since it is essentially a duplication; we only consider the pseudo-inverse of DE, which averages the duplicated entries (fold/unfold, duplicate … average).
  88. 115 SVD of Hankel matrix Seeing singular values of Hankel

    matrix with various value of Hankel matrix has a low-rank structure 3d-plot of DE(x,τ=3) and its principal patterns Order of singular values
  89. 116 SVD of Hankel matrix (cont’d) Seeing left- and right-

    singular vectors of Hankel matrix Left singular vectors are very similar to discrete cosine transform (DCT) basis
  90. 117 Time-series recovery by using truncated SVD missing&noisy D.E. missing&noisy

    Hankel matrix low rank approximation Inverse D.E. recovered Completion
  91. 118 Considering “both” case for matrix delay-embedding Each column is

    Hankelized by DE, which yields a time series of Hankel matrices; a further DE along the other direction yields a "block Hankel matrix". A block Hankel matrix is a block matrix that has Hankel structure and whose elements are themselves Hankel matrices.
  92. 119 DE = linear duplication + folding Multiway delay-embedding =

    multilinear duplication + folding. We call it the "multiway delay-embedding transform (MDT)", a generalization of DE for tensors: each mode of size I is mapped to a pair of modes (τ, I−τ+1), so a 3rd order tensor (I1, I2, I3) becomes a 6th order tensor (a sketch follows below).
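A compact NumPy sketch of MDT, Hankelizing the first few modes of an array (the window sizes `taus` are assumptions; only the modes listed in `taus` are embedded):

```python
import numpy as np

def mdt(X, taus):
    # multiway delay embedding: every embedded mode of size n becomes a pair (tau, n - tau + 1)
    T = X
    for mode, tau in enumerate(taus):
        T = np.moveaxis(T, 2 * mode, 0)                          # bring the current original mode to the front
        n = T.shape[0]
        T = np.stack([T[i:i + n - tau + 1] for i in range(tau)]) # Hankelize this mode: (tau, n-tau+1, ...)
        T = np.moveaxis(T, (0, 1), (2 * mode, 2 * mode + 1))     # put the new pair of modes in place
    return T

img = np.random.rand(16, 16)
H = mdt(img, (4, 4))
print(H.shape)   # (4, 13, 4, 13), analogous to the (32, 225, 32, 225) example for a 256x256 image
```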
  93. 120 DE vs Inverse DE MDT vs Inverse MDT Inverse

    MDT 6th order tensor † † † = 3rd order tensor (I1, I2, I3) 6th order tensor (I1, I2, I3) 3rd order tensor
  94. 121 MDT of Image and its high order SVD Seeing

    the components of Hankel tensor [Yokota+, 2018 (in APSIPA)] Very similar to DCT Bases
  95. 122 Application to Tensor Completion Incomplete tensor Incomplete block Hankel

    tensor ➔ complete block Hankel tensor ➔ complete tensor. Low-rank Tucker decomposition in embedded space [Yokota+, 2018 (in CVPR)]: ① MDT of the given incomplete tensor of original size (Nth order ➔ 2Nth order), ② Tucker decomposition of the incomplete block Hankel tensor, ③ inverse MDT of the complete block Hankel tensor (2Nth order ➔ Nth order).
  96. 123 Tucker decomposition of complete tensor was explained before Here,

    I explain the method for Tucker decomposition of an incomplete tensor. The optimization problem involves the incomplete tensor and a binary tensor of the same size (0 for a missing entry, 1 for an observed entry). It can be solved by the auxiliary function approach with the ALS algorithm: alternately fill the missing entries with the current reconstruction and update the Tucker factors (which is then the same as Tucker for a complete tensor), repeating until convergence.
  97. 124 (256,256,3) (32,225,32,225,3) (R1,R2,R3,R4,3) (256,256,3) Experimental results (32,32) → (R1,R3)

    (225,225) → (R2,R4) 16 32 64 4 8 Slice missing was recovered with appropriate rank decomposition ➔ But the best rank is unknown
  98. 125 Adaptive rank increment algorithm To be free from the

    rank estimation problem in tensor completion, we propose to use a rank increment algorithm.
  99. 127 Demo Low-rank decomposition under tensorization CPD with Rank inc.

    99% missing TKD with Rank inc. Raw tensor (256,256,3) Tensorization (8,8,4,8,8,4,3) KA (16,16,16,16,3) MDT (32,225,32,225,3)
  100. 128 Demo Comparison with various methods 50% missing tSVD LRTV

    SPC MDT-TKD 70% missing 90% missing 99% missing LR TV CPD TKD 26.67 21.71 17.10 9.96 25.86 21.98 16.92 7.88 26.21 23.17 19.32 13.97 26.89 23.69 19.53 13.97 26.18 22.05 17.05 6.06 23.50 20.27 16.44 10.60 28.32 25.50 21.46 15.91 23.50 23.44 20.28 16.59
  101. 129 Summary of Part II Prior based image restoration under

    matrix and tensor representation. Penalty function based approach: the matrix nuclear norm works well as a low-rank prior; there are several nuclear norms for tensors (TKD based, tSVD based); these problems can be solved by ADMM or PDS. TD based approach: there are many TD models; TD is solved by ALS; tensor completion can be solved by MM; rank increment works well for adaptive rank selection. The low-rank prior under tensorization is an active topic now.
  102. 131 Neural Networks Mathematical model of a function that outputs

    a signal obtained by repeatedly applying linear operations and nonlinear activations to an input signal. Various functions can be expressed by adjusting the weight matrices used in the linear operation of each layer as parameters. Universal approximation ability: a two-layer NN with a sufficiently large (in the limit, infinite) number of nodes can approximate any function. Deep structure: good for approximating complicated functions, and it helps supervised and unsupervised learning using large amounts of data.
  103. 132 Assumption: There is a large number of pairs of

    input x and output y (training data) Learning process Step 1: Design mathematical model Step 2: Fit the mathematical model to training data Trained model is obtained as follow Supervised learning 〇 : training samples
  104. 133 Supervised learning: example Image Recognition Targeted Function When an

    image x is inputted, then class y is outputted: y=f(x) “DOG” “CAT” input output
  105. 134 Supervised learning: example Age estimation Targeted System When a

    facial image x is inputted, then estimated age y is outputted: y=f(x) 7 years old 45 years old input output
  106. 135 Supervised learning: The impact of deep learning Large Scale

    Visual Recognition Challenge 2012 A huge database of images: ImageNet Train about 1.2 million image data Compete for 1000 class classification tasks Deep learning Univ. Toronto
  107. 136 Convolutional Neural Networks (CNNs) CNN at the time of

    2012 consists of 8 layers CNN at the time of 2015 consists of 152 layers!! ※below figure is 34 layers
  108. 137 Unsupervised learning Variational Auto-Encoder (VAE) [Kingma+, 2013] Generate arbitrary

    images included in a specific domain. Generated facial images by the learned model; generated hand-written digits by the learned model.
  109. 138 Unsupervised learning (cont’d) Variational Auto-Encoder (VAE) [Kingma+, 2013] Images

    are mapped to the appropriate coordinates in the latent space. Due to the low dimensionality of the latent space, similar images are closely located. Continuous changes of images on the latent space can be seen ・・・ ・・・ Latent space Encoder Decoder
  110. 139 Unsupervised learning (cont’d) Generative Adversarial Nets (GAN) [Goodfellow+, 2014]

    It performs adversarial training with a generator and a discriminator. Generator: Optimize to generate fake images so that it can deceive the discriminator Discriminator: Optimize to accurately classify fake images and real images Latent space Normal distribution Discriminator Generator (Decoder) Fake Real Real or Fake It is like a game of AI vs AI
  111. 140 Unsupervised learning (cont’d) StyleGAN [Karras+, 2019] [demo] [github] Very

    large GAN with various improvements Learned 70,000 images with a size of 1024 x 1024 StyleGAN
  112. 142 Machine Learning (ML) framework Learning a function that inputs

    a corrupted image and outputs a clean image. Corrupted images are created synthetically (in a pseudo manner) from a clean image database, and the parametric function is optimized to fit the training data.
  113. 143 ML based Super-Resolution SRCNN [Dong+, ECCV2014], [Dong+, TPAMI2015], SRResNet

    [Ledig+, CVPR2017] Create many low-resolution and high-resolution image pairs and learn the input / output relationship with a CNN High resolution image Low resolution image
  114. 144 ML based Super-Resolution SRGAN [Ledig+, CVPR2017] Extension of SRCNN,

    SRResNet with adversarial regularization SR image (fake) LR image {True, Fake} HR image (real) Content loss (VGG) Adversarial regularizer - G deceives D - D keeps accuracy Content loss for SRResNet
  115. 145 PULSE [Menon+, CVPR2020] It explicitly uses a data-driven prior for

    image restoration Training phase: Learn generative model from set of clean images Image restoration based on data-driven prior: ML based Super-Resolution Data-driven prior sphere surface Error bound
  116. 146 ML based Super-Resolution PULSE [Menon+, CVPR2020] PULSE with StyleGAN

    StyleGAN [Karras+, CVPR2019] PULSE [Menon+, CVPR2020]
  117. 147 Discussions about the data-driven prior Almost all datasets have

    bias Skin color, hair color, hairstyle, glasses, gender, age, clothes, accessories, etc StyleGAN can generate high quality HR images However, these generated images are still in some specific domain (not universal) original Down scaled Upscaled by PULSE ※Demo code is available at [github]
  118. 149 Deep Image Prior [Ulyanov+, CVPR2018] No training data !

    Only the CNN structure becomes the image prior! Deep Image Prior: minimize the (masked) error between the corrupted image and the output of a randomly initialized CNN fed with a fixed noise input.
  119. 150 Deep Image Prior: Optimization behavior Deep Image Prior [Ulyanov+,

    CVPR2018]: prepare a CNN with randomly initialized weights, then update the weights by minimizing the least-squares error (a sketch follows below).
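A minimal PyTorch sketch of this optimization loop (the network here is a small stand-in CNN, not the encoder-decoder architecture of the paper; `y` is the corrupted image and `mask` marks the observed pixels):

```python
import torch
import torch.nn as nn

def dip_restore(y, mask, n_iter=1800, lr=0.01):
    # y, mask: tensors of shape (1, 1, H, W); mask is 1 for observed pixels, 0 for missing ones
    net = nn.Sequential(                            # randomly initialized CNN acts as the prior
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 1, 3, padding=1))
    z = torch.randn(1, 32, *y.shape[-2:])           # fixed random input
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(n_iter):                         # early stopping acts as regularization
        opt.zero_grad()
        out = net(z)
        loss = ((mask * (out - y)) ** 2).mean()     # least squares on observed pixels only
        loss.backward()
        opt.step()
    return net(z).detach()
```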
  120. 151 Deep Image Prior: Noise Reduction Early stopping is necessary

    for noise reduction Noisy image 0 iterations 100 iterations 1200 iterations 1800 iterations
  121. 152 Deep Image Prior: PET reconstruction In PET reconstruction, we

    minimize KL divergence with Radon transform Observed sinogram Noise (fixed) optimize Reconstructed PET Reconstructed sinogram Evaluate by KL div. Iter=10 SN=1.8 Iter=100 SN=12.2 Iter=600 SN=16.2 Iter=2600 SN=18.8 Iter=10000 SN=10.2 Iter=5000 SN=17.4 T. Yokota+, “Dynamic PET image reconstruction using NMF incorporated with Deep Image Prior." ICCV, 2019.
  122. 153 Why noise can be reduced by CNNs ? Explanation

    in [Ulyanov+, 2018]: "noise impedance" was shown. Four reconstruction targets were tested (a natural image, a natural image with noise, a natural image with shuffled pixels, and uniform noise), and the convergence speeds differed: the natural image converged faster, the noise slower. DIP exploits this difference in convergence time. However, this is just a report of the phenomenon; the question remains: why CNNs, why convolution?
  123. 155 Discussions Deep Image Prior Fully Convolutional Net (architecture) itself

    has some image prior; however, this prior is difficult to explain in words. MDT × low-rank model: convolution = DE + linear transform, so it has some relationship with CNNs.
  124. 157 Discussions (cont’d) Multiway delay-embedding transform (MDT) is a patch

    extractor It produces Multilevel Block Hankel Matrix (256, 256) (32, 225, 32, 225) (32*32, 225*225) (32, 32, 225, 225) MDT mode permutation & reshaping … 32 225 225 32 Two level block Hankel matrix
  125. 158 Discussions (cont’d) Multiway delay-embedding transform (MDT) is a patch

    extractor that produces a multilevel block Hankel matrix, and a linear transform of it is equivalent to convolution: Conv = MDT + Linear, CNN = (Conv + Act) × L, MLP = (Linear + Act) × L. Since an MLP is explainable as manifold learning, here we consider MDT + MLP.
  126. 159 Manifold Modeling in Embedded Space (MMES) A new CNN

    structure using MDT [Yokota+, 2020 (in IEEE-TNNLS)] Simple structure (MDT ➔ MLP ➔ inverse MDT) Each column vector (image patch) is independently transformed by MLP MLP is trained as Auto-Encoder (AE), and it can be interpreted as manifold learning In practice, Denoising-Auto-Encoder (DAE) is employed MDT … … Inv MDT Multi Layer Perceptron (MLP) Train as Denoising-Auto-Encoder (DAE)
  127. 160 Manifold Modeling in Embedded Space (MMES) MMES can be

    interpreted as optimizing the following problem Prior in MMES is the low-dimensionality of the patch manifold MDT Prior: Low dimensional patch manifold … … vectorize
  128. 161 Manifold Modeling in Embedded Space (MMES) MMES can be

    interpreted as optimizing the following problem Prior in MMES is the low-dimensionality of the patch manifold Replacing the manifold with MLP(AE), it can be transformed as MDT Prior: Low dimensional patch manifold Error term Regularization term
  129. 162 Conceptual illustration Compressed in low-dimensional space Patches +noise +noise

    +noise … reconstructed patches (patch extraction ➔ compression in a low-dimensional space ➔ patch aggregation, with data fidelity to the corrupted image). ・Compression works well when there are many similar patches ➔ low-dimensionality can serve as an image prior ➔ it can be regarded as non-local similarity. ・The denoising auto-encoder (DAE) learns a low-dimensional manifold from the patch distribution (interpretation of the DAE [Vincent+, 2010]).
  130. 163 Implementation of MMES: Inpainting It can be implemented using

    a deep learning toolbox (TensorFlow etc.). The cost function is minimized with the Adam optimizer, and various tasks can be solved by the same framework (MDT … AE … inverse MDT, with the mask in the loss).
  131. 164 It can be implemented using Deep Learning toolbox (TensorFlow

    etc.). The cost function is minimized with the Adam optimizer, and various tasks can be solved by the same framework. Implementation of MMES: Deblurring (MDT … AE … inverse MDT, with the blur operator in the loss).
  132. 165 It can be implemented using Deep Learning toolbox (TensorFlow

    etc.). The cost function is minimized with the Adam optimizer, and various tasks can be solved by the same framework. Implementation of MMES: Super-Resolution (MDT … AE … inverse MDT, with the downsampling operator in the loss).
  133. 166 Preliminary experiments (signal recovery) Recovery of a noisy and

    incomplete one-dimensional signal Original Observed Low-rank manifold
  134. 168 Two dimensional space on the patch manifold Learning two-dimensional

    latent space in the AE from an incomplete image
  135. 169 Comparison of various image priors DIP and MMES work

    well Low-rank priors Low-rank &smoothness priors Deep image prior Sparse modeling Low-rank MDT Patch manifold 99%
  136. 175 Summary of Part III Two approaches using NNs: (1) learning a data-driven prior from a large amount of data (SRCNN, SRGAN, PULSE); (2) the network structure (CNN) itself provides the image prior (deep image prior, patch manifold model). ML-based image restoration has attracted much attention. Comparison (data-driven / DIP / patch manifold): accuracy ◎ / ◦ / ◦; computation fast / slow / slow; training data necessary / not necessary / not necessary; interpretability depends on the data / unclear / clear.
  137. 176 Closing tutorial I’d like to show my appreciation to

    organizers of APSIPA ASC 2021 and all participants!! In this tutorial, I tried to summarize prior-based image restoration methods from multiple perspectives: classical methods (sparseness, smoothness, and non-local similarity priors), tensor methods, and neural network methods. In my opinion, two important points in the study of image restoration are a technical viewpoint about the behavior of mathematical models and optimization algorithms a realistic viewpoint for digging into the essential nature of images I hope this tutorial helps your study in the future. Basics Tensors Neural Nets