Neural Networks. This tutorial consists of 3 parts, with Q&A and breaks inserted between the parts. Please feel free to interrupt me at any time to ask questions!!
Prior information plays a role in selecting a solution from candidates (Signal A, Signal B, Signal C, Signal D, ...) - Example 1: a smoothness prior may select Signal A - Example 2: a periodicity prior may select Signal D. PRIOR IS ALL YOU NEED!
An image can be treated as a vector in many cases: consider arranging all pixel values into a single vector. For example, a gray-scale image of size 100 × 100 ➔ a 10000-dimensional vector.
Priors make the solution space smaller or unique. In practice, we consider an optimization problem of the form: minimize f(x) + g(x) over x, where f is the penalty function for the noise prior (data fidelity) and g is the penalty function for the image prior.
This framework covers a wide class of image reconstruction methods: many methods can be produced from the above framework by combining a noise prior (Gaussian noise, Laplacian noise, Poisson noise) with an image prior. When both penalty functions are convex, the problem can be solved by convex optimizers such as the alternating direction method of multipliers (ADMM) and the primal-dual hybrid gradient (PDHG) method.
To discuss image priors, note that they can be characterized as combinations of two elements: Representation (vector, matrix, tensor, graph, function; in Fourier space, in wavelet coefficient space, in other linear transforms) × Properties (sparsity, group sparsity, low-rank, orthogonality, smoothness, non-negativity, self-similarity, etc.).
Sparsity is measured by the number of non-zero entries, which is called the “L0-norm”. Its convex relaxation is the “L1-norm”. Both the L0-norm and the L1-norm penalties prefer sparsity. For example, (1, 0, 0, 0, 3, 0, 0, 2) is sparse, while (1, 2, 5, 4, 3, 2, 1, 2) is dense.
The following two sparsity-inducing operators can be obtained: L0-norm minimization leads to hard thresholding, and L1-norm minimization leads to soft thresholding.
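As a concrete illustration, here is a minimal sketch of the two operators in Python (NumPy); the threshold value `lam` and the example vector are illustrative choices, not values from the slides.

```python
import numpy as np

def hard_threshold(x, lam):
    """L0-type operator: keep entries whose magnitude exceeds lam, zero out the rest."""
    return np.where(np.abs(x) > lam, x, 0.0)

def soft_threshold(x, lam):
    """L1 operator (convex relaxation): shrink every entry toward zero by lam."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

x = np.array([1.0, -0.2, 0.0, 3.0, -0.7, 2.0])
print(hard_threshold(x, 0.5))   # approximately [1, 0, 0, 3, -0.7, 2]
print(soft_threshold(x, 0.5))   # approximately [0.5, 0, 0, 2.5, -0.2, 1.5]
```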
Sparse regularization under a linear transform can be handled in the same way. For example, it can be solved by ADMM (alternating direction method of multipliers): rewrite it as an equivalent constrained problem, form the augmented Lagrangian, and apply alternating updates.
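Below is a hedged sketch (not the speaker's code) of how ADMM can solve such a problem. The model min over x of 0.5*||y - x||^2 + lam*||D x||_1, the first-order difference operator D, and all parameter values are my own illustrative choices.

```python
import numpy as np

def admm_l1_analysis(y, D, lam=0.5, rho=1.0, n_iter=200):
    """ADMM for  min_x 0.5*||y - x||^2 + lam*||D x||_1  (sparsity under a linear transform D).
    Split as z = D x, then alternate x-update, z-update (soft thresholding), dual update."""
    n = len(y)
    x = y.copy()
    z = D @ x
    u = np.zeros_like(z)
    A = np.eye(n) + rho * D.T @ D          # fixed matrix of the x-update linear system
    for _ in range(n_iter):
        x = np.linalg.solve(A, y + rho * D.T @ (z - u))                      # x-update
        v = D @ x + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)              # soft thresholding
        u = u + D @ x - z                                                    # dual update
    return x

# example: denoise a piecewise-constant signal with a difference operator D
n = 100
t = np.linspace(0, 1, n)
clean = (t > 0.5).astype(float)
y = clean + 0.1 * np.random.randn(n)
D = np.diff(np.eye(n), axis=0)             # (n-1) x n first-order difference matrix
x_hat = admm_l1_analysis(y, D, lam=0.5)
```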
A smoothness penalty must prefer smooth signals and penalize non-smooth signals. It is based on the differences between neighboring entries: for a one-dimensional signal, QV (quadratic variation) is the sum of squared differences, QV(x) = Σ_i (x_{i+1} − x_i)², and TV (total variation) is the sum of absolute differences, TV(x) = Σ_i |x_{i+1} − x_i|. For example, (1, 1, 1, 1, 1, 1, 1, 1) is smooth, while (4, 3, 5, 2, 0, 5, 4, 1) is not smooth.
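A small NumPy check of the two penalties on the example signals above:

```python
import numpy as np

def qv(x):
    """Quadratic variation: sum of squared differences between neighbors."""
    return np.sum(np.diff(x) ** 2)

def tv(x):
    """Total variation: sum of absolute differences between neighbors."""
    return np.sum(np.abs(np.diff(x)))

smooth = np.array([1, 1, 1, 1, 1, 1, 1, 1], dtype=float)
rough  = np.array([4, 3, 5, 2, 0, 5, 4, 1], dtype=float)
print(qv(smooth), tv(smooth))   # 0.0 0.0   -> small penalty for the smooth signal
print(qv(rough),  tv(rough))    # 53.0 17.0 -> large penalty for the non-smooth signal
```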
It can also be seen as block sparseness of images under a linear transform. Normal sparseness means “there are few non-zero entries in a vector”, while block sparseness means “there are few non-zero vectors in a matrix”.
Non-local similarity of images: “there are many similar patches (blocks) in a single image”. Many images are composed of planes, lines, and textures, and this applies to images of both natural and artificial objects.
NLM and BM3D are methods for image denoising using the non-local similarity prior of images. Roughly speaking, NLM/BM3D applies the following processes for all patches: (1) select a reference patch from the image, (2) extract similar patches from the image by template matching, (3) denoise the reference patch by a weighted average of the similar patches, (4) return the denoised patch to its original position in the image. ※ This is not a strict explanation, but it is useful for easy and intuitive understanding.
Mathematical formulation of the non-local filter (as a linear transform): the weight matrix W is important, where W_ij represents the weight of the connection between the i-th and j-th pixels. For a weighted mean, Σ_j W_ij = 1 must be satisfied. The calculation method of W_ij determines the nature of the filter; for example, it is calculated from the i-th and j-th patches as W_ij = exp(−||p_i − p_j||² / h²) / Z_i, where p_i and p_j are the i-th and j-th patches and Z_i is a normalization constant.
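A minimal sketch of this formulation for a 1-D signal, assuming a Gaussian weight on patch distances; the patch size and bandwidth `h` are illustrative parameters.

```python
import numpy as np

def nonlocal_filter_1d(y, patch=5, h=0.1):
    """Non-local filter applied as a linear transform: x_hat = W y.
    W[i, j] = exp(-||p_i - p_j||^2 / h^2) / Z_i, rows normalized so sum_j W[i, j] = 1."""
    n = len(y)
    r = patch // 2
    ypad = np.pad(y, r, mode="edge")
    P = np.stack([ypad[i:i + patch] for i in range(n)])       # P[i] = patch around pixel i
    D = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1)   # squared patch distances
    W = np.exp(-D / h ** 2)
    W /= W.sum(axis=1, keepdims=True)                         # normalization constant Z_i
    return W @ y                                              # weighted mean over all pixels

y = np.sin(np.linspace(0, 4 * np.pi, 200)) + 0.1 * np.random.randn(200)
x_hat = nonlocal_filter_1d(y, patch=7, h=0.5)
```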
Local filter vs non-local filter: two filters are prepared and compared on a noisy signal. [Figure: original, noisy, local filter (moving average), and non-local filter constructed from the noisy signal. With the local filter, high-frequency components disappear; with the non-local filter, noise is reduced successfully.]
The non-local filter can be interpreted through a graph structure of pixels: node = pixel, edge = similarity between pixels. In a sense, “non-local” is a re-definition of locality based on the graph (local connections vs non-local connections; isotropic diffusion vs anisotropic diffusion).
Summary of Part I: three image priors are introduced for the observation model y = Ax + n (y and A are known; x and n are unknown; x is to be estimated): Sparseness (hard/soft thresholding, sparse regularization under a linear transform), Smoothness (quadratic variation, total variation), and Non-local similarity (non-local filter).
Recall that image priors can be characterized as combinations of two elements, Representation × Properties. From now on, we focus on the matrix and tensor representations and their low-rank approximation. Representation: vector, matrix, tensor, graph, function; in Fourier space, in wavelet coefficient space, in other linear transforms. Properties: sparsity, group sparsity, low-rank, orthogonality, independence, non-negativity, self-similarity, etc.
In this part, we see that there are various low-rank models for tensors (tensor representation + low-rank approximation): matrix factorization, nuclear norm regularization, CP decomposition, Tucker decomposition, TT decomposition, TR decomposition, etc.
Recursive representations of an Nth-order tensor. A 4th-order tensor is a block 3rd-order tensor whose elements are vectors, a block matrix whose elements are matrices, or a block vector whose elements are 3rd-order tensors. A 5th-order tensor is a block 4th-order tensor whose elements are vectors, a block 3rd-order tensor whose elements are matrices, a block matrix whose elements are 3rd-order tensors, or a block vector whose elements are 4th-order tensors. In general, an Nth-order tensor is a block vector whose elements are (N−1)th-order tensors.
Fields of application in computer science / data engineering (1990s-): multimedia signal processing (audio, speech, image, video), biomedical signal processing (EEG, MEG, functional MRI, diffusion MRI), bioinformatics (gene, protein), remote sensing (hyperspectral images), web/data mining (SNS analysis, recommender systems), non-linear time series analysis, wireless communications, deep learning, etc.
Matrix rank: a rank-1 matrix is the outer product of any two non-zero vectors (all its columns have the same direction, and all its rows have the same direction). The rank of a matrix is the minimum number of rank-1 matrices required to represent it as their sum.
The singular value decomposition (SVD) is a fundamental tool for matrix analysis: X = U Σ Vᵀ, where U contains the left singular vectors, V contains the right singular vectors, and Σ is the diagonal matrix of singular values. The matrix rank can be obtained as the number of non-zero singular values.
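A quick NumPy check that the number of non-zero singular values equals the matrix rank; the example matrix is an arbitrary rank-2 construction of my own.

```python
import numpy as np

# a rank-2 matrix built as the sum of two rank-1 (outer-product) matrices
A = np.outer([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]) + np.outer([1.0, 0.0, 1.0], [2.0, -1.0, 0.0])

s = np.linalg.svd(A, compute_uv=False)        # singular values (diagonal of Sigma)
rank = int(np.sum(s > 1e-10 * s[0]))          # count the non-zero singular values
print(rank, np.linalg.matrix_rank(A))         # both report 2
```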
Note what is implied by the low-rank prior: spatial continuity is ignored, because the matrix rank is invariant to switching (permuting) columns or rows; the rank remains R after any such switch.
The solution of the nuclear-norm-regularized problem is known as singular value thresholding, which can be regarded as soft thresholding of the singular values. Using the SVD of Y = U Σ Vᵀ, the solution is given by X = U soft_λ(Σ) Vᵀ, where soft thresholding is applied to the diagonal of Σ.
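A minimal NumPy sketch of singular value thresholding; the threshold `lam` is a generic parameter.

```python
import numpy as np

def singular_value_thresholding(Y, lam):
    """Soft thresholding of the singular values: X = U * soft_lam(Sigma) * V^T."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt

Y = np.random.randn(20, 15)
X = singular_value_thresholding(Y, lam=1.0)
print(np.linalg.matrix_rank(X) <= np.linalg.matrix_rank(Y))   # thresholding never increases the rank
```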
Tensor decompositions: a general definition may be “a tensor represented by multiple tensors”. A specific decomposition is given by a parametric model f and its parameter tensors (a core tensor and factor matrices), X = f(G, A, B, ...), analogous to a matrix decomposition such as X = ABᵀ.
Low-rank based tensor data recovery: Low-n-rank tensor recovery [Gandy+, 2011 (in Inverse Problems)], Low rank tensor completion [Liu+, 2012 (in IEEE-TPAMI)], Bayesian CP decomposition [Zhao+, 2015 (in IEEE-TPAMI)], t-SVD [Zhang+, 2017 (in IEEE-TSP)], Low-rank and smooth tensor recovery [Yokota+, 2017 (in CVPR)] [Yokota+, 2019 (in IEEE-Access)], Tensor ring completion [Wang+, 2017 (in ICCV)]. Note that most of them focus on image inpainting (tensor completion); therefore, we introduce image inpainting methods in this part.
There are two approaches. (1) Penalty-function approach (nuclear norms), e.g., minimize ||P_Ω(X − T)||² + λ·g(X) over X; when g is convex, it can be solved by convex optimizers such as ADMM and PDS. (2) Obtaining explicit low-rank tensor decompositions, e.g., minimize ||P_Ω(T − f(G, U₁, ..., U_N))||² + penalties over the set {G, U₁, ..., U_N} of the core tensor and factor matrices. Here T is the incomplete tensor, P_Ω is the projection operator onto the set Ω of observed entries, and the penalties for the core tensor and factor matrices encode properties such as orthogonality, sparseness, and smoothness.
Tensor nuclear norms: the tensor nuclear norm based on CP decomposition (CPD) is not useful in practice, since obtaining the CPD is NP-hard. The tensor nuclear norm based on Tucker decomposition (TKD), defined via the matrix nuclear norms of the n-th mode unfoldings, is a convex function and quite useful.
LRTC: low rank tensor completion [Liu+, 2012 (in IEEE-TPAMI)] minimizes the sum of the nuclear norms of the mode unfoldings (1st, 2nd, 3rd mode) of the target tensor subject to the observed entries (enforced by an indicator function). It can be solved by ADMM: rewrite it as an equivalent problem, form the augmented Lagrangian, and alternately update the variables (projection for X, singular value thresholding for Y, and an update for the dual variable Z).
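The following is a hedged sketch of an LRTC-style ADMM for a 3rd-order tensor (in the spirit of HaLRTC); the weights `alpha`, penalty `rho`, and iteration count are illustrative assumptions, not the settings of [Liu+, 2012].

```python
import numpy as np

def unfold(X, mode):
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def fold(M, mode, shape):
    rest = [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(M.reshape([shape[mode]] + rest), 0, mode)

def svt(M, tau):
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def lrtc_admm(T, Q, alpha=(1/3, 1/3, 1/3), rho=1.0, n_iter=200):
    """T: tensor with observed entries, Q: boolean mask (True = observed)."""
    X = np.where(Q, T, 0.0)
    N = T.ndim
    Y = [np.zeros_like(T) for _ in range(N)]           # dual variables
    for _ in range(n_iter):
        # singular value thresholding of each mode unfolding
        M = [fold(svt(unfold(X + Y[n] / rho, n), alpha[n] / rho), n, T.shape)
             for n in range(N)]
        # consensus step and projection onto the observed entries
        X = sum(M[n] - Y[n] / rho for n in range(N)) / N
        X = np.where(Q, T, X)
        for n in range(N):                             # dual update
            Y[n] = Y[n] - rho * (M[n] - X)
    return X

U = [np.random.randn(20, 3) for _ in range(3)]
T = np.einsum("ir,jr,kr->ijk", *U)                     # a low-rank ground-truth tensor
Q = np.random.rand(*T.shape) > 0.4                     # ~60% observed
X_hat = lrtc_admm(T, Q)
print(np.linalg.norm((X_hat - T)[~Q]) / np.linalg.norm(T[~Q]))  # relative error on missing entries
```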
t-SVD is a convolutional tensor decomposition model based on the t-product. It can be obtained by sequential steps: DFT along the 3rd mode, SVD of each frontal slice, and inverse DFT.
[Zhang+, 2017 (in IEEE-TSP)] proposed a tensor nuclear norm based on the t-SVD: the tubal nuclear norm, which is the same as the matrix nuclear norm of the block circulant matrix (blcirc) of the tensor; the block circulant matrix is block-diagonalized by the DFT.
The corresponding singular value thresholding can be obtained by the following steps: DFT along the 3rd mode → SVD of each frontal slice → soft thresholding of the singular values → inverse DFT.
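A hedged NumPy sketch of these steps; the handling of the threshold across frequencies is simplified relative to the published algorithm.

```python
import numpy as np

def tubal_svt(Y, tau):
    """DFT along the 3rd mode -> SVT of each frontal slice -> inverse DFT."""
    Yf = np.fft.fft(Y, axis=2)                     # DFT along the 3rd mode
    Xf = np.empty_like(Yf)
    for k in range(Y.shape[2]):                    # SVD + soft thresholding per frontal slice
        U, s, Vh = np.linalg.svd(Yf[:, :, k], full_matrices=False)
        Xf[:, :, k] = (U * np.maximum(s - tau, 0.0)) @ Vh
    return np.real(np.fft.ifft(Xf, axis=2))        # inverse DFT

Y = np.random.randn(30, 30, 10)
X = tubal_svt(Y, tau=5.0)
```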
Alternating least squares (ALS) for TD: the objective function is non-convex with respect to all factors jointly, but it is convex with respect to each factor individually. ALS can be applied to a wide range of TD models (CPD, TKD, TT, and TR); a code sketch appears after the rank-increment slide below.
Rank selection is a critical problem in TD. The rank increment strategy works well!! Start from R = 1, optimize, set R ← R + 1, optimize again, and repeat until convergence.
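As an illustration of both ideas, here is a hedged NumPy sketch of CP decomposition by ALS combined with the rank-increment strategy; the unfolding convention, initialization, warm start, and tolerance are my own choices.

```python
import numpy as np

def unfold(X, mode):
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def khatri_rao(mats):
    """Column-wise Kronecker product, ordered to match the C-order unfolding above."""
    out = mats[0]
    for M in mats[1:]:
        out = np.einsum("ir,jr->ijr", out, M).reshape(-1, out.shape[1])
    return out

def cp_reconstruct(factors):
    shape = [A.shape[0] for A in factors]
    return (factors[0] @ khatri_rao(factors[1:]).T).reshape(shape)

def cp_als(T, factors, n_iter=100):
    """ALS: each factor update is a linear least-squares problem (convex in that factor)."""
    N = T.ndim
    for _ in range(n_iter):
        for n in range(N):
            others = [factors[m] for m in range(N) if m != n]
            KR = khatri_rao(others)
            V = np.multiply.reduce([A.T @ A for A in others])   # Hadamard product of Gram matrices
            factors[n] = unfold(T, n) @ KR @ np.linalg.pinv(V)
    return factors

# rank increment: start from R = 1, optimize, then R <- R + 1 until the fit is good enough
true_factors = [np.random.randn(10, 3), np.random.randn(11, 3), np.random.randn(12, 3)]
T = cp_reconstruct(true_factors)

factors = None
for R in range(1, 8):
    if factors is None:
        factors = [np.random.randn(I, R) for I in T.shape]
    else:  # warm start: append one small random column to each factor
        factors = [np.hstack([A, 0.1 * np.random.randn(A.shape[0], 1)]) for A in factors]
    factors = cp_als(T, factors)
    err = np.linalg.norm(T - cp_reconstruct(factors)) / np.linalg.norm(T)
    print(R, err)
    if err < 1e-6:
        break
```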
Multiple priors (low-rank, smoothness, sparseness, non-negativity, etc.) can be combined in both the penalty-function-based approach and the TD-based approach. A typical combination is low-rank and smoothness.
A limitation of low-rank tensor decomposition: when whole slices are missing in a tensor, low-rank tensor decomposition does not work. Randomly missing pixels can be recovered, but missing slices cannot, because a missing slice deflates the matrix/tensor rank and the deflated component cannot be recovered.
The strategy is to “tensorize” and “decompose”. A question here is how to tensorize the signals: generally, some specific knowledge of the signals is necessary to design the tensors. Here, I introduce direct folding, ket augmentation, and multi-way Hankelization.
An image is naturally represented by a matrix, but folding has some benefits for compression: compare a low-rank approximation of the (256, 256) matrix with #param = 2048 against folding into a 16th-order tensor of size (2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2) and applying a low-rank approximation, also with #param = 2048.
The idea is to use low-rank tensor decomposition not in the direct tensor space (height × width × rgb), but in an embedded (Hankel) tensor space: apply delay embedding / Hankelization to obtain a higher-order incomplete tensor, perform low-rank tensor completion there, and apply the inverse operation to output the completed image.
What is the inverse of DE (delay embedding)? There is no exact inverse of DE, since it is essentially a duplication; we only consider the pseudo-inverse of DE, which folds back and averages the duplicated entries.
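A small NumPy sketch of the duplication and its pseudo-inverse (averaging the duplicated entries along the anti-diagonals):

```python
import numpy as np

def delay_embedding(x, tau):
    """Duplication: build a tau x (N - tau + 1) Hankel matrix from a 1-D signal."""
    N = len(x)
    return np.stack([x[i:i + N - tau + 1] for i in range(tau)])

def inverse_delay_embedding(H):
    """Pseudo-inverse of the duplication: average the duplicated entries."""
    tau, L = H.shape
    N = tau + L - 1
    total = np.zeros(N)
    count = np.zeros(N)
    for i in range(tau):
        total[i:i + L] += H[i]
        count[i:i + L] += 1
    return total / count

x = np.arange(8.0)
H = delay_embedding(x, 3)                            # 3 x 6 Hankel matrix, H[i, j] = x[i + j]
print(np.allclose(inverse_delay_embedding(H), x))    # True
```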
When a signal is Hankelized by DE, one obtains a time series of Hankel matrices; a further DE obtains a “block Hankel matrix”. A block Hankel matrix is defined as a block matrix which has Hankel structure and all of whose elements are also Hankel matrices.
MDT: a generalization of DE for tensors, consisting of multilinear duplication + folding; we call it the “multiway delay-embedding transform (MDT)”. Along each mode, a dimension I is duplicated to τ·(I − τ + 1) and folded into (τ, I − τ + 1); thus a 3rd-order tensor of size (I1, I2, I3) becomes a 6th-order tensor.
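A hedged sketch of the idea using NumPy's sliding_window_view: the result contains the same (duplicated) entries as the MDT block Hankel tensor, arranged as a 2N-th order tensor; the explicit duplication-matrix form is omitted.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

image = np.random.rand(64, 64)            # a 2nd-order tensor (gray-scale image)
tau = (8, 8)                              # delay (window) size for each mode

H = sliding_window_view(image, tau)       # 4th-order tensor of shape (57, 57, 8, 8)
print(H.shape)

# each H[i, j] is an 8x8 patch, and entries are duplicated across neighboring (i, j),
# exactly as in the 1-D delay embedding
print(np.allclose(H[3, 5], image[3:11, 5:13]))   # True
```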
Low-rank Tucker decomposition in embedded space [Yokota+, 2018 (in CVPR)]: ① MDT of the given incomplete tensor of original size (Nth order ➔ 2Nth order), ② Tucker decomposition of the incomplete block Hankel tensor, ③ inverse MDT of the completed block Hankel tensor (2Nth order ➔ Nth order).
Here, I explain the method for Tucker decomposition of an incomplete tensor. The optimization problem can be given by minimizing ||Q ⊛ (T − [[G; U₁, ..., U_N]])||², where T is the incomplete tensor and Q is a binary tensor of the same size (0 represents a missing entry, 1 represents an observed entry). It can be solved by an auxiliary function approach and the ALS algorithm: alternately update the auxiliary (filled-in) tensor and the Tucker factors until convergence; the inner update is the same as Tucker decomposition for a complete tensor.
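A hedged NumPy sketch of this auxiliary-function idea: fill the missing entries with the current reconstruction, then refit a Tucker model. Here a truncated HOSVD is used as the inner step instead of a full ALS sweep, which is my simplification.

```python
import numpy as np

def unfold(X, mode):
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def mode_product(X, U, mode):
    """Mode-n product: multiply U along the given mode of X."""
    return np.moveaxis(np.tensordot(U, np.moveaxis(X, mode, 0), axes=1), 0, mode)

def truncated_hosvd(X, ranks):
    factors = [np.linalg.svd(unfold(X, n), full_matrices=False)[0][:, :r]
               for n, r in enumerate(ranks)]
    core = X
    for n, U in enumerate(factors):
        core = mode_product(core, U.T, n)
    return core, factors

def tucker_reconstruct(core, factors):
    X = core
    for n, U in enumerate(factors):
        X = mode_product(X, U, n)
    return X

def tucker_completion(T, Q, ranks, n_iter=100):
    """T: incomplete tensor, Q: boolean mask (True = observed, False = missing)."""
    X = np.where(Q, T, T[Q].mean())                   # initialize missing entries
    for _ in range(n_iter):
        core, factors = truncated_hosvd(X, ranks)     # same as Tucker for a complete tensor
        X = np.where(Q, T, tucker_reconstruct(core, factors))  # keep observed, update missing
    return X

U_true = [np.random.randn(I, 3) for I in (15, 16, 17)]
G_true = np.random.randn(3, 3, 3)
T_full = tucker_reconstruct(G_true, U_true)
Q = np.random.rand(*T_full.shape) > 0.3               # ~70% observed
X_hat = tucker_completion(np.where(Q, T_full, 0.0), Q, ranks=(3, 3, 3))
```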
Summary of Part II: low-rank priors with matrix and tensor representations. Penalty-function-based approach: the matrix nuclear norm works well for the low-rank prior; there are several nuclear norms for tensors (TKD-based and t-SVD-based), and the resulting problems can be solved by ADMM or PDS. TD-based approach: there are many TD models; TD is solved by ALS, tensor completion can be solved by MM, and rank increment works well for adaptive rank selection. The low-rank prior under tensorization is a hot topic now.
Neural networks and machine learning: the output of a neural network is a signal obtained by repeatedly applying linear operations and nonlinear activations to an input signal. Various functions can be expressed by adjusting, as parameters, the weight matrices used for the linear operation of each layer. Universal approximation ability: a two-layer NN consisting of an infinite number of nodes can approximate any function. Deep structure: it is good for approximating complicated functions, and it helps supervised and unsupervised learning using large amounts of data.
Supervised learning uses pairs of input x and output y (training data). Learning process: Step 1, design a mathematical model; Step 2, fit the mathematical model to the training data. The trained model is obtained by fitting to the training samples.
ImageNet Large Scale Visual Recognition Challenge 2012: a huge database of images (ImageNet), training on about 1.2 million images and competing on a 1000-class classification task; deep learning (Univ. of Toronto) won.
In an encoder-decoder (auto-encoder) model, images are mapped to appropriate coordinates in the latent space. Due to the low dimensionality of the latent space, similar images are located close to each other, and continuous changes of images along the latent space can be seen.
A GAN performs adversarial training with a generator and a discriminator. Generator (decoder): optimized to generate fake images from latent codes (drawn from a normal distribution) so that they deceive the discriminator. Discriminator: optimized to accurately classify fake images and real images. It is like a game of AI vs AI.
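A minimal, hedged sketch of this adversarial game in TensorFlow/Keras; the architectures, image size (28×28, flattened), and learning rates are placeholders of my own, not settings from the slides.

```python
import tensorflow as tf

latent_dim = 64
G = tf.keras.Sequential([tf.keras.layers.Dense(128, activation="relu"),
                         tf.keras.layers.Dense(28 * 28, activation="sigmoid")])  # generator (decoder)
D = tf.keras.Sequential([tf.keras.layers.Dense(128, activation="relu"),
                         tf.keras.layers.Dense(1)])                              # discriminator
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def train_step(real_images):
    z = tf.random.normal((tf.shape(real_images)[0], latent_dim))   # latent codes ~ normal distribution
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = G(z)
        d_real, d_fake = D(real_images), D(fake_images)
        # discriminator: accurately classify real (1) vs fake (0) images
        d_loss = bce(tf.ones_like(d_real), d_real) + bce(tf.zeros_like(d_fake), d_fake)
        # generator: deceive the discriminator (make fakes be classified as real)
        g_loss = bce(tf.ones_like(d_fake), d_fake)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, D.trainable_variables), D.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, G.trainable_variables), G.trainable_variables))
    return g_loss, d_loss

train_step(tf.random.uniform((32, 28 * 28)))   # one step on a dummy "real" batch
```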
ML-based image restoration learns a parametric function that takes a corrupted image and outputs a clean image. Corrupted images are created in a pseudo manner from a clean image database, and the parametric function is optimized to fit the training data (corrupted image → clean image).
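A hedged sketch of this training setup in TensorFlow/Keras: the corruption (Gaussian noise), data, architecture, and training settings below are illustrative assumptions, not the methods discussed in the slides.

```python
import numpy as np
import tensorflow as tf

clean = np.random.rand(500, 32, 32, 1).astype("float32")                   # stand-in clean database
corrupted = clean + 0.1 * np.random.randn(*clean.shape).astype("float32")  # pseudo corruption

model = tf.keras.Sequential([
    tf.keras.Input((32, 32, 1)),
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(1, 3, padding="same"),
])
model.compile(optimizer="adam", loss="mse")        # fit the parametric function to the pairs
model.fit(corrupted, clean, epochs=5, batch_size=32)

restored = model.predict(corrupted[:1])            # apply the trained model to a corrupted image
```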
Super-resolution by supervised learning [Ledig+, CVPR2017]: create many low-resolution and high-resolution image pairs and learn the input/output relationship with a CNN (low-resolution image → high-resolution image).
SRGAN is SRResNet with adversarial regularization: the generator maps an LR image to an SR image (fake), and the discriminator classifies {true, fake} against the HR image (real). The loss combines the content loss of SRResNet (VGG features) with an adversarial regularizer: G tries to deceive D, while D keeps its classification accuracy.
ML-based image restoration with a data-driven prior. Training phase: learn a generative model from a set of clean images. Restoration phase: perform image restoration based on the data-driven prior (e.g., ML-based super-resolution), which constrains the solution to the learned image manifold within some error bound.
Data-driven priors can introduce bias (skin color, hair color, hairstyle, glasses, gender, age, clothes, accessories, etc.). StyleGAN can generate high-quality HR images; however, the generated images are still in a specific domain (not universal). [Figure: original, downscaled, and upscaled by PULSE.] ※ Demo code is available at [github].
In [Ulyanov+, 2018], the noise impedance of CNNs was shown: four targets were tested for reconstruction (a natural image, a natural image with noise, a natural image with pixel shuffle, and uniform noise), and the convergence speeds were different; the natural image was fitted faster and the noise slower. DIP exploits this delay. However, this is just a report of a phenomenon. Question: why CNNs? Why convolution?
The CNN structure itself has some image prior; however, this prior is difficult to explain in words. Note that convolution = DE + linear transform, so the MDT × low-rank model has some relationship with CNNs.
MMES: a network structure using MDT [Yokota+, 2020 (in IEEE-TNNLS)]. It has a simple structure (MDT ➔ MLP ➔ inverse MDT): each column vector (image patch) is independently transformed by a multi-layer perceptron (MLP). The MLP is trained as an auto-encoder (AE), which can be interpreted as manifold learning; in practice, a denoising auto-encoder (DAE) is employed.
MMES can be interpreted as optimizing a single problem: the prior in MMES is the low-dimensionality of the patch manifold (each patch, extracted by MDT and vectorized, is constrained to lie on a low-dimensional manifold).
Replacing the manifold constraint with an MLP (AE), the problem can be transformed into a cost consisting of an error term (data fidelity) and a regularization term (the auto-encoding error of the patches).
Interpretation of the DAE [Vincent+, 2010]: patches are extracted from the corrupted image, perturbed with noise, auto-encoded, and the reconstructed patches are aggregated under a data-fidelity term. Compression works well when there are many similar patches ➔ low-dimensionality can be an image prior ➔ it can be regarded as non-local similarity. The denoising auto-encoder (DAE) learns a low-dimensional manifold from the patch distribution.
Implementation of MMES: it is implemented with a deep learning toolbox (TensorFlow etc.), the cost function is minimized using the Adam optimizer, and various tasks can be solved by the same framework. For inpainting, the reconstruction loss is masked to the observed pixels (MDT ➔ AE ➔ inverse MDT, with an AE loss and a masked reconstruction loss).
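For concreteness, here is a rough, hedged sketch of this kind of implementation in TensorFlow (it is not the authors' code): the image and a patch auto-encoder are optimized jointly with Adam, using a masked data-fidelity loss plus a DAE loss on patches. The patch size, layer widths, noise level, and loss weight are my own placeholders.

```python
import numpy as np
import tensorflow as tf

H, W, p = 64, 64, 8
y = np.random.rand(H, W).astype("float32")                 # observed (corrupted) image, stand-in
mask = (np.random.rand(H, W) > 0.5).astype("float32")      # 1 = observed pixel, 0 = missing

x = tf.Variable(tf.random.uniform((1, H, W, 1)))           # image to be recovered
ae = tf.keras.Sequential([                                  # MLP auto-encoder on patch vectors
    tf.keras.layers.Dense(16, activation="relu"),           # low-dimensional code
    tf.keras.layers.Dense(p * p)])
ae.build((None, p * p))
opt = tf.keras.optimizers.Adam(1e-3)

def patches(img):
    """MDT-like patch extraction: all p x p patches, flattened to vectors."""
    P = tf.image.extract_patches(img, sizes=[1, p, p, 1], strides=[1, 1, 1, 1],
                                 rates=[1, 1, 1, 1], padding="VALID")
    return tf.reshape(P, [-1, p * p])

for step in range(500):
    with tf.GradientTape() as tape:
        P = patches(x)
        P_noisy = P + 0.05 * tf.random.normal(tf.shape(P))            # DAE: reconstruct from noisy patches
        ae_loss = tf.reduce_mean((ae(P_noisy) - P) ** 2)              # patch-manifold (AE) loss
        fidelity = tf.reduce_mean((mask * (x[0, :, :, 0] - y)) ** 2)  # masked data fidelity (inpainting)
        loss = fidelity + 0.1 * ae_loss
    variables = [x] + ae.trainable_variables
    opt.apply_gradients(zip(tape.gradient(loss, variables), variables))
```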
Implementation of MMES: deblurring. The same framework is used, with the cost function minimized by the Adam optimizer and the blur operator included in the reconstruction loss.
Implementation of MMES: super-resolution. Again the same framework, with the downsampling operator included in the reconstruction loss.
Summary of Part III: ML-based image restoration has attracted much attention. One line of work learns a data-driven prior from a large amount of data (SRCNN, SRGAN, PULSE); another exploits the fact that the network structure (CNN) itself provides an image prior (deep image prior, patch manifold model). Comparison:
- Accuracy: Data-driven ◎ / DIP ◦ / Patch manifold ◦
- Computation: Data-driven fast / DIP slow / Patch manifold slow
- Training data: Data-driven necessary / DIP not necessary / Patch manifold not necessary
- Interpretability: Data-driven depends on data / DIP unclear / Patch manifold clear
Thanks to the organizers of APSIPA ASC 2021 and all participants!! In this tutorial, I tried to summarize prior-based image restoration methods from multiple perspectives: classical methods (sparseness, smoothness, and non-local similarity priors), tensor methods, and neural network methods. In my opinion, two important points in the study of image restoration are (1) a technical viewpoint on the behavior of mathematical models and optimization algorithms, and (2) a realistic viewpoint for digging into the essential nature of images. I hope this tutorial helps your study in the future. (Basics / Tensors / Neural Nets)