4th order tensor A block 3rd order tensor of which elements are vectors A block matrix of which elements are matrices A block vector of which elements are 3rd order tensors 5th order tensor A block 4th order tensor of which elements are vectors A block 3rd order tensor of which elements are matrices A block matrix of which elements are 3rd order tensors A block vector of which elements are 4th order tensors Nth order tensor A block vector of which elements are (N-1)th order tensors Recursive representations of Nth order tensor ・ ・ ・ ・ ・ ・ ・ ・ ・ ・ ・ ・ ・・・ ・・・ ・・・ ・ ・ ・ ・ ・ ・ ・ ・ ・ ・・・ ・・・ ・・・ ・ ・ ・ Block matrix Block matrix Block vector Block vector
or “smooth structures” Many applications Dimensionality Reduction (Compression), Pattern Recognition, Blind Source Separation, Signal Restoration, Deep Neural Networks, etc. Why tensors
Computer Science / Data Engineering (1990s-) Multi-media signal processing (audio, speech, image, video) Bio-medical signal processing (EEG, MEG, functional MRI, diffusion MRI) Bio-informatics (gene, protein) Remote sensing (Hyperspectral image) Web/data mining (SNS analysis, recommender system) Non-linear time series analysis Wireless communications Deep learning etc Fields of applications ID 001 ID 002 ID 003 ID 004
2019 “Cause-and-Effect in a Tensor Framework” by L. De Lathauwer, M. Alex, O. Vasilescu, and J. Kossaifi in CVPR 2019 “Tensor Factorizations, Data Fusion & Applications” by E. Acar in LVA/ICA 2018 “Blind Audio Source Separation on Tensor Representation” by H. Sawada, N. Ono, H. Kameoka, D. Kitamura in ICASSP 2018 “Tensor Decomposition for Signal Processing and Machine Learning” by N.D. Sidiropoulos, L. De Lathauwer, and X. Fu, E.E. Papalexakis in ICASSP 2017 “A New Tensor Algebra” by L. Horesh and M. Kilmer in CVPR 2017 Related Tutorials / Workshops
of vectors, matrices … Many data can be represented by using tensors Tensor processing have many applications. Tensor processing have attracted attentions Workshops/Tutorials Awards So, lets learn about tensors!!
Lowercase bold letter : Matrix variables Uppercase bold letter : High order tensor variables (3rd, 4th, …, Nth ) Calligraphic bold letter : Mathematical notations
the meaning of Ex) 3rd mode unfolding of 5th order tensor n-th mode unfolding unfold all modes except the n-th mode keep the 3rd mode unfold the 1st, 2nd, 4th, 5th modes
is useful to see the linear characteristics of tensor from each mode For example, Tucker rank or multilinear rank of tensor can be defined by using matrix rank of n-th mode unfolding
“unfolding” It sometimes help us to understand the linear characteristics of tensors On the other hand, “folding” can be considered, too. unfolding folding
Folding can be considered as a “grouping” elements Ex: triplet is extracted for each column vector, not unique How to consider which folding is better? ➔ low rankness can be one criterion 9 6 3 6 4 2 3 2 1 3 2 1 6 4 2 9 6 3 3 2 1 6 3 4 6 9 Folding 1 Folding 2 3 2 1 1 2 3 × 0 0 1 1 4 9 × 2 3 6 1 1 0 × + rank is 1 rank is 2
be naturally represented by matrix, but folding have some benefits for compression (256,256) Low rank approximation #param = 2048 #param = 2048 16th order tensor (2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2) Low rank approximation Folding
all elements of their elementwise (Adamard) product. It can be naturally generalized that “inner product of two vectors obtained by vectorising two tensors”. Inner product * * * * … * sum * * * * … * sum vec vec
number of columns/rows is same. Matrix product is an combination of inner product and outer product Matrix product = + + = inner product w.r.t r outer product w.r.t i and j = + +
number of columns/rows is same. Matrix product is an combination of inner product and outer product Khatri-Rao product is a sub-part of matrix product. Matrix product (cont’d) = + + = = + + = * * * * * * = sum fold
tensor and a matrix of which the size of some mode is same. It is also called as nth-mode product. Easier to understand through nth-mode unfolding. inner prod nth mode prod
is very helpful) Using a node and edges, the number of edges means order of a tensor Inner product of tensors can be denoted by the link of all edges Tensor Diagram vector matrix 3rd order tensor 4th order tensor ・・・ ・・・ ・・・ ・・・
essentially connecting the edges (modes) of tensors. There is some freedom in how to connect them. Tensor Diagram (cont’d) (I×J) (I×R)(R×J) R I J I J = I J K I J K I J K I J K J K I J K I J K I I J K
are explained Notations Fiber / Slice Unfolding (vectorization, matricization etc) Inner product Outer product Special products (Kronecker, Khatri-Rao etc) General tensor multiplication Tensor diagram I think that it is nice to geometrically imagine how the tensors are unfolded, folded, transformed, and etc. So, this time, I tried to explain them using figure. Q&A
decomposition General definition may be “a tensor represented by multiple tensors” A specific decomposition is given by a parametric model f and its parameter tensors. f( , , ) Tensor decomposition = Matrix decomposition = core tensor factor matrix
algorithm is typical and widely used. Alternately update each factor matrix with a least squares solution while freezing the other factors An algorithm for CP decomposition Initialize:
algorithm is typical and widely used. Alternately update each factor matrix with a least squares solution while freezing the other factors An algorithm for Tucker decomposition (cont’d) Initialize:
a relatively emergent model It have a benefit of extremely light parameterization for high order huge tensors Elements of tensor can be calculated by multiplication of slices/fibers of core tensors = TT decomposition
a generalization of TT-decomposition Also many other networks can be considered. Now a day, tensor decompositions (including CP, Tucker etc) are called as Tensor Netoworks in general. = CP Tucker Hierarchical Tucker (HT) TT Projected Entangled-Pair States (PEPS)
tensor data grows exponentially with the tensor order, N. Tensor decomposition works to reduce the number of parameters. For example, CP decomposition with I=10, R = 10 Dimensionality reduction Number of param. Original Tensor CP (R=10) N=3 1,000 3,00 N=4 10,000 4,00 N=5 100,000 5,00 N=6 1,000,000 6,00
2013 (in NECO)] [Caiafa+, 2013 (in WIRES)] Tensor networks (TNs) for dimensionality reduction [Zhao+, 2016] [Cichocki+, 2017] Genetic algorithm for TN model selection [Li+, 2020 (in ICML)]
using multi-linear projection instead of linear projection. Multiway LDA … Linear projection maximize minimize … Multi-linear projection maximize minimize (N+1)-th order tensor
using Eigen Value Decomposition (EVD) Multi-way LDA is obtained by alternating optimization Eigenvectors of Update by EVD for n = 1, …, N repeat endfor until converge
(cont’d) 112 92 ORL faces 10^4 … 10^4 10^4 In the case of LDA, EVD of a large matrix is necessary ☺ In the case of (Multi-way) 2DLDA, the matrix is much smaller L 112 92 L Obtain EVD 112 112 92 92 Update EVD Update EVD
feature extraction by using 2DLDA + LDA is proposed Running time of 2DLDA was much shorter than LDA Accuracy of 2DLDA+LDA was slightly better than LDA Pattern recognition (cont’d) 112 92 10 10 vec 100 40 Category Nearest Neighbor
represented as a 4th order tensor by convolving Gabor functions The Gabor Tensor is discriminated by (multi-way) Generalized LDA Pattern recognition (cont’d) 5×8 Gabor functions Category vec GLDA LDA Some simple classifier
low-rank matrix analysis (ILRMA) [Kitamura+, 2016] Blind source separation (cont’d) channel time channel time frequency Tensor representation by STFT for each channel source = Demixing matrices for each frequency frequency ≈ .^2 ② Nonnegative CP decomp. of power tensor ◦ ① Reconstructed tensor is parameterized by using demixing matrices
Segmentation is one of the way of tensorization (folding) This strategy is nice in case of the sources having low-rank structure in matrix/tensor representation Blind source separation (cont’d) tensor representation
[Dabov+, 2007 (in IEEE-TIP)] GSR [Zhang+, 2014 (in IEEE-TIP)] Truncated HOSVD [Yokota+, 2017 (in IEEE-TSP)] http://www.cs.tut.fi/~foi/GCF-BM3D/ ① Extracting similar patches and represent it as a tensor ② The tensor is approximated by low-rank tensor to reduce noise component
based tensor data recovery Low-n-rank tensor recovery [Gandy+, 2011 (in Iverse Problems)] Low rank tensor completion [Liu+, 2012 (in IEEE-TPAMI)] Bayesian CP decomposition [Zhao+, 2015 (in IEEE-TPAMI)] t-SVD [Zhang+, 2017 (in IEEE-TSP)] Low-rank and smooth tensor recovery [Yokota+, 2017 (in CVPR)] [Yokota+, 2019 (in IEEE-Access)] Tensor ring completion [Wang+, 2017 (in ICCV)]
and information processing studies. Especially, convolutional neural networks (CNNs) is widely employed for its efficiency. Deep neural networks (DNNs) Recognition Segmentation Enhancement Restoration Synthesis for Audio Speech Language Image Bio-signal Input Convolution Activation Convolution Activation … Output Convolution Convolution Activation
from the 3rd order tensor space to another 3rd order tensor space Its parameter is a 4th order tensor DNNs (cont’d) height width channels height width channels , , conv layer
ICLR)] Full convolution (4th order tensor) is approximated by CP model It becomes sequential convolutions with fibers Experiments: evaluated in image recognition (largely speedup with few accuracy drop) DNNs (cont’d) , , , CP decomposition ◦ ◦ ◦ (9,9,48,128) rank
2015 (in NeurIPS)] Experiments: 1000 class ImageNet using VGG16 and VGG19 There are three FC layers DNNs (cont’d) = = TT decomposition FC1 (25088, 4096) FC2 (4096, 4096) FC3 (4096, 1000) 25088 = 2*7*8*8*7*4 4096 = 4*4*4*4*4*4 No compression ➔ Using TT TT-rank ➔ was 1, 2, or 4 Matrix decomp matrix rank ➔ was 1, 5, or 50 Efficient compression with few degradation!!
CNN with Tucker [Kim+, 2016 (in ICLR)] Supervised Learning with Tensor Networks [Stoudenmire+, 2016 (in NeurIPS)] Exponential Machines [Nivikov+, 2017 (in ICLR workshop)] Compact RNNs with Block-term Tensor Decomposition [Ye+, 2018 (in CVPR)] Compressing RNNs with Tensor Ring [Pan+, 2019 (in AAAI)] Tensor network decomposition of CNN kernels [Hayashi+, 2019 (in NeurIPS)] For understanding DNNs [Li+, 2019 (in ICASSP)] [Li+, 2020 (in AISTATS)]
A common concept is “representing signals by high-order tensors which has low-rank structure” and do the tensor decomposition Direct representation Multi-scale Gabor representation Time-frequency representation Similar patch grouping High-order tensorization Other perspectives in tensor studies: decomposition models, optimization algorithm, model selection, rank estimation, etc signals tensors decomposition Dimensionality Reduction Pattern Recognition Blind Source Separation Signal Restoration Deep Neural Networks
in more strict sense we need to consider how to select the model for decomposition Also it is also important how to efficiently obtain the decomposition in optimization Model selection Usually, model of tensor decomposition is manually selected by researchers / users, however, the there are various candidates of the tensor decomposition nowadays. It is very difficult task to determine the model from given data. Usually it requires some professional knowledge of both application and tensor fields. Evolutionary Topology Search for Tensor Networks [Li+, 2020 (in ICML)] Optimization It is quite important to study how to efficiently solve the complicated optimization problems to perform tensor (network) decompositions Many methods for optimization: Gradient method, Newton method, alternating least square, hierarchical alternating least squares, stochastic gradient … Randomized SVD [Minster+, 2020] [Ahmadi-Asl+, 2020] TensorSketch [Malik+, 2018 (in NeurIPS)] GPU acceleration [Choi+, 2018]
its applications A common concept was “tensorize” and “decompose” A question here is “how to tensorize the signals” Generally, some specific knowledge on the signals is necessary to design the tensors Here, I introduce the multi-way Hankelization / delay-embedding for tensors as an advanced topic on the tensor processing. signals tensors decomposition here
decomposition When whole slices are missing in a tensor, low-rank tensor decomposition does not work Random pixel missing can be recovered, but slice can not be recovered. Since slice missing deflates the matrix/tensor rank, its deflated component can not be recovered. Slices & random missing Recovered by low-rank decomposition
use low-rank tensor decomposition not in direct tensor space, but in embedded (Hankel) tensor space. height width rgb Low-rank tensor completion Delay embedding / Hankelization High order incomplete tensor Low-rank tensor completion Inverse operation output High order completed tensor ☺
of DE? There is no inversion of DE since it is essentially a duplication We only consider the pseudo-inverse of DE folding unfolding duplicate … average … average duplicate … ☺ ?
(a) Delay-embedding only column direction (b) Delay-embedding only row direction (c) Delay-embedding both directions Delay-embedding for matrix … … (a) … … … (b) (c) … … … … … … … … 3rd order tensor 4th order tensor
Hankelized by DE It obtains time-series of Hankel matrices Further DE obtains “block Hankel matrix” Block Hankel matrix is defined as a block matrix which has Hankel structure and its all elements are also Hankel matrices Block Hankel structure … … … … … … … … … … DE DE a b c d z A B C D A B C B C D C D E Z Z
multilinear duplication + folding We call it as “Multiway delay-embedding transform (MDT)” MDT: A generalization of DE for tensors = I τ*(I-τ+1) I-τ+1 τ = 3rd order tensor (I1, I2, I3) 6th order tensor
tensor Complete block Hankel tensor Complete tensor Low-rank Tucker decomposition in embedded space [Yokota+, 2018 (in CVPR)] ① MDT of given incomplete tensor of original size (Nth order ➔ 2Nth order) ② Tucker decomposition of incomplete block Hankel tensor ③ Inverse MDT of comlete block Hankel tensor (2Nth order ➔ Nth order) MDT Inv. MDT Tucker Decomp.
I explain the method for Tucker decomposition of incomplete tensor Optimization problem can be given by It can be solved by auxiliary function approach and ALS algorithm Update Tucker decomposition of an incomplete tensor : Incomplete tensor : binary tensor of same size to → 0 represents missing entry → 1 represents observed entry ALS to minimize alternative optimization To convergence ↑ it is same as Tucker for complete tensor
MDT Tucker decomposition with MDT [Yokota+, 2018 (in CVPR)] We are extending it to others TT decomposition with MDT [Sedighin+, 2020 (in IEEE-SPL)] Time series analysis with MDT [Shi+, 2020 (in AAAI)] Manifold modeling with MDT [Yokota+, 2020 (in IEEE-TNNLS)] MDT Inv. MDT + ◦ ◦
part of convolution Hankelization can be applied to incomplete data ☺ Convolution (STFT, Wavelet) can not be applied to incomplete data convolution Hankel matrix
criteria would be helpful Computationally expensive The number of elements of the Hankel tensor dramatically increases w.r.t. Ex: we consider MDT of (128,128,128)-tensor small large DE 128 128 128 τ Size of Hankel tensor Memory requirement 8 (8, 121, 8, 121, 8, 121) 約7GB 16 (16, 113, 16, 113, 16, 113) 約47GB 32 (32, 97, 32, 97, 32, 97) 約239GB 64 (64, 65, 64, 65, 64, 65) 約575GB
organizers of APSIPA ASC 2020 and all participants!! This tutorial is an intro. of tensor methods in signal processing and machine learning. I tried to avoid cutting-edge difficult content as much as possible and make the tutorial friendly to students and beginners. Community of tensor studies is growing now, and all the people are welcome to this fields having any interest on theory, algorithm, software development, and applications!! Basics Applications Advances * * * * … * = + + ◦ ◦ ◦ = … = + +