Ghalamkari1,2, Mahito Sugiyama1,2 International Conference on Information Geometry for Data Science (IG4DS 2022) 1 : The Graduate University for Advanced Studies, SOKENDAI 2 : National Institute of Informatics
2 Approximates with a linear combination of fewer bases (principal components) for feature extraction, memory reduction, and pattern discovery.๐ โ โ โ โ
3 Approximates with a linear combination of fewer bases (principal components) for feature extraction, memory reduction, and pattern discovery.๐ โ โ โ โ Non-negative constraint improves interpretability
4 Approximates with a linear combination of fewer bases (principal components) for feature extraction, memory reduction, and pattern discovery.๐ โ โ โ โ Low-rank approximation with non-negative constraints are based on gradient methods. โ Appropriate settings for stopping criteria, learning rate, and initial values are necessary ๐ข Non-negative constraint improves interpretability
initial values, stopping criterion and learning rate ๐ Information Geometric Analysis using Distributions on DAGs that Correspond to Data Structures
initial values, stopping criterion and learning rate ๐ Information Geometric Analysis using Distributions on DAGs that Correspond to Data Structures Rank-1 = rank 1,1,1
rank-1 missing NMF No worries about initial values, stopping criterion and learning rate ๐ Solve the task as a coupled NMF. Find the most dominant factor rapidly. Missing value Rank-1 = rank 1,1,1 Information Geometric Analysis using Distributions on DAGs that Correspond to Data Structures
, ๐ 3 โ the following three properties are satisfied. (1) Reflexivity โถ ๐ 1 โค ๐ 1 (2) Antisymmetry: ๐ 1 โค ๐ 2 , ๐ 2 โค ๐ 1 โ ๐ 1 = ๐ 2 (3)Transitivity:๐ 1 โค ๐ 2 , ๐ 2 โค ๐ 3 โ ๐ 1 โค ๐ 3 โก log-linear model on DAG We define the log-linear model on a DAG as a mapping ๐: โ 0,1 ๏ผNatural parameters ๐ฝ describe the model. ๐-space Mahito Sugiyama, Hiroyuki Nakahara and Koji Tsuda "Tensor balancing on statistical manifoldโ(2017) ICML. 15 Log-linear model on Directed Acyclic Graph (DAG)
๐ 3 โ the following three properties are satisfied. โกDAG(poset) (1) Reflexivity โถ ๐ 1 โค ๐ 1 (2) Antisymmetry: ๐ 1 โค ๐ 2 , ๐ 2 โค ๐ 1 โ ๐ 1 = ๐ 2 (3)Transitivity:๐ 1 โค ๐ 2 , ๐ 2 โค ๐ 3 โ ๐ 1 โค ๐ 3 โก log-linear model on DAG We define the log-linear model on a DAG as a mapping ๐: โ 0,1 ๏ผNatural parameters ๐ฝ describe the model. ๐-space ๐-space We can also describe the model by expectation parameters ๐ผ with Mรถbius function. Mahito Sugiyama, Hiroyuki Nakahara and Koji Tsuda "Tensor balancing on statistical manifoldโ(2017) ICML. 16 Log-linear model on Directed Acyclic Graph (DAG)
Probability values Relation between distribution and tensor Mรถbius inversion formula ๏ผ ๐, ๐, ๐ , indices of the tensor ๏ผ index set ๏ผ tensor values ๐ซ๐๐๐
can find the projection destination by a gradient-method. But gradient-methods require Appropriate settings for stopping criteria, learning rate, and initial values ๐ข Rank-1 subspace Rank-1 condition (๐ฝ-representation) Its all many-body ๐-parameters are 0. is e-flat. The projection is unique.
condition with the ๐-parameter. is e-flat. The projection is unique. One-body parameter Many-body parameter Rank-1 subspace Its all many-body ๐-parameters are 0. Rank-1 condition (๐ฝ-representation) We can find the projection destination by a gradient-method. But gradient-methods require Appropriate settings for stopping criteria, learning rate, and initial values ๐ข
the best rank-1 approximation 33 One-body parameter Many-body parameter Rank-1 condition (๐ผ- representation) Rank-1 condition (๐ฝ-representation) Rank-1 subspace Its all many-body ๐-parameters are 0.
the best rank-1 approximation 34 One-body parameter Many-body parameter Mรถbius inversion formula = ๐๐11 ๐1๐1 ๐11๐ Rank-1 condition (๐ผ- representation) Rank-1 condition (๐ฝ-representation) Rank-1 subspace All ๐ผ-parameters after the projection are identified. Using inversion formula, we found the projection destination. Its all many-body ๐-parameters are 0.
given as which minimizes KL divergence from ๐ซ. Best rank-1 tensor formula for minimizing KL divergence (๐ = 3 ) 35 Mean-field approximation and rank-1 approximation We reproduce the result in K.Huang, et al. "Kullback-Leibler principal component for tensors is not NP-hard." ACSSC 2017 9:00
given as which minimizes KL divergence from ๐ซ. Best rank-1 tensor formula for minimizing KL divergence (๐ = 3 ) 36 By the way, Frobenius error minimization is NP-hard Mean-field approximation and rank-1 approximation We reproduce the result in K.Huang, et al. "Kullback-Leibler principal component for tensors is not NP-hard." ACSSC 2017
given as which minimizes KL divergence from ๐ซ. A tensor with ๐ indices is a joint distribution with ๐ random variables. A vector with only 1 index is an independent distribution with only one random variable. Best rank-1 tensor formula for minimizing KL divergence (๐ = 3 ) 37 By the way, Frobenius error minimization is NP-hard Mean-field approximation and rank-1 approximation We reproduce the result in K.Huang, et al. "Kullback-Leibler principal component for tensors is not NP-hard." ACSSC 2017 Normalized vector depending on only ๐ Normalized vector depending on only ๐ Normalized vector depending on only ๐
given as which minimizes KL divergence from ๐ซ. A tensor with ๐ indices is a joint distribution with ๐ random variables. A vector with only 1 index is an independent distribution with only one random variable. Rank-1 approximation approximates a joint distribution by a product of independent distributions. Best rank-1 tensor formula for minimizing KL divergence (๐ = 3 ) 38 By the way, Frobenius error minimization is NP-hard Mean-field approximation and rank-1 approximation We reproduce the result in K.Huang, et al. "Kullback-Leibler principal component for tensors is not NP-hard." ACSSC 2017 Mean-field approximation : a methodology in physics for reducing a many-body problem to a one-body problem. Normalized vector depending on only ๐ Normalized vector depending on only ๐ Normalized vector depending on only ๐
= 0 ๐112 ๐131 ๐121 ๐113 ๐211 ๐311 Expand the tensor by focusing on the ๐-th axis into a rectangular matrix ๐(๐) (mode-๐ expansion) rank ๐ซ = 1 โบ its all manyโbody ๐ parameters are 0 Rank-1 condition (๐ฝ-representation) 46
a bingo location. The shaded areas do not change their values in the projection. 55 Example: Reduce the rank of (8,8,3) tensor to (5,8,3) or less STEP2 : Replace the bingo part with the best rank-1 tensor.
best rank-1 approximation formula 56 Example: Reduce the rank of (8,8,3) tensor to (5,8,3) or less STEP2 : Replace the bingo part with the best rank-1 tensor. STEP1 : Choose a bingo location. ๐ is zero ๐ can be any
best rank-1 approximation formula 57 Example: Reduce the rank of (8,8,3) tensor to (5,8,3) or less STEP2 : Replace the bingo part with the best rank-1 tensor. STEP1 : Choose a bingo location. ๐ is zero ๐ can be any
best rank-1 approximation formula The best tensor is obtained in the specified bingo space. ๐ There is no guarantee that it is the best rank (5,8,3) approximation. ๐ข 58 Example: Reduce the rank of (8,8,3) tensor to (5,8,3) or less STEP2 : Replace the bingo part with the best rank-1 tensor. STEP1 : Choose a bingo location. ๐ is zero ๐ can be any
or less STEP2 : Replace the bingo part with the best rank-1 tensor. STEP1 : Choose a bingo location. ๐ is zero ๐ can be any The shaded areas do not change their values in the projection.
is missing otherwise Element-wise product ๐ฝ๐๐ = แ 0 1 โก Collect missing values in a corner of matrix to solve as coupled NMF Missing value
et al., 2013) ๐ฝ๐๐ = แ 0 1 If ๐๐๐ is missing otherwise Element-wise product Missing value โก Collect missing values in a corner of matrix to solve as coupled NMF Equivalent
simultaneously rank-1 decomposable. โ It can be written as ๐ โ ๐, ๐ โ ๐, ๐ โ ๐ . Its all two-body ๐-parameters are 0. Simultaneous Rank-1 ๐ฝ-condition One-body parameter Two-body parameter
Simultaneous Rank-1 ๐ผ-condition Its all two-body ๐-parameters are 0. Simultaneous Rank-1 ๐ฝ-condition ๐ฟ, ๐, ๐ is simultaneously rank-1 decomposable. โ It can be written as ๐ โ ๐, ๐ โ ๐, ๐ โ ๐ . One-body parameter Two-body parameter is e-flat. The projection is unique.
= ๐๐1 ๐1๐ Simultaneous Rank-1 ๐ผ-condition Its all two-body ๐-parameters are 0. Simultaneous Rank-1 ๐ฝ-condition ๐ฟ, ๐, ๐ is simultaneously rank-1 decomposable. โ It can be written as ๐ โ ๐, ๐ โ ๐, ๐ โ ๐ . One-body parameter Two-body parameter The m-projection does not change one-body ฮท-parameter Shun-ichi Amari, Information Geometry and Its Applications, 2008, Theorem 11.6
= ๐๐1 ๐1๐ Simultaneous Rank-1 ๐ผ-condition Its all two-body ๐-parameters are 0. Simultaneous Rank-1 ๐ฝ-condition ๐ฟ, ๐, ๐ is simultaneously rank-1 decomposable. โ It can be written as ๐ โ ๐, ๐ โ ๐, ๐ โ ๐ . One-body parameter Two-body parameter The m-projection does not change one-body ฮท-parameter Shun-ichi Amari, Information Geometry and Its Applications, 2008, Theorem 11.6 All ๐ผ-parameters after the projection are identified. 19:20
81 Missing values tend to be in certain columns in some real datasets. ex) disconnected sensing device, optional answer field in questionnaire form ๐ Missing values are evenly distributed in each row and column. ๐ Missing are heavily distributed in certain rows and columns.
KL-WNMF - Relative runtime < 1 means A1GM is faster than KL-WNMF. - Relative error > 1 means worse reconstruction error of A1GM than KL-WNMF. - Increase rate is the ratio of # missing values after addition of missing values at step1. ร5 โ 10 times faster! 83 Find the best solution Add missing values. Accuracy decreases.
1 If ๐๐๐ is missing otherwise โก The rank of weight matrix is 2 after adding missing values. โก Can we exactly solve rank-1 NMF if the rank(ฮฆ) = 2? ๐ฝ ๐ฝ ๐ ๐ rank ๐ฝ = 2 rank ๐ฝ = 2 rank ๐ฝ = 2 86
approximation of extended NMMF Equivalent If rank(๐ฝ) โฆ2, the matrix can be transformed into the form ๐ฝ๐๐ = แ 0 1 If ๐๐๐ is missing otherwise Permutation We can exactly solve rank-1 NMF with missing values by permutation if rank(๐ฝ) โฆ2.
๐๐๐๐ = าง ๐๐11 าง ๐1๐1 าง ๐11๐ Rank-1 condition (๐ฝ-representation) All many body าง ๐๐๐๐ are 0 93 โก Closed Formula of the Best Rank-1 NMMF โก A1GM: Faster Rank-1 NMF with missing values Conclusion The best rank-1 approximation for NMMF Data structure DAG Infor-Geo 93