
Non-negative low-rank approximations for multi-dimensional arrays on statistical manifold

Kazu Ghalamkari
September 23, 2022

The International Conference on Information Geometry for Data Science 2022, Virtual, 2022.9.19 – 23

Transcript

  1. Non-negative low-rank approximations for multi-dimensional arrays on statistical manifold. Kazu Ghalamkari 1,2, Mahito Sugiyama 1,2. International Conference on Information Geometry for Data Science (IG4DS 2022). 1: The Graduate University for Advanced Studies, SOKENDAI. 2: National Institute of Informatics.
  2. Motivation □ Non-negative low-rank approximation of data with various structures. Approximates data with a linear combination of fewer bases (principal components) for feature extraction, memory reduction, and pattern discovery. 😀 The non-negative constraint improves interpretability. However, low-rank approximation with non-negative constraints is based on gradient methods, so appropriate settings for stopping criteria, learning rate, and initial values are necessary. 😢
  5. Strategy □ Modeling with probability mass functions on a Directed Acyclic Graph (DAG). □ Utilize the projection theory of information geometry.
  9. Contribution □ LTR: faster Tucker-rank reduction. No worries about initial values, stopping criterion, or learning rate. 😄 Rank-1 = rank (1,1,1). □ A1GM: faster rank-1 missing NMF. Solves the task as a coupled NMF and finds the most dominant factor rapidly. Both rest on information-geometric analysis using distributions on DAGs that correspond to data structures.
  12. Contents □ Motivation, Strategy, and Contributions □ Introduction of the log-linear model on a DAG □ The best rank-1 approximation formula □ Legendre Tucker-Rank Reduction (LTR) □ The best rank-1 NMMF □ A1GM: faster rank-1 missing NMF □ Theoretical Remarks □ Conclusion. Code: github.com/gkazunii/A1GM, github.com/gkazunii/Legendre-tucker-rank-reduction
  13. Modeling tensors and matrices □ Flexible modeling is required to capture the structure of various data. We formulate low-rank approximations with probabilistic models on DAGs.
  14. Log-linear model on a Directed Acyclic Graph (DAG). □ DAG (poset): a set is a poset ⇔ for all s1, s2, s3 in the set, the following three properties are satisfied: (1) Reflexivity: s1 ≤ s1; (2) Antisymmetry: s1 ≤ s2, s2 ≤ s1 ⇒ s1 = s2; (3) Transitivity: s1 ≤ s2, s2 ≤ s3 ⇒ s1 ≤ s3. □ Log-linear model on a DAG: we define the log-linear model on a DAG as a mapping p from the DAG to (0,1). Natural parameters θ describe the model (θ-space). We can also describe the model by expectation parameters η via the Möbius function (η-space). [Mahito Sugiyama, Hiroyuki Nakahara and Koji Tsuda, "Tensor balancing on statistical manifold" (2017), ICML.]
  18. Describe a tensor with (θ, η). Relation between distribution and tensor: random variables ↔ indices (i, j, k) of the tensor; sample space ↔ index set; probability values ↔ tensor values P_ijk. The Möbius inversion formula connects the two parameterizations.
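This correspondence can be sketched numerically. Under our reading of the grid poset used in the talk (an assumption on our part), the expectation parameters η are tail sums of the normalized tensor:

```python
import numpy as np

def eta_params(P):
    """Expectation parameters of a positive tensor viewed as a discrete
    distribution: eta[i,j,k] = sum of p[i',j',k'] over all i'>=i, j'>=j,
    k'>=k (componentwise order on the index grid). Sketch only."""
    p = P / P.sum()                                  # normalize to a distribution
    eta = p.copy()
    for ax in range(p.ndim):                         # reversed cumulative sum per axis
        eta = np.flip(np.cumsum(np.flip(eta, ax), axis=ax), ax)
    return eta

# For a rank-1 tensor, eta factorizes exactly as the talk's
# rank-1 eta-condition states: eta_ijk = eta_i11 * eta_1j1 * eta_11k.
rng = np.random.default_rng(0)
a, b, c = rng.random(4) + 0.1, rng.random(5) + 0.1, rng.random(6) + 0.1
P = np.einsum('i,j,k->ijk', a, b, c)
eta = eta_params(P)
```

Here `eta[0,0,0]` is always 1 (total probability), and the factorization check below mirrors the η-representation of the rank-1 condition discussed in the talk.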
  19. θ-representation of a rank-1 tensor. Parameters split into one-body parameters and many-body parameters. Rank-1 condition (θ-representation): a tensor is rank-1 ⇔ all of its many-body θ-parameters are 0. The rank-1 subspace is e-flat, so the projection onto it is unique. We could find the projection destination by a gradient method, but gradient methods require appropriate settings for stopping criteria, learning rate, and initial values. 😢 Instead, let us describe the rank-1 condition with the η-parameters.
  23. η-representation of a rank-1 tensor. Rank-1 condition (η-representation): η_ijk = η_i11 η_1j1 η_11k. The m-projection does not change the one-body η-parameters. [Shun-ichi Amari, Information Geometry and Its Applications, 2008, Theorem 11.6]
  25. Find the best rank-1 approximation. Since the m-projection preserves the one-body η-parameters, all η-parameters after the projection are identified: η̄_ijk = η̄_i11 η̄_1j1 η̄_11k with η̄_i11 = η_i11, η̄_1j1 = η_1j1, η̄_11k = η_11k. Using the Möbius inversion formula, we then recover the projection destination in closed form.
  27. Mean-field approximation and rank-1 approximation. Best rank-1 tensor formula for minimizing KL divergence (d = 3): the best rank-1 approximation of P ∈ ℝ_{>0}^{I×J×K}, minimizing the KL divergence from P, is given in closed form. We reproduce the result in K. Huang, et al., "Kullback-Leibler principal component for tensors is not NP-hard," ACSSC 2017. (By contrast, Frobenius-error minimization is NP-hard.) A tensor with d indices is a joint distribution of d random variables; a vector with one index is the distribution of a single random variable, so the factors are normalized vectors depending only on i, only on j, and only on k. Rank-1 approximation therefore approximates a joint distribution by a product of independent distributions. Mean-field approximation: a methodology in physics for reducing a many-body problem to a one-body problem.
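The closed form can be sketched numerically: the minimizer is the outer product of the mode marginals, rescaled so the total mass is preserved (a sketch of the formula reproduced from Huang et al.; function and variable names are ours):

```python
import numpy as np

def best_rank1_kl(P):
    """Closed-form best rank-1 approximation of a positive 3-way tensor P
    under KL divergence: outer product of the three mode marginals,
    divided by the squared total sum so that the total mass is kept."""
    S = P.sum()
    a = P.sum(axis=(1, 2))   # marginal depending only on i
    b = P.sum(axis=(0, 2))   # marginal depending only on j
    c = P.sum(axis=(0, 1))   # marginal depending only on k
    return np.einsum('i,j,k->ijk', a, b, c) / S**2

rng = np.random.default_rng(1)
P = rng.random((4, 5, 6)) + 0.1
Q = best_rank1_kl(P)
```

Note that the result keeps both the total sum and every one-body marginal of `P`, which is exactly the m-projection property (one-body η-parameters unchanged) used in the talk.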
  31. Mean-field approximation and rank-1 approximation. MFA of a Boltzmann machine: p(x) = (1/Z(θ)) exp( Σ_i θ_i x_i + Σ_{i<j} θ_ij x_i x_j ), with bias θ_i and interaction θ_ij. The expectation parameters are η_i = Σ_{x1=0}^{1} … Σ_{xn=0}^{1} x_i p(x), whose exact evaluation costs O(2^n). Restricting to product distributions, p̂(x) = (1/Z(θ)) exp( Σ_i θ_i x_i ) = p(x_1) … p(x_n), and minimizing D_KL(p̂_e, p) yields the mean-field (MF) equations η̄_i = sigmoid( θ_i + Σ_k θ_ki η̄_k ).
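The MF equations are usually solved by self-consistent fixed-point iteration; a minimal sketch (damping and update order vary across implementations, and the interaction scale below is chosen small so the plain iteration contracts):

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def mean_field(theta1, theta2, iters=200):
    """Fixed-point iteration for the mean-field equations
    m_i = sigmoid(theta1_i + sum_k theta2_{ki} m_k) of a Boltzmann
    machine. theta2: symmetric interaction matrix, zero diagonal."""
    m = np.full(len(theta1), 0.5)        # neutral starting magnetizations
    for _ in range(iters):
        m = sigmoid(theta1 + theta2 @ m)
    return m

rng = np.random.default_rng(2)
n = 5
t1 = rng.normal(size=n)                              # biases
t2 = rng.normal(scale=0.1, size=(n, n))              # weak interactions
t2 = (t2 + t2.T) / 2.0
np.fill_diagonal(t2, 0.0)
m = mean_field(t1, t2)
```

The fixed point approximates the intractable marginals η_i without the O(2^n) sum, at the cost of an approximation error; this is the contrast with the rank-1 case drawn on the next slides.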
  34. Rank-1 approximation in the same picture: the rank-1 model is p_θ(i,j,k) = exp( Σ_{i'=1}^{i} Σ_{j'=1}^{j} Σ_{k'=1}^{k} θ_{i'j'k'} ) over the set of products of independent distributions. Unlike the MFA of a Boltzmann machine, where evaluating the projection costs O(2^n), here the m-projection minimizing D_KL(P, P̂) is computable in O(IJK): the required η-parameters are plain marginal sums, e.g. η_i11 = Σ_{j'=1}^{J} Σ_{k'=1}^{K} P_{ij'k'}.
  36. Mean-field approximation and rank-1 approximation, compared:
      - MFA of BM: e-projection onto the e-flat space of product distributions, minimizing the inverse-KL divergence; no closed formula (solved by the MF equations η_i = σ(θ_i + Σ_k θ_ki η_k), exact marginals cost O(2^n)); the projection is not unique.
      - Rank-1 approximation: m-projection onto the e-flat rank-1 space, minimizing the KL divergence; closed formula; the projection is unique.
  38. Formulate Tucker rank reduction by relaxing the rank-1 condition. Rank-1 condition (θ-representation): rank(P) = 1 ⟺ all many-body θ-parameters are 0, leaving only the one-body parameters θ_i11, θ_1j1, θ_11k (e.g. θ211, θ311, θ121, θ131, θ112, θ113). Expand the tensor parameters by focusing on the m-th axis into a rectangular matrix θ^(m) (mode-m expansion). For a rank-(1,1,1) tensor each expansion is zero outside its first row and first column:

      θ^(1) = [[θ111 θ121 θ131 θ112 0 0 θ113 0 0],
               [θ211   0    0    0  0 0   0  0 0],
               [θ311   0    0    0  0 0   0  0 0]]

      and analogously for θ^(2) and θ^(3). Each expansion has two "bingos" (rows that are all zero outside the first column); the first row and first column are the scaling factors.
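Mode-m expansion and the bingo pattern can be sketched generically (our simplified reading: a bingo is a row of the expansion that is zero outside its first column, where the one-body scaling parameters sit):

```python
import numpy as np

def mode_unfold(T, m):
    """Mode-m expansion: move axis m to the front and flatten the rest,
    giving an I_m x (product of other dimensions) matrix."""
    return np.moveaxis(T, m, 0).reshape(T.shape[m], -1)

def count_bingos(theta_m):
    """Count rows whose entries outside the first column are all zero;
    a sketch of the bingo pattern illustrated on the slides."""
    return int(np.sum(np.all(theta_m[:, 1:] == 0, axis=1)))

# Shape check for the unfolding.
T = np.arange(24).reshape(2, 3, 4)

# The one-bingo pattern of the later (2,3,3)-rank example: the second
# row of theta^(1) is zero outside the first column (values here are
# arbitrary nonzero stand-ins for the theta parameters).
th1 = np.array([[1, 1, 1, 1, 0, 0, 1, 0, 0],
                [1, 0, 0, 0, 0, 0, 0, 0, 0],
                [1, 1, 1, 1, 1, 1, 1, 1, 1]])
```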
  42. The relationship between bingos and rank. Now allow nonzero many-body parameters:

      θ^(1) = [[θ111 θ121 θ131 θ112  0    0   θ113  0    0  ],
               [θ211   0    0    0   0    0    0    0    0  ],
               [θ311 θ321 θ331 θ312 θ322 θ332 θ313 θ323 θ333]]

      This θ^(1) has one bingo (its second row), while θ^(2) and θ^(3) have no bingo, and the input tensor P has rank (2,3,3). Zeroing the corresponding many-body parameters is the m-projection of P onto the subspace B^(1) with one bingo in the mode-1 direction, yielding P̄ minimizing D_KL(P, P̄). Bingo rule (d = 3): if the mode-k expansion θ^(k) of the natural parameters of a tensor P ∈ ℝ_{>0}^{I1×I2×I3} has b_k bingos, then rank(P) ≤ (I1 − b1, I2 − b2, I3 − b3).
  45. Example: reduce the rank of an (8,8,3) tensor to (5,8,3) or less. STEP 1: choose a bingo location (θ is zero in the bingo rows; θ can be anything elsewhere). STEP 2: replace the bingo part with the best rank-1 tensor. The shaded areas do not change their values in the projection.
  48. Replace the partial tensor in the red box using the best rank-1 approximation formula. The best tensor is obtained within the specified bingo space 😄, but there is no guarantee that it is the best rank-(5,8,3) approximation. 😢
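The two steps above can be sketched by reusing the closed-form rank-1 formula. This is our simplified reading of the slides (an assumption): the "bingo part" is taken as a contiguous block of mode-1 slices, and only that block is replaced:

```python
import numpy as np

def best_rank1_kl(P):
    """Closed-form best rank-1 KL approximation (marginal outer product)."""
    S = P.sum()
    a, b, c = P.sum(axis=(1, 2)), P.sum(axis=(0, 2)), P.sum(axis=(0, 1))
    return np.einsum('i,j,k->ijk', a, b, c) / S**2

def ltr_step_mode1(P, start):
    """One LTR-style reduction step: replace the mode-1 slice block
    P[start:, :, :] by its best rank-1 tensor, so the mode-1 Tucker
    rank drops to at most start + 1. A sketch, not the full LTR."""
    Q = P.copy()
    Q[start:] = best_rank1_kl(P[start:])
    return Q

rng = np.random.default_rng(5)
P = rng.random((8, 8, 3)) + 0.1          # the slide's (8,8,3) example
Q = ltr_step_mode1(P, 4)                 # target rank (5,8,3) or less
```

As on the slides, the untouched ("shaded") slices keep their values exactly, and the replaced block keeps its total mass.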
  51. Example: reduce the rank of an (8,8,3) tensor to (5,7,3) or less. STEP 1: choose bingo locations. STEP 2: replace each bingo part with the best rank-1 tensor. The shaded areas do not change their values in the projection.
  52. Experimental results (real data): LTR is faster with competitive approximation performance, e.g. on tensors of size (92, 112, 400) and (9, 9, 512, 512, 3).
  54. Strategy for rank-1 NMF with missing values. Weight matrix: Φ_ij = 0 if X_ij is missing, 1 otherwise; the objective uses the element-wise product Φ ∘ X. □ Collect the missing values in a corner of the matrix to solve the problem as a coupled NMF, which is equivalent to NMMF (Takeuchi et al., 2013).
  56. NMMF, Non-negative multiple matrix factorization (Takeuchi et al., 2013): jointly factorizes coupled matrices with shared factors (illustrated with matrices over users, artists, and tags).
  57. The best rank-1 approximation of NMMF (illustrated on the same user/artist/tag example).
  58. Information geometry of rank-1 NMMF. (X, Y, Z) is simultaneously rank-1 decomposable ⇔ it can be written as (w ⊗ h, a ⊗ h, w ⊗ b). Parameters split into one-body and two-body parameters. Simultaneous rank-1 θ-condition: all two-body θ-parameters are 0; this subspace is e-flat, so the projection is unique. Simultaneous rank-1 η-condition: η_ij = η_i1 η_1j. Since the m-projection does not change the one-body η-parameters [Shun-ichi Amari, Information Geometry and Its Applications, 2008, Theorem 11.6], all η-parameters after the projection are identified, giving the global optimal solution of rank-1 NMMF in closed form.
  63. Rank-1 NMF with missing values. □ NMMF can be viewed as a special case of NMF with missing values (the two problems are equivalent). □ NMF is invariant under row and column permutations.
  65. A1GM: algorithm (first version). Step 1: gather the missing values in the bottom right. Step 2: use the formula for the best rank-1 NMMF. Step 3: undo the permutation. Does this find the exact solution? 🤔❓
  66. In general the missing values cannot all be gathered into one corner, so A1GM adds missing values to solve the problem as NMMF: the reconstruction error worsens 😢, but efficiency is gained. 😀
  68. 🙆 Data that A1GM is good at, and 🙅 data that it is not good at. 🙅 Missing values evenly distributed over rows and columns. 🙆 Missing values heavily concentrated in certain rows and columns. In some real datasets, missing values do tend to concentrate in certain columns, e.g. a disconnected sensing device, or an optional answer field in a questionnaire form.
  70. A1GM: algorithm (final). Step 1: increase the number of missing values. Step 2: gather the missing values in the bottom right. Step 3: use the rank-1 NMMF formula and undo the permutation.
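The gathering step can be sketched as a row/column permutation. This is our simplified heuristic (an assumption; A1GM's actual procedure may differ): sort rows and columns by how many missing entries they contain, so missing values drift to the bottom-right:

```python
import numpy as np

def gather_missing(X):
    """Permute rows and columns of X so that missing entries (NaN)
    concentrate toward the bottom-right corner, by sorting rows and
    columns by their missing counts in ascending order. Returns the
    permuted matrix plus the permutations needed to undo it later."""
    miss = np.isnan(X)
    row_order = np.argsort(miss.sum(axis=1))   # fewest-missing rows first
    col_order = np.argsort(miss.sum(axis=0))   # fewest-missing columns first
    return X[np.ix_(row_order, col_order)], row_order, col_order

# A matrix whose missing values sit in one column and one corner entry.
X = np.arange(20, dtype=float).reshape(4, 5)
X[:, 4] = np.nan
X[3, 3] = np.nan
Xp, rows, cols = gather_missing(X)
```

Because NMF is invariant under row and column permutations (slide 64), the factorization of the permuted matrix can simply be permuted back in the final step.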
  71. Experiments on real data. □ A1GM is compared with gradient-based KL-WNMF. Relative runtime < 1 means A1GM is faster than KL-WNMF; relative error > 1 means A1GM's reconstruction error is worse; the increase rate is the ratio of the number of missing values after the addition in Step 1. A1GM is 5–10× faster. Where it can find the best solution directly, no accuracy is lost; where missing values are added, accuracy decreases.
  73. Theoretical Remarks 1: extended NMMF. □ With the weight matrix Φ_ij = 0 if X_ij is missing and 1 otherwise, the rank of Φ is 2 after adding missing values. □ Can we exactly solve rank-1 NMF whenever rank(Φ) = 2? If rank(Φ) ≤ 2, the matrix can be transformed by permutation into the form handled by an extended NMMF, whose best rank-1 approximation we derive (the two problems are equivalent). Hence we can exactly solve rank-1 NMF with missing values by permutation whenever rank(Φ) ≤ 2.
  77. Theoretical Remarks 2: connection to balancing. □ Matrix balancing transforms a matrix into a balanced matrix (a doubly stochastic matrix), i.e. one satisfying the balancing condition on its row and column sums. [Mahito Sugiyama, Hiroyuki Nakahara and Koji Tsuda, "Tensor balancing on statistical manifold" (2017), ICML.] Combined with the rank-1 condition (all many-body θ-parameters are 0), the balanced rank-1 matrix is unique.
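For illustration, the balancing transform can be sketched with the classical Sinkhorn-Knopp iteration (the cited ICML 2017 paper instead solves balancing with Newton steps on the (θ, η) coordinates; this is just the textbook alternative):

```python
import numpy as np

def sinkhorn_balance(A, iters=500):
    """Sinkhorn-Knopp iteration: alternately normalize the rows and
    columns of a positive matrix until it is approximately doubly
    stochastic (all row sums and column sums equal to 1)."""
    B = A.astype(float).copy()
    for _ in range(iters):
        B /= B.sum(axis=1, keepdims=True)   # make row sums 1
        B /= B.sum(axis=0, keepdims=True)   # make column sums 1
    return B

rng = np.random.default_rng(4)
A = rng.random((5, 5)) + 0.1   # strictly positive, so balancing converges
B = sinkhorn_balance(A)
```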
  80. Conclusion. □ Describe the low-rank condition using (θ, η): rank-1 condition (θ-representation): all many-body θ̄_ijk are 0; rank-1 condition (η-representation): η̄_ijk = η̄_i11 η̄_1j1 η̄_11k. □ Closed formula for the best rank-1 NMMF. □ A1GM: faster rank-1 NMF with missing values. All of these rest on the information geometry of distributions on DAGs that correspond to data structures.