Tensor Decomposition Based Integrated Analysis of Protein-Protein Interaction with Cancer Gene Expression Can Improve the Coincidence with Clinical Labels

Presentation at ICBCB2023
http://www.icbcb.org/

Y-h. Taguchi

April 24, 2023


Transcript

  1. Tensor Decomposition Based Integrated Analysis of Protein-Protein Interaction with Cancer Gene Expression Can Improve the Coincidence with Clinical Labels
    Y-h. Taguchi, Department of Physics, Chuo University
    Turki Turki, Department of Computer Science, King Abdulaziz University
    For material requests: WeChat
  2. Motivation: Integrated analysis of protein-protein interaction (PPI) and gene expression
    Gene expression profiles: N genes × M samples
    PPI: N genes (proteins) × N genes (proteins)
    How can we integrate two matrices with distinct dimensions?
  3. Using tensor decomposition (TD) = Tucker decomposition
    $x_{ijk} \in \mathbb{R}^{N \times M \times K}$
    $x_{ijk} = \sum_{l_1=1}^{N} \sum_{l_2=1}^{M} \sum_{l_3=1}^{K} G(l_1 l_2 l_3)\, u_{l_1 i}\, u_{l_2 j}\, u_{l_3 k}$
    How can we make use of TD to integrate PPI and gene expression?
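The Tucker decomposition on slide 3 can be sketched with a plain NumPy higher-order SVD (HOSVD). This is only an illustration under assumptions: the helper names `unfold`, `hosvd` and `reconstruct` are hypothetical, the tensor is random toy data, and it is not the implementation used in the presenter's Bioconductor packages.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize `tensor` so that axis `mode` indexes the rows."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def hosvd(tensor):
    """Return (core, factors); column l of factors[m] is the l-th vector of mode m."""
    factors = []
    for mode in range(tensor.ndim):
        # Left singular vectors of each unfolding give the factor for that mode.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u)
    core = tensor
    for mode, u in enumerate(factors):
        # Project axis `mode` onto the columns of u to obtain G(l1 l2 l3).
        core = np.moveaxis(np.tensordot(u.T, core, axes=(1, mode)), 0, mode)
    return core, factors

def reconstruct(core, factors):
    """Evaluate sum_{l1 l2 l3} G(l1 l2 l3) u_{l1 i} u_{l2 j} u_{l3 k}."""
    out = core
    for mode, u in enumerate(factors):
        out = np.moveaxis(np.tensordot(u, out, axes=(1, mode)), 0, mode)
    return out

# Toy tensor with N=4, M=3, K=2 (sizes chosen only for illustration).
x = np.random.default_rng(0).normal(size=(4, 3, 2))
core, factors = hosvd(x)
print(np.allclose(reconstruct(core, factors), x))
```

In this sketch `core[l1, l2, l3]` plays the role of $G(l_1 l_2 l_3)$ and `factors[0][:, l1]` the role of $u_{l_1 i}$.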
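  4. Apply singular value decomposition (SVD) to gene expression profile $x_{ij} \in \mathbb{R}^{N \times M}$ and PPI $n_{ii'} \in \mathbb{R}^{N \times N}$
    $x_{ij} = \sum_{l=1}^{L} \lambda'_l\, u'_{li}\, v_{lj}$ (gene expression)
    $n_{ii'} = \sum_{l=1}^{L} \lambda_l\, u_{li}\, u_{li'}$ (PPI)
    Bundle $u_{li}$ and $u'_{li}$ to generate the tensor $x_{ilk} \in \mathbb{R}^{N \times L \times 2}$:
    $x_{il1} = u_{li}$, $x_{il2} = u'_{li}$
    Apply TD to $x_{ilk}$:
    $x_{ilk} = \sum_{l_1=1}^{N} \sum_{l_2=1}^{L} \sum_{l_3=1}^{2} G(l_1 l_2 l_3)\, \tilde{u}_{l_1 i}\, \tilde{u}_{l_2 l}\, \tilde{u}_{l_3 k}$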
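The integration step on slide 4 (SVD of each matrix, bundling the gene-side vectors into an N × L × 2 tensor, then HOSVD) might look roughly like the following. All data here are random stand-ins, the sizes `N`, `M`, `L` are arbitrary toy values, and `mode_factor` is a hypothetical helper, so this is a sketch of the idea rather than the presenter's code.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, L = 50, 12, 5                      # toy sizes: genes, samples, retained SVVs

x = rng.normal(size=(N, M))              # stand-in for gene expression x_ij
upper = np.triu(rng.random((N, N)) < 0.1, k=1).astype(float)
n = upper + upper.T                      # stand-in for a symmetric PPI matrix n_ii'

# SVD of each matrix; keep the first L gene-side singular value vectors.
u_expr = np.linalg.svd(x, full_matrices=False)[0][:, :L]   # u'_li
u_ppi = np.linalg.svd(n)[0][:, :L]                         # u_li

# Bundle: x_il1 = u_li (PPI), x_il2 = u'_li (gene expression).
tensor = np.stack([u_ppi, u_expr], axis=2)                 # N x L x 2

def mode_factor(t, mode):
    """Left singular vectors of the mode-`mode` unfolding of t."""
    unfolded = np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)
    return np.linalg.svd(unfolded, full_matrices=False)[0]

# HOSVD: one factor matrix per mode, then the core tensor G.
u_gene, u_comp, u_source = (mode_factor(tensor, mode) for mode in range(3))
core = np.einsum("ilk,ia,lb,kc->abc", tensor, u_gene, u_comp, u_source)
rebuilt = np.einsum("abc,ia,lb,kc->ilk", core, u_gene, u_comp, u_source)

print(tensor.shape, u_gene.shape, np.allclose(rebuilt, tensor))
```

Here the columns of `u_gene` correspond to $\tilde{u}_{l_1 i}$, the gene-side vectors that carry information from both the expression profile and the PPI.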
  5. [Figure: workflow diagram. Gene expression (N genes × M samples) and PPI (N genes × N genes) each undergo SVD, giving L SVVs per matrix. The SVVs are bundled into an N × L × 2 tensor (2 = gene expression or network) and decomposed by HOSVD. The SVVs attributed to the M samples (L SVVs × M samples) are then compared with class labels.]
  6. Recover the singular value vector (SVV) attributed to sample j as
    $\tilde{v}_{lj} = \sum_{i=1}^{N} x_{ij}\, \tilde{u}_{li}$
    Compare the coincidence of $v_{lj}$ or $\tilde{v}_{lj}$ with class labels (categorical regression):
    $v_{lj} = a_l + \sum_{s=1}^{S} b_{ls}\, \delta_{js}$ (SVD)
    $\tilde{v}_{lj} = a'_l + \sum_{s=1}^{S} b'_{ls}\, \delta_{js}$ (TD)
    $\delta_{js} = 1$ only when sample j belongs to class s, otherwise 0
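The projection and categorical regression on slide 6 can be illustrated as below. The function names are hypothetical, and the fit is an ordinary least-squares regression on class indicators (equivalent to one-way ANOVA), which is an assumption about how the regression could be carried out. The sketch returns R² and the F statistic only; a P value would follow from the F distribution with (S − 1, M − S) degrees of freedom.

```python
import numpy as np

def recover_sample_vectors(x, u_tilde):
    """v~_lj = sum_i x_ij u~_li, for x of shape (N, M) and u_tilde of shape (N, L)."""
    return u_tilde.T @ x                       # shape (L, M)

def categorical_regression(v, labels):
    """Fit v_j = a + sum_s b_s * delta_js by least squares; return (R^2, F)."""
    v = np.asarray(v, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # delta_js = 1 only when sample j belongs to class s.
    delta = (labels[:, None] == classes[None, :]).astype(float)
    design = np.hstack([np.ones((len(v), 1)), delta])
    # The design is rank deficient; lstsq returns the minimum-norm solution,
    # whose fitted values are the per-class means.
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    resid = v - design @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((v - v.mean()) ** 2).sum())
    s, m = len(classes), len(v)
    r2 = 1.0 - ss_res / ss_tot
    f_stat = ((ss_tot - ss_res) / (s - 1)) / (ss_res / (m - s))
    return r2, f_stat

# Toy example: six samples in two classes (values chosen for illustration).
v = np.array([1.0, 2.0, 3.0, 5.0, 6.0, 7.0])
labels = np.array([0, 0, 0, 1, 1, 1])
print(categorical_regression(v, labels))
```

The same function would be applied to each $v_{lj}$ (SVD) and each $\tilde{v}_{lj}$ (TD) so that the two sets of fits can be compared.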
  7. [Figure: workflow diagram, identical to slide 5]
  8. 27 cancers with 5 kinds of classes
    (1) “PATIENT.VITAL STATUS”
    (2) “PATIENT.STAGE EVENT.PATHOLOGIC STAGE”
    (3) “PATIENT.STAGE EVENT.TNM CATEGORIES.PATHOLOGIC CATEGORIES.PATHOLOGIC M”
    (4) “PATIENT.STAGE EVENT.TNM CATEGORIES.PATHOLOGIC CATEGORIES.PATHOLOGIC T”
    (5) “PATIENT.STAGE EVENT.TNM CATEGORIES.PATHOLOGIC CATEGORIES.PATHOLOGIC N”
    (6) AUC for “PATIENT.VITAL STATUS”
  9. Scatter plot of P values (ascending order) obtained by categorical regression for SVD and TD for 27 cancers in class (1)
    Red triangles: significant
    Blue crosses: not significant
    TD (vertical axis) is more coincident with classes than SVD (horizontal axis)
  10. Comparison of P values obtained by categorical regression for SVD and TD, using either the t test or the Wilcoxon test, for classes (1) to (5)
    TD is more coincident with classes than SVD for every class other than class (3)
  11. AUC for the discrimination task for class (1), for the 11 cancers with significant P values in categorical regression
    TD (vertical axis) is almost always better than SVD
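For readers unfamiliar with the AUC on slide 11, one common way to compute it for a single score vector against a binary label is the pairwise (Mann-Whitney) form below. The slides do not say how the AUC was computed, so the function `auc` and the toy scores are assumptions for illustration only.

```python
import numpy as np

def auc(scores, is_positive):
    """Probability that a positive sample scores higher than a negative one."""
    scores = np.asarray(scores, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    pos = scores[is_positive]
    neg = scores[~is_positive]
    # Compare every positive with every negative; ties count as one half.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Toy scores standing in for one SVV over four samples.
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))   # 0.75
```

Because the sign of a singular value vector is arbitrary, an AUC below 0.5 from such a score can be read as 1 minus that value; whether this convention was used in the presentation is not stated.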
  12. Conclusions
    Integrated analysis of gene expression and PPI can improve the coincidence between SVVs and class labels, although PPI does not include class information at all.
    Analyses with TD can be performed with two Bioconductor packages that I developed:
    doi:10.18129/B9.bioc.TDbasedUFE
    doi:10.18129/B9.bioc.TDbasedUFEadv