Tensor decomposition based unsupervised feature extraction with optimized standard deviation applied to differentially expressed genes, DNA methylation and histone modification
Y-h. Taguchi, Ryo Ishibashi
Department of Physics, Chuo University, Tokyo, Japan
1. Principal component analysis (PCA) and tensor decomposition (TD) based unsupervised feature extraction (FE) applied to the identification of differentially expressed genes (DEGs) can outperform various state-of-the-art methods when the standard deviations (SDs) used to generate the null hypothesis are optimized.
2. They are also applicable to the identification of differentially methylated cytosines (DMCs) and of differential histone modification, without any specific modification of the method.
Three preprints in bioRxiv:
• https://doi.org/10.1101/2022.02.18.481115
• https://doi.org/10.1101/2022.04.02.486807
• https://doi.org/10.1101/2022.04.29.490081
◆ Large P, small N problem
◆ Why do we choose PCA and TD?
⚫ Methods
◆ What is TD?
◆ Analysis procedure
⚫ Results
◆ Benchmark data set for DEG
◆ DNA methylation
◆ Histone modification
⚫ Conclusion
For detailed information, refer to the book available from [https://link.springer.com/book/10.1007/978-3-030-22456-1].
Large P, small N problem
• Large P: typically, the number of variables is large (e.g., genes, epigenomic features).
• Small N: the number of samples is small (e.g., subjects, experimental animals, or cultured cells).
→ Difficult to handle computationally.
HOSVD (Higher Order Singular Value Decomposition)
$$x_{ijk} \simeq \sum_{l_1=1}^{L_1}\sum_{l_2=1}^{L_2}\sum_{l_3=1}^{L_3} G(l_1 l_2 l_3)\, u_{l_1 i}\, u_{l_2 j}\, u_{l_3 k}$$
Example:
x_ijk: gene expression
N: number of genes (i)
M: number of samples (j)
K: number of tissues (k)
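The HOSVD above can be sketched with NumPy as follows. This is a minimal illustration, assuming the full (untruncated) HOSVD computed from the SVD of each mode unfolding; the helper names `unfold` and `hosvd` are hypothetical, and the random tensor is a synthetic stand-in for genes × samples × tissues, not real data. Keeping only the first L1, L2, L3 columns of the factor matrices turns the exact reconstruction into the approximation "≃".

```python
import numpy as np

def unfold(x: np.ndarray, mode: int) -> np.ndarray:
    """Mode-n unfolding: rows are indexed by the chosen mode (i, j or k)."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)

def hosvd(x: np.ndarray):
    """Return the core tensor G and one factor matrix per mode.

    factors[0][i, l1] plays the role of u_{l1 i}, factors[1][j, l2] of u_{l2 j},
    and factors[2][k, l3] of u_{l3 k}.
    """
    factors = []
    for mode in range(x.ndim):
        # Left singular vectors of each unfolding are the singular vectors of that mode.
        u, _, _ = np.linalg.svd(unfold(x, mode), full_matrices=False)
        factors.append(u)
    # G(l1 l2 l3) = sum_ijk x_ijk u_{l1 i} u_{l2 j} u_{l3 k}
    g = np.einsum("ijk,ia,jb,kc->abc", x, *factors)
    return g, factors

# Synthetic stand-in for N genes x M samples x K tissues (not real data).
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 6, 4))
g, (u1, u2, u3) = hosvd(x)
x_rec = np.einsum("abc,ia,jb,kc->ijk", g, u1, u2, u3)
print(np.allclose(x, x_rec))  # True
```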
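Example: tissue-specific differentially expressed genes (tDEGs)
Fixed: l2 (samples) and l3 (tissues)
For the specific l1 with the maximum |G(l1 l2 l3)|, the sign of G(l1 l2 l3) (e.g., G(l1 l2 l3) > 0), together with the sign of u_{l1 i}, separates two kinds of tDEGs:
• tDEG: Healthy controls < Patients
• tDEG: Healthy controls > Patients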
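The selection of l1 can be sketched as below, assuming l2 and l3 have already been chosen. A single term contributes G(l1 l2 l3) u_{l1 i} u_{l2 j} u_{l3 k} to x_ijk, so the signed core value is returned as well. The function name `select_l1` and the small core tensor are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def select_l1(g: np.ndarray, l2: int, l3: int) -> tuple[int, float]:
    """Return the l1 maximizing |G(l1 l2 l3)| for fixed l2, l3, plus that signed core value."""
    column = g[:, l2, l3]                    # G(l1 l2 l3) for every l1
    l1 = int(np.argmax(np.abs(column)))      # index with the largest absolute value
    return l1, float(column[l1])

# Hypothetical 3 x 2 x 2 core tensor; l2 = 0 and l3 = 1 are assumed to be the
# sample and tissue vectors of interest.
g = np.zeros((3, 2, 2))
g[0, 0, 1] = 0.5
g[2, 0, 1] = -1.2
l1, g_value = select_l1(g, l2=0, l3=1)
print(l1, g_value)  # 2 -1.2
```

The genes are then ranked by u_{l1 i} for the selected l1; the sign of G(l1 l2 l3) decides which sign of u_{l1 i} corresponds to higher expression in patients.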
x_ij represents the expression of the ith gene in the jth sample.
Samples: seven Universal Human Reference RNA (UHRR) vs. seven Human Brain Reference RNA (HBRR)
Measured for 40933 genes (done by the presenter)
(*) https://www.fda.gov/science-research/bioinformatics-tools/microarraysequencing-quality-control-maqcseqc
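For a genes × samples matrix x_ij, PCA based unsupervised FE yields PC scores for genes and PC loadings for samples. The sketch below is a minimal illustration, assuming each sample is standardized over genes before the SVD; it uses synthetic data with two groups of seven samples in place of the UHRR/HBRR measurements. The function names and the group-separation rule (`group_t`, a Welch-style t statistic on the loadings) are hypothetical choices, not necessarily the authors' criterion.

```python
import numpy as np

def pca_fe(x: np.ndarray):
    """PCA of a genes x samples matrix x_ij.

    Returns u with u[i, l] ~ PC score of gene i and v with v[j, l] ~ PC loading of sample j.
    """
    # Assumption: each sample (column) is standardized over genes.
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    left, s, vt = np.linalg.svd(z, full_matrices=False)
    return left * s, vt.T

def group_t(v: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Welch-style t statistic of every PC loading between two sample groups."""
    a, b = v[labels == 0], v[labels == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / se

# Synthetic stand-in: 2000 genes, 7 + 7 samples (not the real UHRR/HBRR data).
rng = np.random.default_rng(0)
x = rng.normal(size=(2000, 14))
x[:100, 7:] += 5.0                           # hypothetical: 100 genes higher in group 1
labels = np.array([0] * 7 + [1] * 7)
u, v = pca_fe(x)
l_best = int(np.argmax(group_t(v, labels)))  # PC that best separates the two groups
top_genes = np.argsort(-np.abs(u[:, l_best]))[:100]
```

Here the top genes are picked by rank only; the slides instead assign P-values to the PC scores, which is what the optimized SD is for.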
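P_i is computed with the cumulative χ² distribution.
h_n: number of genes in the nth bin of the histogram of 1 − P_i
n_0: bin at which the adjusted P-value equals 0.1 (adjusted P(n_0) = 0.1)
The optimal σ minimizes the standard deviation of h_n over the bins n ≤ n_0:
$$\sigma_h = \sqrt{\frac{1}{n_0}\sum_{n=1}^{n_0}\left(h_n - \langle h_n \rangle\right)^2}, \qquad \langle h_n \rangle = \frac{1}{n_0}\sum_{n=1}^{n_0} h_n$$
Select genes with adjusted P_i < 0.1.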
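The σ optimization can be sketched as a grid search that makes the histogram of 1 − P_i as flat as possible over the bins that contain no selected genes. This is a minimal sketch under several assumptions: P_i = P_χ²[> (u_i/σ)²] with one degree of freedom, Benjamini-Hochberg adjustment, 100 bins, and n_0 taken as the last bin lying entirely below the smallest 1 − P_i among selected genes. All function names, the grid, and the synthetic scores are hypothetical.

```python
import math
import numpy as np

def chi2_pvalues(u: np.ndarray, sigma: float) -> np.ndarray:
    """P_i = P_chi2[> (u_i / sigma)^2], one degree of freedom.

    For one degree of freedom this upper tail equals erfc(|u_i| / (sigma * sqrt(2))).
    """
    return np.array([math.erfc(abs(v) / (sigma * math.sqrt(2.0))) for v in u])

def bh_adjust(p: np.ndarray) -> np.ndarray:
    """Benjamini-Hochberg adjusted P-values."""
    m = len(p)
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

def histogram_sd(u, sigma, n_bins=100, threshold=0.1):
    """sigma_h: SD of h_n over bins n = 1..n0, which contain no selected genes."""
    p = chi2_pvalues(u, sigma)
    selected = bh_adjust(p) < threshold
    p_star = p[selected].max() if selected.any() else 0.0  # largest selected raw P
    n0 = int(math.floor((1.0 - p_star) * n_bins))
    if n0 < 2:
        return math.inf
    h, _ = np.histogram(1.0 - p, bins=n_bins, range=(0.0, 1.0))
    return float(np.std(h[:n0]))  # 1/n0 normalization, as in the formula

def optimize_sigma(u, sigmas, **kwargs) -> float:
    """Grid search for the sigma giving the flattest null histogram."""
    sds = [histogram_sd(u, s, **kwargs) for s in sigmas]
    return float(sigmas[int(np.argmin(sds))])

# Synthetic scores: 20000 null genes with SD 0.5 plus 20 outliers (not real data).
rng = np.random.default_rng(1)
u = np.concatenate([rng.normal(0.0, 0.5, 20000), np.full(20, 5.0)])
sigmas = np.arange(0.2, 1.55, 0.05)
best = optimize_sigma(u, sigmas)
selected_genes = np.flatnonzero(bh_adjust(chi2_pvalues(u, best)) < 0.1)
print(f"optimized sigma = {best:.2f}, selected genes = {len(selected_genes)}")
```

With a mis-specified σ, the null P-values are no longer uniform, so h_n tilts and σ_h grows; near the true SD of the null scores the histogram is flat up to counting noise.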
TD based unsupervised FE with optimized SD can be applied to the identification of differential histone modification without any specific modification of the method.
• https://doi.org/10.1101/2022.04.29.490081
1. Principal component analysis (PCA) and tensor decomposition (TD) based unsupervised feature extraction (FE) applied to the identification of differentially expressed genes (DEGs) can outperform various state-of-the-art methods, including DESeq2, when the standard deviations (SDs) used to generate the null hypothesis (Gaussian distribution of principal components) are optimized.
2. They are also applicable to the identification of differentially methylated cytosines (DMCs) and of differential histone modification, without any specific modification of the method.