software tool to select more reasonable differentially methylated cytosines

Y-H. Taguchi, Department of Physics, Chuo University, Tokyo 112-8551, Japan
Turki Turki, Department of Computer Science, King Abdulaziz University, Jeddah 21589, Saudi Arabia

Preprint: https://doi.org/10.1101/2022.04.02.486807
GIW2022
based unsupervised feature extraction (FE), and successfully applied to gene expression. We would like to see if the method is applicable to DNA methylation without methylation-specific modification.
1. Apply PCA to matrices (e.g., genes × samples) or TD to tensors (e.g., genes × samples × tissues) and get vectors attributed separately to genes, samples, or tissues.
2. Select the vectors of interest, attributed to samples and tissues.
3. Select genes whose contributions to the corresponding vectors attributed to genes are larger (based upon the null hypothesis of a Gaussian distribution of the components of the vectors).
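The three steps above can be sketched for the matrix (PCA) case as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy data, the row centering, the rule for picking the vector of interest (largest gap between group means), the Benjamini-Hochberg correction, and the names `bh_adjust`, `labels`, and `selected` are all hypothetical choices made for this example.

```python
import numpy as np
from scipy.stats import chi2

def bh_adjust(p):
    """Benjamini-Hochberg adjusted P-values (an assumed choice of correction)."""
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest P-value
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted

# Toy data (hypothetical): 1000 genes x 6 samples, two groups of 3 samples,
# with the first 20 genes shifted in the second group.
rng = np.random.default_rng(0)
labels = np.array([0, 0, 0, 1, 1, 1])
x = rng.normal(size=(1000, labels.size))
x[:20, labels == 1] += 10.0
x -= x.mean(axis=1, keepdims=True)  # assumed preprocessing: center each gene

# Step 1: PCA via SVD. Columns of u are attributed to genes,
# rows of vt are attributed to samples.
u, s, vt = np.linalg.svd(x, full_matrices=False)

# Step 2: pick the sample vector of interest. Here (an assumption) it is the
# one with the largest gap between the two group means.
gap = np.abs(vt[:, labels == 1].mean(axis=1) - vt[:, labels == 0].mean(axis=1))
k = int(np.argmax(gap))

# Step 3: under the Gaussian null, (u_i / sigma)^2 follows a chi-squared
# distribution with one degree of freedom; keep genes with small adjusted P.
u_k = u[:, k]
p = chi2.sf((u_k / u_k.std()) ** 2, df=1)
selected = np.flatnonzero(bh_adjust(p) < 0.01)
```

In this toy setting `selected` should recover the shifted genes; on real data the choice of vector in step 2 is usually made by inspecting the sample and tissue vectors rather than by a fixed rule.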
HOSVD (Higher Order Singular Value Decomposition)
x_ijk: N × M × K tensor; N: number of genes (i), M: number of samples (j), K: number of tissues (k)
[Figure: example of the decomposition, with labels L1, L2, L3]
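As a rough illustration of HOSVD on a genes × samples × tissues tensor, the sketch below computes one orthogonal factor matrix per mode from the SVD of the corresponding unfolding and then forms the core tensor. The function name `hosvd`, the variable names `core` and `factors`, and the toy dimensions are assumptions for this example only.

```python
import numpy as np

def hosvd(x):
    """Return the core tensor and one orthogonal factor matrix per mode of x."""
    factors = []
    for mode in range(x.ndim):
        # Unfold along `mode`: rows index that mode, columns index all others
        unfolded = np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)
        u, _, _ = np.linalg.svd(unfolded, full_matrices=False)
        factors.append(u)
    core = x
    for mode, u in enumerate(factors):
        # Project each mode onto its singular vectors (mode product with u^T)
        core = np.moveaxis(np.tensordot(u.T, core, axes=(1, mode)), 0, mode)
    return core, factors

# Toy tensor (hypothetical sizes): N = 50 genes, M = 6 samples, K = 4 tissues
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 6, 4))
core, (u_gene, u_sample, u_tissue) = hosvd(x)

# x_ijk is recovered as the core tensor multiplied by the three factor matrices
x_rebuilt = np.einsum("abc,ia,jb,kc->ijk", core, u_gene, u_sample, u_tissue)
```

The columns of `u_gene`, `u_sample`, and `u_tissue` play the role of the vectors attributed to genes, samples, and tissues, respectively.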
for various problems, they have some problems.
1. The histogram of 1-P does not fully obey the null hypothesis.
2. Too few genes are selected to believe that there are no false negatives.
[Figure: flowchart. Labels: cumulative χ2 distribution; histogram of 1-P_i, with h_n the count in the nth bin; σ_h; n_0; adjusted P(n_0) = 0.1; select genes with adjusted P_i < 0.01. Left: gene expression]
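One possible reading of the labels on this slide is sketched below. It assumes, and this is an interpretation rather than something the slide states in full, that σ_h is the standard deviation of the bin counts h_n over the bins n < n_0, that n_0 is the first bin containing a feature whose adjusted P is at most 0.1, and that the standard deviation σ used in the χ2 test is tuned so that σ_h is as small as possible (that is, so that the histogram of 1-P_i is as flat as possible). The grid search, the Benjamini-Hochberg correction, the toy data, and the names `histogram_sd` and `best_sigma` are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def bh_adjust(p):
    """Benjamini-Hochberg adjusted P-values (an assumed choice of correction)."""
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted

def histogram_sd(sigma, u, n_bins=50, p0=0.1):
    """Assumed sigma_h: SD of the histogram of 1-P_i over the bins below n_0."""
    p = chi2.sf((u / sigma) ** 2, df=1)  # P_i from the cumulative chi-squared
    h, _ = np.histogram(1.0 - p, bins=n_bins, range=(0.0, 1.0))
    significant = bh_adjust(p) <= p0
    if significant.any():
        # n_0: lowest bin that contains a feature with adjusted P <= p0
        n0 = int(np.minimum((1.0 - p[significant]) * n_bins, n_bins - 1).min())
    else:
        n0 = n_bins
    return np.std(h[:n0]) if n0 >= 2 else np.inf

# Toy gene vector (hypothetical): 10000 null components with SD 1
# plus 100 outlying components.
rng = np.random.default_rng(1)
u = np.concatenate([rng.normal(size=10000), np.full(100, 8.0)])

# Scan sigma on a grid and keep the value giving the flattest histogram
sigmas = np.arange(0.5, 2.0001, 0.05)
sigma_h = np.array([histogram_sd(s, u) for s in sigmas])
best_sigma = float(sigmas[np.argmin(sigma_h)])

# Final selection with the tuned sigma
p = chi2.sf((u / best_sigma) ** 2, df=1)
selected = np.flatnonzero(bh_adjust(p) < 0.01)
```

In this toy example the plain standard deviation of `u` is inflated by the outlying components, whereas the tuned `best_sigma` stays close to the spread of the null components, which is why more features can pass the adjusted P threshold.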