"Lessons from working on the edge of human brain transcriptomics with spatially-resolved transcriptomics and deconvolution" seminar on 2023-05-23 at The Francis Crick Institute
human brain transcriptomics with spatially-resolved transcriptomics and deconvolution Leonardo Collado Torres, Investigator The Francis Crick Institute May 23 2023 Slides available at speakerdeck.com/lcolladotor
MuSiC | Wang et al, Nature Communications, 2019 | W-NNLS regression (weighted non-negative least squares) | None | Tree-guided deconvolution, good for closely related cell types
Bisque | Jew et al, Nature Communications, 2020 | NNLS regression | Gene-specific transformation of bulk data | Leverages overlapping bulk & sc data
SCDC | Dong et al, Briefings in Bioinformatics, 2020 | W-NNLS framework proposed by MuSiC | Option for gene-specific transformation of bulk data (from Bisque) | Multiple reference datasets can be used, results combined with ENSEMBLE weights
DWLS | Tsoucas et al, Nature Communications, 2019 | Dampened weighted least squares | None |
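The methods above all build on some form of non-negative least squares. As a rough illustration only, the sketch below fits plain NNLS by projected gradient descent on a made-up signature matrix. It is not the MuSiC, Bisque, SCDC, or DWLS implementation: it has no gene weighting and no bulk transformation, and the names `nnls_deconvolve`, `signature`, and `bulk` are hypothetical.

```python
def nnls_deconvolve(signature, bulk, n_iter=2000):
    """Estimate cell-type proportions p >= 0 minimizing 0.5 * ||S p - bulk||^2.

    Uses projected gradient descent as a simple stand-in for a dedicated
    NNLS solver. signature: genes x cell types (list of rows); bulk: gene values.
    """
    n_genes, n_types = len(signature), len(signature[0])
    # 1 / squared Frobenius norm is a safe step size: the squared Frobenius
    # norm bounds the largest eigenvalue of S^T S from above.
    step = 1.0 / sum(v * v for row in signature for v in row)
    p = [0.0] * n_types
    for _ in range(n_iter):
        residual = [
            sum(signature[g][k] * p[k] for k in range(n_types)) - bulk[g]
            for g in range(n_genes)
        ]
        grad = [
            sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        p = [max(0.0, p[k] - step * grad[k]) for k in range(n_types)]
    total = sum(p)
    # Rescale so the estimates read as proportions (guard against an all-zero fit).
    return [v / total for v in p] if total > 0 else p


# Hypothetical toy signature: mean expression of 4 marker genes in 3 cell types.
cell_types = ["Neuron", "Astro", "Oligo"]
signature = [
    [10.0, 1.0, 0.0],
    [0.0, 8.0, 1.0],
    [1.0, 0.0, 9.0],
    [5.0, 5.0, 5.0],
]
true_props = [0.5, 0.3, 0.2]
# Simulated noise-free bulk sample: signature @ true_props.
bulk = [sum(s * p for s, p in zip(row, true_props)) for row in signature]

estimated = nnls_deconvolve(signature, bulk)
for name, value in zip(cell_types, estimated):
    print(f"{name}: {value:.3f}")
```

On real data the signature would come from a single-cell reference, and a tested solver would be preferable to this hand-rolled loop.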
vs. Bulk Tissues Tested Consider Cell Size Reference Set
MuSiC; W-NNLS; Min.; Internal Weighting; No; Pancreatic Islet, Rat & Mouse Kidney; Yes
Bisque; NNLS; Min.; No; Yes; Adipose, DLPFC; Recommend 3+ donors
SCDC; W-NNLS; Min.; Internal Weighting; Yes; Pancreatic Islet, mouse mammary; Can input multiple references
DWLS; DWLS; Hours; Internal Selection; No; Mouse kidney, lung, liver, small intestine
different methods perform best on different data sets (Cobos et al, Nature Communications, 2020)
• Benchmarking results from different papers on “real” data
  ◦ MuSiC paper: MuSiC > NNLS > BSEQ-sc > CIBERSORT
    ▪ Pancreatic Islet: Beta cells vs. HbA1c (Fig 2a)
  ◦ Bisque paper: Bisque > MuSiC > CIBERSORT
    ▪ DLPFC: Microglia vs. Braak stage, Neuron vs. Cognitive diagnostic category (Fig 4)
  ◦ SCDC paper: SCDC > MuSiC > Bisque > DWLS > CIBERSORT
    ▪ Pancreatic Islet: Beta cells vs. HbA1c (Fig 4b)
  ◦ Cobos benchmark: DWLS > MuSiC > Bisque > deconvSeq
    ▪ Human PBMC flow sorted (Fig 7)
Louise A Huuki-Myers @lahuuki
per cell type
• Bisque is more robust to changes in the marker set than MuSiC
Method Sensitivity to Marker Set: 25 vs. 20 Genes
Louise A Huuki-Myers @lahuuki
and RNA content between cell types
• Use smFISH with RNAscope to establish a data set of:
  ◦ Cellular composition
  ◦ Nuclei sizes of major cell types
  ◦ Average nuclei RNA content of major cell types
How do we measure total RNA content of a cell if we can only observe a few genes at a time? Use a TREG
Data-driven Identification of Total RNA Expression Genes (TREGs) for Estimation of RNA Abundance in Heterogeneous Cell Types
research.libd.org/TREG/ doi.org/10.1101/2022.04.28.489923
Louise A Huuki-Myers @lahuuki #TREG
Expression is proportional to the overall RNA expression in a nucleus
• In smFISH the count of TREG puncta in a nucleus can estimate the RNA content
Data-driven Identification of Total RNA Expression Genes (TREGs) for Estimation of RNA Abundance in Heterogeneous Cell Types
research.libd.org/TREG/ doi.org/10.1101/2022.04.28.489923 #TREG
cells
• AKT3 tracks really well with the pattern of expression seen in snRNA-seq (ARID1B is also pretty good)
snRNA-seq RNAscope
Gene | Mean Prop. Cells with Expression | Prop. non-zero in DLPFC snRNA | Standardized β (95% CI)
AKT3 | 0.948 | 0.92 | -1.38 (-1.39, -1.37)
ARID1B | 0.908 | 0.94 | -0.62 (-0.62, -0.61)
MALAT1 | 0.910 | 1.00 | -0.11 (-0.12, -0.11)
POLR2A | 0.853 | 0.30 | -0.98 (-0.99, -0.98)
snRNA-seq | NA | NA | -1.33 (-1.35, -1.31)
Remember: MALAT1’s puncta data is unreliable
research.libd.org/TREG/ doi.org/10.1101/2022.04.28.489923
- A slide contains 4 capture areas, each full of thousands of 55 µm-wide “spots” (often containing 1-10 cells)
- Each spot carries a unique barcode that tags the transcripts captured there; after sequencing, gene expression can be tied back to exact spots, forming a spatial map
Kristen R. Maynard
• k=2: separates white vs. grey matter
• k=9: best recapitulates histological layers
• k=16: data-driven optimal k based on the fast H+ statistic
More Clusters = More Complexity
doi.org/10.1101/2023.02.15.528722
structure
• Correlate enrichment t-statistics for top marker genes of reference
  ◦ Cluster vs. manual annotation
• Annotate with strongly associated histological layer
doi.org/10.1101/2023.02.15.528722
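The correlate-then-annotate idea can be sketched in a few lines. This is only an illustration under stated assumptions: the t-statistics below are invented, the layer names and the `annotate_cluster` helper are hypothetical, and plain Pearson correlation with a "pick the highest" rule stands in for whatever thresholds the actual analysis used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def annotate_cluster(cluster_t, reference_t):
    """Correlate one cluster's enrichment t-statistics against each manually
    annotated layer and return the best-matching layer plus all correlations."""
    correlations = {layer: pearson(cluster_t, t) for layer, t in reference_t.items()}
    best_layer = max(correlations, key=correlations.get)
    return best_layer, correlations


# Invented enrichment t-statistics over the same 5 marker genes (toy values).
reference_t = {
    "L1": [8.0, 1.0, -2.0, -3.0, -1.0],
    "L5": [-2.0, 7.5, 1.0, -1.5, -3.0],
    "WM": [-3.0, -2.0, -1.0, 2.0, 9.0],
}
cluster_t = [-1.5, 6.0, 0.5, -1.0, -2.5]  # one data-driven cluster

best_layer, correlations = annotate_cluster(cluster_t, reference_t)
print(best_layer, {layer: round(r, 2) for layer, r in correlations.items()})
```

In practice a cluster might correlate well with more than one layer, so a cutoff on the correlation would be a reasonable refinement of the simple maximum used here.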
from the literature
Software name | Overall approach | Input Cell Counts | Output
Tangram (Biancalani et al.) | Mapping individual cells | Every spot | Integer counts
Cell2location (Kleshchevnikov et al.) | Matching gene-expression profile | Average across spots | Decimal counts
SPOTlight (Elosua-Bayes et al.) | Matching gene-expression profile | Not used | Proportions
mark for several proteins
- Fluorescence in image channels correlates with counts of measured cell types
Can measure 5 distinct cell types:
• Astrocyte (GFAP)
• Neuron (NeuN)
• Oligodendrocyte (OLIG2)
• Microglia (TMEM119)
• Other (low signal in all channels)
doi.org/10.1101/2023.01.28.525943 Sriworarat, 2023. samuibrowser.com
Sang Ho Kwon @sanghokwon17
1. Segment cells on IF image
2. Manually label N cells
3. Train cell-type classifier and apply on remaining data
doi.org/10.1101/2023.01.28.525943 Sriworarat, 2023. samuibrowser.com
Nicholas J Eagles @Nick-Eagles (GitHub)
1. Segment cells on IF image
2. Manually label example cells
3. Train cell-type classifier and apply on remaining data
Image Channels, Classified Cell Type, Cell Mask
Annie B. Nguyen
on 600-cell dataset
- Broke cells into 4 quartiles based on model confidence
- Labelled 320 more cells, evenly sampled from all 4 quartiles
Cell Type | Probability
Astro | 0.2
Oligo | 0.3
Micro | 0.1
Neuron | 0.45
Other | 0.05
4 quartiles * 4 sections * 5 cell types * 4 cells = 320 new cells total
600 old cells + 320 new cells = 920 labeled cells
Nicholas J Eagles @Nick-Eagles (GitHub)
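The sampling arithmetic above can be reproduced with a small stratified-sampling sketch. Everything here is illustrative: the section IDs, the synthetic confidence values, and the helper names are hypothetical, and quartiles are assumed to be computed across all cells at once rather than within each section or cell type.

```python
import random
from collections import defaultdict

SECTIONS = ["section_1", "section_2", "section_3", "section_4"]  # hypothetical IDs
CELL_TYPES = ["Astro", "Oligo", "Micro", "Neuron", "Other"]


def assign_quartiles(cells):
    """Label each cell with a confidence quartile (0 = least confident, 3 = most)."""
    ranked = sorted(cells, key=lambda c: c["confidence"])
    for rank, cell in enumerate(ranked):
        cell["quartile"] = rank * 4 // len(ranked)


def stratified_sample(cells, per_stratum=4, seed=0):
    """Pick up to `per_stratum` cells from every (quartile, section, cell type) stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for cell in cells:
        strata[(cell["quartile"], cell["section"], cell["cell_type"])].append(cell)
    picked = []
    for key in sorted(strata):
        group = strata[key]
        picked.extend(rng.sample(group, min(per_stratum, len(group))))
    return picked


# Synthetic pool of unlabeled cells: 40 per (section, predicted cell type),
# with evenly spread toy confidence values so every stratum is populated.
cells = [
    {"section": s, "cell_type": t, "confidence": (i + 0.5) / 40}
    for s in SECTIONS
    for t in CELL_TYPES
    for i in range(40)
]

assign_quartiles(cells)
new_cells = stratified_sample(cells)
print(len(new_cells), "new cells;", 600 + len(new_cells), "labeled cells in total")
```

Sampling evenly across confidence quartiles means the new labels include cells the model was unsure about, not just the easy ones.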
tree 0.86 0.87
Dataset | # Cells | # Training | # Test | Split
Old | 600 | 480 | 120 | 80/20
New | 320 | 240 | 80 | 75/25
Combined | 920 | 720 | 200 | ~78/22
1. Segment cells on IF image
2. Manually label N cells
3. Train cell-type classifier and apply on remaining data
Grid search with 5-fold CV for each model to select hyperparameters
Data Model Final model chosen
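A grid search with 5-fold cross-validation on a held-out split can be sketched as follows. This is a toy stand-in and not the models from the slide: a 1-D k-nearest-neighbour classifier replaces the real classifiers, the data are simulated, and the grid of `k` values is arbitrary. A library such as scikit-learn would normally handle this.

```python
import random


def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (single 1-D feature)."""
    nearest = sorted(train, key=lambda item: abs(item[0] - x))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    # Sort labels first so ties are broken deterministically.
    return max(sorted(votes), key=votes.get)


def cv_accuracy(data, k, n_folds=5):
    """Mean accuracy over n_folds folds, each held out once."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        correct = sum(knn_predict(train, x, k) == y for x, y in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / n_folds


def grid_search(data, grid, n_folds=5):
    """Score every hyperparameter value in the grid and return the best one."""
    results = {k: cv_accuracy(data, k, n_folds) for k in grid}
    best_k = max(results, key=results.get)
    return best_k, results


# Simulated two-class data (toy stand-in for per-cell fluorescence features).
rng = random.Random(0)
data = [(rng.gauss(1.0, 1.0), "Neuron") for _ in range(100)]
data += [(rng.gauss(-1.0, 1.0), "Astro") for _ in range(100)]
rng.shuffle(data)

train, test = data[:160], data[160:]  # 80/20 split, as for the "Old" dataset
best_k, cv_results = grid_search(train, grid=[1, 3, 5, 9, 15])
test_accuracy = sum(knn_predict(train, x, best_k) == y for x, y in test) / len(test)
print(best_k, round(cv_results[best_k], 3), round(test_accuracy, 3))
```

Hyperparameters are chosen using only the training portion, so the test split stays untouched until the final accuracy check.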
spatial domains
B. Cell-cell communication; cell-type-informed ligand-receptor interactions in the context of schizophrenia risk
Boyi Guo, Melissa Grant-Peters
aim to use the best methods
Moses, L., Pachter, L. Museum of spatial transcriptomics. Nat Methods 19, 534–546 (2022). https://doi.org/10.1038/s41592-022-01409-2
software can change dramatically (function and syntax) between versions
- Promotes collaboration by allowing two researchers to share exact code and instantly run software without special set-up
SpatialExperiment release 3.14
SpatialExperiment devel 3.15
module load tangram/1.0.2
module load cell2location/0.8a0
module load spagcn/1.2.0
https://github.com/LieberInstitute/jhpce_mod_source
https://github.com/LieberInstitute/jhpce_module_config
Nicholas J Eagles @Nick-Eagles (GitHub)
clarify functionality and report bugs
- Documentation for code and author responsiveness on GitHub can be critical in successfully applying software to our data
Nicholas J Eagles @Nick-Eagles (GitHub)
think adding 0, multiplying by 1
• It nearly always takes a team
• Data sharing accelerates science + democratizes access to it
• Zooming in allows us to reduce the heterogeneity
• We can learn from each other: from uniformly processing our data & re-using it → replicate / validate?
Weber
@stephaniehicks Stephanie C Hicks
@abspangler Abby Spangler
@martinowk Keri Martinowich
@CerceoPage Stephanie C Page
@kr_maynard Kristen R Maynard
@lcolladotor Leonardo Collado-Torres
@Nick-Eagles (GH) Nicholas J Eagles
Kelsey D Montgomery
Sang Ho Kwon
Image Analysis, Expression Analysis, Data Generation
Thomas M Hyde
@lahuuki Louise A Huuki-Myers
@BoyiGuo Boyi Guo
@mattntran Matthew N Tran
@sowmyapartybun Sowmya Parthiban
@mgrantpeters Melissa Grant-Peters
@prashanthi-ravichandran (GH) Prashanthi Ravichandran
+ Many more LIBD, JHU, and external collaborators
Slides available at speakerdeck.com/lcolladotor