William Rowell
September 17, 2020

Comprehensive Variant Detection with PacBio HiFi Reads

This presentation was for a UC system webinar.


Transcript

  1. Comprehensive Variant Detection with PacBio HiFi Reads

    William Rowell, Staff Scientist, Bioinformatics Applications, PacBio, @nothingclever

    For Research Use Only. Not for use in diagnostic procedures. © Copyright 2020 by Pacific Biosciences of California, Inc. All rights reserved.
  2. HIFI READS PROVIDE A COMPREHENSIVE VIEW OF VARIATION IN THE HUMAN GENOME

    [Figure: variation vs GRCh38 detected with HiFi reads and short reads, by variant class: 1 bp SNVs, 1-49 bp indels, and ≥50 bp structural variants (SVs). Figure labels: 5 Mb, 3 Mb, 10 Mb.]
  3. HIFI READS PROVIDE A COMPREHENSIVE VIEW OF VARIATION IN THE HUMAN GENOME

    [Figure repeated from slide 2: variation vs GRCh38 by variant class (SNVs, indels, SVs), HiFi reads and short reads.]

    SNVs and indels in difficult regions
  4. HIFI READS PROVIDE A COMPREHENSIVE VIEW OF VARIATION IN THE HUMAN GENOME

    [Figure: short reads and HiFi reads at STRC.]

    STRC is a congenital deafness gene that requires long reads to cover all exons.
  5. HIFI READS IMPROVE MAPPABILITY IN MANY MEDICALLY-RELEVANT GENES

    Genes grouped by % problem exons resolved:

    100% (152 genes): ABCC6, ABCD1, ACAN, ACSM2B, AKR1C2, ALG1, ANKRD11, BCR, CATSPER2, CD177, CEL, CES1, CFH, CFHR1, CFHR3, CFHR4, CGB, CHEK2, CISD2, CLCNKA, CLCNKB, CORO1A, COX10, CRYBB2, CSH1, CYP11B1, CYP11B2, CYP21A2, CYP2A6, CYP2D6, CYP2F1, CYP4A22, DDX11, DHRS4L1, DIS3L2, DND1, DPY19L2, DUOX2, ESRRA, F8, FAM120A, FAM205A, FANCD2, FCGR1A, FCGR2A, FCGR3A, FCGR3B, FLG, FLNC, FOXD4, FOXO3, FUT3, GBA, GFRA2, GON4L, GRM5, GSTM1, GYPA, GYPB, GYPE, HBA1, HBA2, HBG1, HBG2, HP, HS6ST1, IDS, IFT122, IKBKG, IL9R, KIR2DL1, KIR2DL3, KMT2C, KRT17, KRT6A, KRT6B, KRT6C, KRT81, KRT86, LEFTY2, LPA, MST1, MUC5B, MYH6, MYH7, NEB, NLGN4X, NLGN4Y, NOS2, NOTCH2, NXF5, OPN1LW, OR2T5, OR51A2, PCDH11X, PCDHB4, PGAM1, PHC1, PIK3CA, PKD1, PLA2G10, PLEKHM1, PLG, PMS2, PRB1, PRDM9, PROS1, RAB40AL, RALGAPA1, RANBP2, RHCE, RHD, RHPN2, ROCK1, SAA1, SDHA, SDHC, SFTPA1, SFTPA2, SIGLEC14, SLC6A8, SMG1, SPATA31C1, SPTLC1, SRGAP2, SSX7, STAT5B, STK19, STRC, SULT1A1, SUZ12, TBX20, TCEB3C, TLR1, TLR6, TMEM231, TNXB, TRIOBP, TRPA1, TTN, TUBA1A, TUBB2B, UGT1A5, UGT2B15, UGT2B17, UNC93B1, VCY, VWF, WDR72, ZNF419, ZNF592, ZNF674

    [75%, 100%) (11 genes): ANAPC1, C4A, C4B, CHRNA7, CR1, DUX4, FCGR2B, HYDIN, OTOA, PDPK1, TMLHE

    [50%, 75%) (7 genes): ADAMTSL2, CDY2A, DAZ1, GTF2I, NAIP, OCLN, RPS17

    [25%, 50%) (5 genes): DAZ2, DAZ3, KIR3DL1, OPN1MW, PPIP5K1

    (0%, 25%) (2 genes): NCF1, RBMY1A1

    0% (16 genes): BPY2, CCL3L1, CCL4L1, CDY1, CFC1, CFC1B, GTF2IRD2, HSFY1, MRC1, OR4F5, PRY, PRY2, SMN1, SMN2, TSPY1, XKRY
  6. GOOGLE DEEPVARIANT IS A HIGHLY ACCURATE SMALL VARIANT CALLER FOR HIFI READS

    - Variant calling pipeline powered by a deep neural network
    - Fast and inexpensive
    - Run from binaries as well as Docker or Singularity containers
    - PacBio model trained on HiFi reads from Sequel and Sequel II Systems with median read quality >99.9%
    - Model is updated regularly to support PacBio chemistry and software updates

    Poplin, R. E. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol 36, 983-987 (2018).
  7. RUN DEEPVARIANT EASILY WITH DOCKER OR SINGULARITY

    Example suitable for amplicon analysis.

        singularity exec \
          docker://google/deepvariant:1.0.0 \
          /opt/deepvariant/bin/run_deepvariant \
          --model_type PACBIO \
          --ref ./reference.fasta \
          --reads ./aligned.ccs.bam \
          --output_vcf ./output.vcf.gz \
          --num_shards $(nproc)
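The slide title mentions Docker, but only the Singularity form is shown. A minimal sketch of a Docker counterpart follows, assuming the inputs sit in one host directory that is mounted at `/data` inside the container. The function name `deepvariant_docker`, the `/data` mount point, and the `DRY_RUN` switch are illustrative choices, not part of DeepVariant; the image tag and `run_deepvariant` flags are copied from the slide.

```shell
#!/usr/bin/env bash
# Sketch: Docker counterpart of the Singularity command on the slide.
# deepvariant_docker, /data, and DRY_RUN are hypothetical helper names.

deepvariant_docker() {
  local data_dir=${1:-$PWD}              # host directory holding the inputs
  local shards
  shards=$(nproc 2>/dev/null || echo 1)  # fall back to 1 if nproc is missing

  local -a cmd
  cmd=(
    docker run
    -v "${data_dir}:/data"
    google/deepvariant:1.0.0
    /opt/deepvariant/bin/run_deepvariant
    --model_type PACBIO
    --ref /data/reference.fasta
    --reads /data/aligned.ccs.bam
    --output_vcf /data/output.vcf.gz
    --num_shards "$shards"
  )

  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "${cmd[*]}"                     # print the command only
  else
    "${cmd[@]}"                          # actually start the container
  fi
}

# Default is a dry run; set DRY_RUN=0 to execute.
deepvariant_docker "$PWD"
```

Printing the command first makes it easy to check the mounted paths before committing compute time to a real run.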
  8. PRECISION & RECALL

    Precision: percentage of calls that are correct = TP/(TP+FP)

    Recall: percentage of truth that is called = TP/(TP+FN)

        Metric          Abbreviation  Benchmark  Variant calls
        True Positive   TP            ✓          ✓
        False Positive  FP            -          ✓
        False Negative  FN            ✓          -

    [Figure: overlap of benchmark ("truth") variants and variant calls, with regions labeled FN, TP, and FP.]
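The two definitions above can be sketched as small shell helpers. The function names and the counts are hypothetical and chosen only to illustrate the arithmetic; they are not results from the talk.

```shell
#!/usr/bin/env bash
# Sketch: precision and recall as percentages from TP/FP/FN counts.

# precision TP FP -> percentage of calls that are correct
precision() {
  awk -v tp="$1" -v fp="$2" 'BEGIN { printf "%.2f\n", 100 * tp / (tp + fp) }'
}

# recall TP FN -> percentage of benchmark variants that are called
recall() {
  awk -v tp="$1" -v fneg="$2" 'BEGIN { printf "%.2f\n", 100 * tp / (tp + fneg) }'
}

# Hypothetical counts, for illustration only.
tp=9900; fp=100; fn=300
echo "precision: $(precision "$tp" "$fp")%"
echo "recall:    $(recall "$tp" "$fn")%"
```

With these made-up counts, 100 false positives out of 10,000 calls cost one point of precision, while 300 missed benchmark variants cost about three points of recall.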
  9. PRECISIONFDA TRUTH CHALLENGE V2

    https://precision.fda.gov/challenges/10/view/results

                                           HG002  HG003    HG004
        35× Illumina NovaSeq               ✓      ✓        ✓
        35× HiFi, PacBio Sequel II System  ✓      ✓        ✓
        60× ONT PromethION                 ✓      ✓        ✓
        V4 Benchmark                       ✓      Blinded  Blinded
  10. Top 12 entries and 25 of top 26 use PacBio HiFi reads

    [Figure: accuracy, F1 (%), of precisionFDA entries; axis ticks at 90, 97, 99, 99.7, and 99.9. Legend: Illumina, Multi, HiFi, ONT. Labeled entries: HiFi DeepVariant, Illumina DeepVariant, Illumina GATK, ONT DeepVariant.]
  11. HIFI READS PROVIDE A COMPREHENSIVE VIEW OF VARIATION IN THE HUMAN GENOME

    [Figure repeated from slide 2: variation vs GRCh38 by variant class (SNVs, indels, SVs), HiFi reads and short reads.]

    Long indels and SVs genome-wide
  12. HIFI READS SPAN STRUCTURAL VARIANTS

    [Figure: read alignments over a 1,733 bp deletion, with tracks for Haplotype 1, Haplotype 2, HiFi reads, short reads, and repeats. Annotations: "1,733 bp deletion" and "deletion not detected".]
  13. CALL STRUCTURAL VARIANTS FROM HIFI READS WITH PBSV

    Input: HiFi reads

    Command line: pbmm2, then pbsv discover, then pbsv call

    OR SMRT Link: Mapping, then Structural Variant Calling

    Output: variant calls (vcf)
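The command-line path on this slide (pbmm2, pbsv discover, pbsv call) can be sketched as follows. File names, the sample name, and the `run`/`sv_pipeline`/`DRY_RUN` helpers are hypothetical. The `--preset CCS` and `--ccs` options are assumed to match the pbmm2 and pbsv releases of this period (pbsv 2.2.2 is the version named in the talk); check each tool's `--help` for the release in use.

```shell
#!/usr/bin/env bash
# Sketch: align HiFi reads, find SV signatures, then call SVs.
# All file names below are placeholders; substitute your own.
REF=${REF:-reference.fasta}
READS=${READS:-hifi_reads.bam}
SAMPLE=${SAMPLE:-sample1}

# Print each command; execute it only when DRY_RUN is not 1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" != "1" ]; then "$@"; fi
}

sv_pipeline() {
  # 1. Align reads to the reference and sort the output BAM.
  run pbmm2 align "$REF" "$READS" "$SAMPLE.aligned.bam" \
      --preset CCS --sort --sample "$SAMPLE"
  # 2. Collect structural variant signatures from the alignments.
  run pbsv discover "$SAMPLE.aligned.bam" "$SAMPLE.svsig.gz"
  # 3. Call structural variants from the signatures.
  run pbsv call --ccs "$REF" "$SAMPLE.svsig.gz" "$SAMPLE.pbsv.vcf"
}

# Default is a dry run that only prints the three commands.
sv_pipeline
```

Keeping discovery and calling as separate steps mirrors the slide: signature files from several samples could be passed to one `pbsv call` for joint calling, though that use is not shown in the talk.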
  14. HIFI PBSV PERFORMANCE AGAINST BENCHMARK

    [Figure: "Structural variants with pbsv". Precision (HiFi) and Recall (HiFi) plotted against fold coverage (0 to 30); y-axis "Value" from 40% to 100%.]
  15. VARIANT DETECTION BENCHMARKING (HG002)

    Recall | Precision (%)

        HiFi Coverage  SNVs           Indels         SVs
        15-fold        99.53 | 99.89  95.16 | 96.23  97.41 | 94.48
        30-fold        99.89 | 99.95  98.90 | 98.99  98.00 | 95.29

    SNV and indel calls are from DeepVariant 1.0.0 and evaluated against the GIAB v4.2 small variant benchmark using Hap.py. SV calls are from pbsv 2.2.2 and evaluated against the GIAB v0.6 SV benchmark using Truvari.
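The precisionFDA slide reports accuracy as F1, while this table reports recall and precision separately. As a sketch, the standard F1 definition, the harmonic mean 2PR/(P+R), can be applied to the table values. The `f1` helper is hypothetical, and the resulting numbers are derived here from the table rather than quoted from the talk.

```shell
#!/usr/bin/env bash
# Sketch: F1 (harmonic mean of precision and recall) for the table rows.

# f1 RECALL PRECISION -> F1, in the same units as the inputs (percent)
f1() {
  awk -v r="$1" -v p="$2" 'BEGIN { printf "%.2f\n", 2 * p * r / (p + r) }'
}

# coverage  class  recall  precision  (values copied from the table)
while read -r coverage class rec prec; do
  echo "$coverage $class F1: $(f1 "$rec" "$prec")%"
done <<'EOF'
15-fold SNVs   99.53 99.89
15-fold Indels 95.16 96.23
15-fold SVs    97.41 94.48
30-fold SNVs   99.89 99.95
30-fold Indels 98.90 98.99
30-fold SVs    98.00 95.29
EOF
```

Because F1 is a harmonic mean, it sits closer to the lower of the two inputs, so a row with unbalanced recall and precision is penalized more than a simple average would suggest.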
  16. PATHOGENIC VARIANTS DETECTED WITH HIFI READS

    Figure 1. Proband 6 has a de novo insertion resulting in duplication of exon 3 of CDKL5

    Hiatt SM, Lawlor JMJ, et al. (2020). Long-read sequencing for the diagnosis of neurodevelopmental disorders. bioRxiv, doi:10.1101/2020.07.02.185447
  17. COMPREHENSIVE VARIANT DETECTION WITH HIFI READS

    - HiFi = mappability of long reads + base quality of short reads
    - HiFi + DeepVariant yield the most accurate small variant calls currently available with a single technology.
    - HiFi + pbsv yield highly accurate structural variant calls, including inversions, translocations, and copy number variants.
    - Recommend 15-fold coverage for most discovery applications.

    Datasets for the Ashkenazi trio (15 kb and 20 kb libraries) are deposited on SRA:

    - HG002 (PRJNA586863)
    - HG003 (PRJNA626365)
    - HG004 (PRJNA626366)