M., et al. (2019). Accurate circular consensus sequencing improves variant detection and assembly of a human genome. Nature Biotechnology. Published: 12 August 2019.
Article metrics: Altmetric score in the 98th percentile of the 254,341 tracked articles of a similar age in all journals.
[Figure: variants vs GRCh38 (Mb), PacBio HiFi reads compared with short reads, by class: 1 bp SNVs, 1-49 bp indels, ≥50 bp structural variants]
Short reads miss ~80% of SVs, typically long insertion events or variants in difficult-to-map repetitive regions. Increasing the coverage does not improve this.
Small variants missed in difficult-to-map regions of the human genome
[Figure: variants vs GRCh38 (Mb), short reads compared with PacBio HiFi reads, by class: 1 bp SNVs, 1-49 bp indels, ≥50 bp structural variants]
PacBio high-accuracy long reads improve mappability and increase variant detection in these regions.
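The size classes used in the figures above (1 bp SNVs, 1-49 bp indels, ≥50 bp structural variants) can be sketched as a small shell helper. This is only an illustration: the function name `classify_variant` is hypothetical, and it assumes the size of a variant is the length difference between VCF-style REF and ALT alleles.

```shell
#!/bin/sh
# Size classes as labelled on the slides: SNV = 1 bp, indel = 1-49 bp, SV >= 50 bp.
# Assumption: size is the REF/ALT length difference of a VCF-style record;
# symbolic alleles such as <DEL> are not handled.
classify_variant() {
  ref_len=${#1}
  alt_len=${#2}
  size=$(( alt_len - ref_len ))
  if [ "$size" -lt 0 ]; then size=$(( 0 - size )); fi
  if [ "$size" -ge 50 ]; then
    echo SV
  elif [ "$size" -ge 1 ]; then
    echo indel
  elif [ "$ref_len" -eq 1 ]; then
    echo SNV
  else
    echo other   # same-length multi-base substitution
  fi
}

classify_variant A G       # single-base substitution
classify_variant ACGT A    # 3 bp deletion
```

Real callers such as pbsv and DeepVariant apply their own definitions, so the helper is useful only for binning an existing call set by size.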
et al. (2019). Accurate circular consensus sequencing improves variant detection and assembly of a human genome. Nature Biotechnology. List originally from Mandelker, D. et al. Navigating highly homologous genes in a molecular diagnostic setting: a resource for clinical next-generation sequencing. Genet Med.
RECOMMENDED VARIANT DETECTION WORKFLOWS
with pbmm2 and call with pbsv from command line
-SNVs and small indels
  -map with pbmm2
  -Google DeepVariant
  -Optional phasing with WhatsHap
-Calls variants and assigns genotypes
-Recent updates:
  -improved sensitivity for large insertions and deletions
  -call duplications and copy number variation
  -simplified parameters with --hifi preset
    -report variants seen in a single read with at least 10% read support
    -equivalent to “-A 1 -O 1 -S 0 -P 10”
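The command-line structural variant workflow (map with pbmm2, call with pbsv) might look like the sketch below. File names and the sample name are placeholders that do not come from the slides, and `--preset CCS` is assumed to be the appropriate pbmm2 preset for HiFi reads in the installed version. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the structural variant workflow: pbmm2 align -> pbsv discover -> pbsv call.
# All file names and the sample name are placeholders, not taken from the slides.
REF=reference.fasta
READS=hifi_reads.bam
SAMPLE=HG002

# Print each command by default; set DRY_RUN=0 to execute
# (requires pbmm2 and pbsv on PATH).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

sv_workflow() {
  # 1. Align HiFi reads; --sort writes a coordinate-sorted BAM.
  run pbmm2 align "$REF" "$READS" "$SAMPLE.aligned.bam" --preset CCS --sort --sample "$SAMPLE"
  # 2. Collect structural variant signatures from the alignments.
  run pbsv discover "$SAMPLE.aligned.bam" "$SAMPLE.svsig.gz"
  # 3. Call and genotype; per the slide, --hifi is equivalent to -A 1 -O 1 -S 0 -P 10.
  run pbsv call --hifi "$REF" "$SAMPLE.svsig.gz" "$SAMPLE.sv.vcf"
}

sv_workflow
```

Running with `DRY_RUN=0` executes the three steps in order. Check the flags against the help output of the installed pbmm2 and pbsv versions first.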
GOOGLE DEEPVARIANT
inexpensive
-Run from binaries as well as Docker or Singularity images
-PacBio model trained on HiFi reads from Sequel and Sequel II Systems with median read quality >99.9%
-Model is updated regularly to support PacBio chemistry and software updates
Poplin, R. E. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol 36, 983 (2018).
RUN DEEPVARIANT EASILY WITH DOCKER OR SINGULARITY
PACBIO \
  --ref ./reference.fasta \
  --reads ./aligned.ccs.bam \
  --output_vcf ./output.vcf.gz \
  --num_shards $(nproc)
Example suitable for amplicon analysis.
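For reference, one plausible complete form of a Docker invocation is sketched below. It uses the public `google/deepvariant` image and its `run_deepvariant` entry point. The mounted directory, the container paths, and the choice of version 0.10.0 (the release cited for the benchmark results) are assumptions, not the command shown on the slide.

```shell
#!/bin/sh
# Sketch of a full DeepVariant run via Docker. The mounted directory and file
# names are placeholders; 0.10.0 is the DeepVariant release cited in the results.
BIN_VERSION=0.10.0
DATA_DIR=$PWD   # hypothetical directory holding the reference and aligned reads

deepvariant_cmd() {
  echo docker run -v "$DATA_DIR:/data" "google/deepvariant:$BIN_VERSION" \
    /opt/deepvariant/bin/run_deepvariant \
    --model_type=PACBIO \
    --ref=/data/reference.fasta \
    --reads=/data/aligned.ccs.bam \
    --output_vcf=/data/output.vcf.gz \
    --num_shards="$(nproc)"
}

deepvariant_cmd   # prints the command; remove the leading echo to execute it
```

The reference needs a `.fai` index and the BAM a `.bai` index in the mounted directory for the run to start.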
authoritative characterization of benchmark human genomes
https://www.nist.gov/programs-projects/genome-bottle
HG002 HG003 HG004
doi:10.1101/664623
Benchmark (or "High-confidence") variant calls and regions
• Structural variants: currently available for HG002 on GRCh37
• Small variants in more difficult regions: currently available for HG002 on GRCh37 and GRCh38
Coverage   SNVs             Indels           SVs
15-fold    99.44 | 99.69    95.41 | 96.57    97.41 | 94.48
30-fold    99.97 | 99.87    98.78 | 98.90    98.00 | 95.29

SNV and indel calls are from DeepVariant 0.10.0 and evaluated against the GIAB v3.3.2 small variant benchmark using hap.py.
SV calls are from pbsv 2.2.2 and evaluated against the GIAB v0.6 SV benchmark using Truvari.
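To compare rows with a single number, each pair of values can be combined into an F1 score, F1 = 2PR / (P + R). The sketch below assumes the two values in each cell are a precision and a recall, which the table itself does not state. F1 is symmetric, so the order of the two values does not affect the result. The helper name `f1` is hypothetical.

```shell
#!/bin/sh
# F1 = 2PR / (P + R), with P and R given in percent.
# Assumption: the two values in each table cell are a precision and a recall;
# F1 is symmetric, so their order does not matter.
f1() {
  awk -v p="$1" -v r="$2" 'BEGIN { printf "%.2f\n", 2 * p * r / (p + r) }'
}

f1 99.44 99.69   # 15-fold SNVs
f1 97.41 94.48   # 15-fold SVs
```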
for 7 GIAB samples are being used to improve SV and small variant benchmarks.
-Upcoming small variant benchmark release v4.1 for HG002 will add:
  -~6% reference bases
  -~300,000 SNVs
  -~50,000 indels
-Benchmark updates for other samples will follow.
-HiFi datasets are included in the precisionFDA Truth V2 Challenge, which focuses on difficult-to-map regions.
HG002 HG003 HG004
long reads + base quality of short reads
-Structural variants: SMRT Link or pbmm2 + pbsv
  -Added support for duplications and copy number variations
-Small variants: DeepVariant
  -Added support for amplified fragments
-Recommend 15-fold coverage for most discovery applications.
Datasets for the Ashkenazi trio (15 kb and 20 kb libraries) are deposited on SRA:
HG002 (PRJNA586863)
HG003 (PRJNA626365)
HG004 (PRJNA626366)
ACKNOWLEDGMENTS
Kolesnikov
Maria Nattestad
Aaron Wenger
Justin Zook, Justin Wagner, and the Genome in a Bottle Consortium

Structural variant detection
Armin Töpfer
Aaron Wenger
Justin Zook, Nate Olson, and the Genome in a Bottle Consortium