Figure: STRC, short reads vs. PacBio HiFi. STRC is a congenital deafness gene that requires long reads to cover all exons.
A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 43, 491–498 (2011).
Wenger, A. M. et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol 37, 1155–1162 (2019).

Workflow: HiFi reads → pbmm2 (SMRT Link Mapping) → GATK4 HaplotypeCaller → GATK4 VariantFiltration → variant calls (VCF)

Figure: Indel, Mismatch; PacBio HiFi 96.6%, Short reads 99.1%.

-High SNP recall and precision
-Lower indel recall and precision, due to 1 bp indel errors
-HaplotypeCaller is optimized for the error mode of short reads
-We recommend using a caller that can adapt to the error mode of long reads, such as DeepVariant (see Pi-Chuan Chang's lightning talk)
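The recall and precision observations above are reported separately for SNPs and indels. A minimal sketch of that stratified tally follows, assuming variants are hypothetical (chrom, pos, ref, alt) tuples matched by exact allele equality. Real benchmarking tools such as hap.py also reconcile different representations of the same variant, which this sketch does not attempt. The names `variant_type`, `Metrics`, and `benchmark` are illustrative, not part of GATK or any PacBio tool.

```python
from dataclasses import dataclass

# Hypothetical representation: (chrom, pos, ref, alt) with VCF-style alleles.
Variant = tuple[str, int, str, str]


def variant_type(v: Variant) -> str:
    """Classify a biallelic variant by comparing REF and ALT lengths."""
    _, _, ref, alt = v
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "INDEL"
    return "OTHER"  # equal-length multi-base substitutions (MNPs)


@dataclass
class Metrics:
    tp: int  # calls that match the truth set
    fp: int  # calls absent from the truth set
    fn: int  # truth variants that were not called

    @property
    def recall(self) -> float:
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    @property
    def precision(self) -> float:
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0


def benchmark(calls: set[Variant], truth: set[Variant]) -> dict[str, Metrics]:
    """Exact-match comparison of a callset to a truth set, stratified by type."""
    results: dict[str, Metrics] = {}
    for vtype in ("SNP", "INDEL"):
        c = {v for v in calls if variant_type(v) == vtype}
        t = {v for v in truth if variant_type(v) == vtype}
        results[vtype] = Metrics(tp=len(c & t), fp=len(c - t), fn=len(t - c))
    return results


if __name__ == "__main__":
    # Toy data for illustration only; not results from this work.
    truth = {("chr1", 100, "A", "G"), ("chr1", 300, "AT", "A"), ("chr1", 400, "G", "GA")}
    calls = {("chr1", 100, "A", "G"), ("chr1", 300, "AT", "A"), ("chr1", 500, "T", "TT")}
    for vtype, m in benchmark(calls, truth).items():
        print(vtype, f"recall={m.recall:.2f}", f"precision={m.precision:.2f}")
```

With the toy data, SNPs score 1.00 recall and precision, while one missed and one spurious 1 bp indel pull indel recall and precision to 0.50 each, mirroring the kind of SNP/indel gap described above.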
on HiFi reads:
-75% of putative false negatives (FN) and 95% of putative false positives (FP) are clearly errors in the GATK callset
-Suggestions for improving the benchmark:
  -Exclude regions with SNV disagreements between long- and linked-read datasets, or with odd SNV frequencies (2:1, 3:1) in long- or linked-read datasets
  -Require support from long reads for indels in repetitive regions with low short-read coverage
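The two benchmark suggestions above can be read as filters on candidate benchmark sites. A minimal sketch follows, assuming a hypothetical `Site` record that carries per-dataset genotypes and (ref, alt) read depths. The suggestions are phrased in terms of regions, whereas this sketch works per site, and the thresholds (`rel_tol=0.25`, `min_short_depth=10`, `min_long_support=2`) are illustrative assumptions, not values from the source.

```python
from dataclasses import dataclass, field


def is_odd_ratio(ref_depth: int, alt_depth: int,
                 targets: tuple[float, ...] = (2.0, 3.0),
                 rel_tol: float = 0.25) -> bool:
    """True if the major:minor allele depth ratio is near 2:1 or 3:1.

    Homozygous sites (one allele with zero depth) are not flagged.
    rel_tol is a hypothetical relative tolerance around each target ratio.
    """
    lo, hi = sorted((ref_depth, alt_depth))
    if lo == 0:
        return False
    ratio = hi / lo
    return any(abs(ratio - t) <= rel_tol * t for t in targets)


@dataclass
class Site:
    """Hypothetical per-site summary across sequencing datasets."""
    is_indel: bool
    genotypes: dict[str, str] = field(default_factory=dict)  # e.g. {"long": "0/1", "linked": "0/1"}
    allele_depths: dict[str, tuple[int, int]] = field(default_factory=dict)  # dataset -> (ref, alt)
    in_repeat: bool = False
    short_read_depth: int = 0
    long_read_alt_support: int = 0  # long reads supporting the ALT allele


def keep_in_benchmark(site: Site, min_short_depth: int = 10,
                      min_long_support: int = 2) -> bool:
    if not site.is_indel:
        # Suggestion 1: long- and linked-read datasets disagree on the SNV.
        if len(set(site.genotypes.values())) > 1:
            return False
        # Suggestion 1: odd (2:1, 3:1) allele ratio in any dataset.
        if any(is_odd_ratio(r, a) for r, a in site.allele_depths.values()):
            return False
        return True
    # Suggestion 2: indels in repeats with low short-read coverage
    # must be supported by long reads.
    low_short = site.in_repeat and site.short_read_depth < min_short_depth
    if low_short and site.long_read_alt_support < min_long_support:
        return False
    return True
```

Keeping the ratio test separate from the site filter makes it easy to swap in a different notion of "odd" frequency (for example, a binomial test against a 1:1 heterozygous expectation) without touching the rest of the logic.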