(Haploid) assembly → phasing? De novo: from scratch, without looking at the original picture (reference). Sequencing → sequenced reads → assembling → pseudo-haplotype + alts
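To make "pseudo-haplotype + alts" concrete, here is a toy sketch. It assumes a diploid region that collapses into a chain of simple two-branch bubbles separated by homozygous sequence. At each bubble it keeps the longer branch on the primary path and reports the other branch as an alternate. The `Bubble` type, the `pseudo_haplotype` function and the tie-break rule are hypothetical simplifications, not any assembler's actual logic. The point is that nothing ties consecutive choices to the same parent, so the primary sequence can switch between haplotypes.

```python
from dataclasses import dataclass


@dataclass
class Bubble:
    """One heterozygous site: the two parental branches of a graph bubble (toy model)."""
    maternal: str
    paternal: str


def pseudo_haplotype(flanks, bubbles):
    """Walk a linear chain flank0, bubble0, flank1, bubble1, ..., flankN.

    At each bubble keep the longer branch on the primary path (ties go to the
    maternal branch, an arbitrary choice) and report the other branch as an
    alternate. Returns (primary, alternates, parent_of_each_primary_branch).
    """
    if len(flanks) != len(bubbles) + 1:
        raise ValueError("need exactly one more flank than bubbles")
    primary = [flanks[0]]
    alternates, origins = [], []
    for bubble, flank in zip(bubbles, flanks[1:]):
        if len(bubble.paternal) > len(bubble.maternal):
            chosen, other, origin = bubble.paternal, bubble.maternal, "P"
        else:
            chosen, other, origin = bubble.maternal, bubble.paternal, "M"
        primary += [chosen, flank]
        alternates.append(other)
        origins.append(origin)
    return "".join(primary), alternates, origins


if __name__ == "__main__":
    flanks = ["AAAA", "CCCC", "GGGG", "TTTT"]
    bubbles = [Bubble("AC", "ACG"), Bubble("TTA", "T"), Bubble("G", "GGT")]
    primary, alts, origins = pseudo_haplotype(flanks, bubbles)
    print(primary)   # AAAAACGCCCCTTAGGGGGGTTTTT
    print(alts)      # ['AC', 'T', 'G']
    print(origins)   # ['P', 'M', 'P']: the primary path switches parents
```

Picking by length is only a stand-in for "longest path in the graph"; any per-bubble rule that ignores parental origin produces the same kind of haplotype switching.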
metabolizing >50% of available drugs
• Genetic variation and copy number affect drug efficacy
• CYP2D6*10: intermediate to poor metabolizer
• CYP2D6*2: extensive metabolizer
Seo, Rhie, Kim, and Lee et al., De novo assembly and phasing of a Korean human genome, Nature (2016)
[Figure: Chr. 22]
(Haploid) assembly → phasing? De novo: from scratch, without looking at the original picture (reference). Sequencing → sequenced reads → assembling → complete haplotypes
De novo: from scratch, without looking at the original picture (reference). Sequencing → phased reads → assembling → maternal assembly
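The "phased reads" step is what trio binning does before assembly: k-mers found in only one parent's short reads are used to sort the child's long reads by haplotype, and each bin is then assembled on its own. Below is a minimal sketch under simplifying assumptions: exact k-mer sets, forward-strand k-mers only (real data needs canonical k-mers covering both strands), a hypothetical `min_count` cutoff standing in for error filtering, and a score normalized by the size of each parent's specific k-mer set. The function names and the scoring rule are illustrative assumptions, not Canu's implementation.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (forward strand only) across a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def haplotype_specific_kmers(maternal_reads, paternal_reads, k, min_count=1):
    """Return (maternal-only, paternal-only) k-mer sets.

    k-mers seen fewer than `min_count` times in a parent are treated as
    sequencing errors and dropped before the set difference (hypothetical cutoff).
    """
    mat = {km for km, c in count_kmers(maternal_reads, k).items() if c >= min_count}
    pat = {km for km, c in count_kmers(paternal_reads, k).items() if c >= min_count}
    return mat - pat, pat - mat


def bin_read(read, mat_only, pat_only, k):
    """Assign a child read to 'maternal', 'paternal', or 'unassigned'."""
    read_kmers = {read[i:i + k] for i in range(len(read) - k + 1)}
    # Normalize hits by the size of each specific set so the parent with more
    # specific k-mers overall does not win by default (assumed scoring rule).
    mat_score = len(read_kmers & mat_only) / max(len(mat_only), 1)
    pat_score = len(read_kmers & pat_only) / max(len(pat_only), 1)
    if mat_score > pat_score:
        return "maternal"
    if pat_score > mat_score:
        return "paternal"
    return "unassigned"


if __name__ == "__main__":
    mat_only, pat_only = haplotype_specific_kmers(["ACGTAC"], ["ACGGAC"], k=3)
    for read in ["TACGTA", "CGGACG", "AAAAAA"]:
        print(read, bin_read(read, mat_only, pat_only, k=3))
```

With these toy parents, `TACGTA` lands in the maternal bin, `CGGACG` in the paternal bin, and `AAAAAA` carries no parent-specific k-mers, so it stays unassigned. Each bin would then be assembled separately, giving one assembly per parental haplotype.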
diverse, high-quality haplotypes with trio binning
• Illumina WGS for the parents, PacBio and Nanopore for the child
• Pilot: 10 trios selected to maximize non-reference haplotype AF
• Populations: 5 African (3 ACB, 1 MSL, 1 GWD), 3 American (2 PUR, 1 CLM), 1 East Asian (1 KHV), 1 South Asian (1 PJL)
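The population codes on this slide are 1000 Genomes Project codes. As a quick consistency check, the sketch below maps each code to the super-population label used on the slide and confirms that the per-population counts add up to 5 African, 3 American, 1 East Asian and 1 South Asian trios (10 in total). The mapping table follows the standard 1000 Genomes groupings and is an assumption layered on top of the slide, not part of it.

```python
from collections import Counter

# 1000 Genomes population code -> super-population label as used on the slide
SUPER_POPULATION = {
    "ACB": "African",      # African Caribbeans in Barbados
    "GWD": "African",      # Gambian in Western Divisions in the Gambia
    "MSL": "African",      # Mende in Sierra Leone
    "CLM": "American",     # Colombians from Medellin, Colombia
    "PUR": "American",     # Puerto Ricans from Puerto Rico
    "KHV": "East Asian",   # Kinh in Ho Chi Minh City, Vietnam
    "PJL": "South Asian",  # Punjabi from Lahore, Pakistan
}

# Number of pilot trios per population, as listed on the slide
PILOT_TRIOS = {"PUR": 2, "KHV": 1, "ACB": 3, "MSL": 1, "PJL": 1, "GWD": 1, "CLM": 1}


def trios_by_super_population(trios):
    """Sum trio counts per super-population."""
    totals = Counter()
    for population, n in trios.items():
        totals[SUPER_POPULATION[population]] += n
    return dict(totals)


if __name__ == "__main__":
    totals = trios_by_super_population(PILOT_TRIOS)
    print(totals)
    print(sum(totals.values()))  # 10
```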
Summary
current best practice
• All levels of assembly quality improved
• Complete haplotypes will become the new norm
• A human pan-genome reference: a collection of diverse, high-quality haplotypes, including complex heterozygous SVs
phased diploid genomes, BioRxiv (2018)
[Figure panels: FALCON-Phase, Trio-binning]
FALCON-Phase as an alternative?
• Investigating ways to improve results for less heterozygous genomes: HG002 (0.17), Angus × Brahman (0.93), bTaeGut2 (1.2)
Department of Inland Fisheries and Wildlife, left, and UMass lynx team coordinator, Tanya Lama, with an adult male lynx from northern Maine whose DNA was used to create the first-ever whole genome for the species. The lynx has since been released to the wild. (MassWildlife photo / Bill Byrne)
Walenz • Alexander Dilthey • Brian Ondov • Jay Ghurye
Korean (AK1): Jeong-Sun Seo, Changhoon Kim, Junsoo Kim, Sangjin Lee
Cattle/pigs: Tim Smith, John Williams
Pan-Genome: Karen Miga, Benedict Paten
NIH NHGRI NISC
VGP Assembly Working Group: Erich Jarvis, Richard Durbin, Gene Myers, Kerstin Howe, Harris Lewin, Olivier Fedrigo, Shane McCarthy, Martin Pippel, Will Chow, Joana Damas
PacBio CCS: Michael Hunkapiller, Paul Peluso, David Rank
Trio binning is available in Canu: https://github.com/marbl/canu
assembly of haplotype-resolved genomes with trio binning, Nat. Biotech (2018)
• Primary = longest path in the graph (pseudo-haplotype)
• Alternate haplotigs = alternate path in each bubble
• Haplotigs = contigs in each assembly that agree with the parental haplotypes (phased)
[Figure: Angus-specific vs. Brahman-specific k-mer counts per contig, TrioCanu vs. FALCON-unzip]
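The k-mer count panels compare, for each contig, how many Angus-specific and how many Brahman-specific k-mers it contains. A correctly phased haplotig should carry k-mers from only one parent, while a pseudo-haplotype contig can mix both. A minimal sketch of that per-contig tally follows, assuming the parent-specific k-mer sets are already known. The toy k-mer sets, the contig sequences and the "minority fraction" summary are hypothetical illustrations, not a metric taken from the slide.

```python
def kmer_set(seq, k):
    """All distinct k-mers of a sequence (forward strand only, for simplicity)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hapmer_profile(contig, parent_a_only, parent_b_only, k):
    """Count parent-specific k-mers in a contig.

    Returns (hits_a, hits_b, minority_fraction), where minority_fraction is the
    share of parent-specific hits coming from the less-represented parent
    (0.0 for a cleanly phased contig, up to 0.5 for an even mix).
    """
    kmers = kmer_set(contig, k)
    hits_a = len(kmers & parent_a_only)
    hits_b = len(kmers & parent_b_only)
    total = hits_a + hits_b
    minority = min(hits_a, hits_b) / total if total else 0.0
    return hits_a, hits_b, minority


# Hypothetical toy parent-specific 3-mer sets standing in for "Angus-only" and "Brahman-only"
ANGUS_ONLY = {"CGT", "GTA", "TAC"}
BRAHMAN_ONLY = {"CGG", "GGA", "GAC"}

if __name__ == "__main__":
    contigs = {
        "phased_haplotig": "TACGTAC",      # only Angus-specific k-mers
        "mixed_pseudohap": "TACGTACGGAC",  # switches between parents
    }
    for name, seq in contigs.items():
        print(name, hapmer_profile(seq, ANGUS_ONLY, BRAHMAN_ONLY, k=3))
```

Plotting `hits_a` against `hits_b` for every contig gives the kind of picture in the panels: phased haplotigs hug one axis, while mixed contigs fall between the axes.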