ISME J. 2017 Dec;11(12):2864-2868.
doi: 10.1038/ismej.2017.126. Epub 2017 Jul 25.

dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication


Matthew R Olm et al. ISME J. 2017 Dec.

Abstract

The number of microbial genomes sequenced each year is expanding rapidly, in part due to genome-resolved metagenomic studies that routinely recover hundreds of draft-quality genomes. Rapid algorithms have been developed to comprehensively compare large genome sets, but they are not accurate with draft-quality genomes. Here we present dRep, a program that reduces the computational time for pairwise genome comparisons by sequentially applying a fast, inaccurate estimation of genome distance and a slow, accurate measure of average nucleotide identity. dRep achieves a 28× increase in speed with perfect recall and precision when benchmarked against previously developed algorithms. We demonstrate the use of dRep for genome recovery from time-series datasets. Each metagenome was assembled separately, and dRep was used to identify groups of essentially identical genomes and select the best genome from each replicate set. This resulted in recovery of significantly more and higher-quality genomes compared to the set recovered using co-assembly.
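The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. This is not dRep's implementation: the function names (`cluster`, `dereplicate`), the single-linkage grouping, the cutoff values, and the toy distance, ANI, and quality tables are all assumptions made for illustration. The sketch only shows why running the slow comparison inside groups found by the fast one reduces the number of expensive pairwise comparisons.

```python
from itertools import combinations


def cluster(names, linked):
    """Group names so that any pair for which linked(a, b) is true shares a group.

    Uses union-find (single linkage); this is a simplification chosen for the
    sketch, not necessarily the clustering method used by dRep.
    """
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if linked(a, b):
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())


def dereplicate(names, fast_dist, slow_ani, score,
                primary_cutoff=0.10, ani_cutoff=0.99):
    """Return one representative genome per set of near-identical genomes.

    fast_dist, slow_ani, and score are caller-supplied callables; the cutoff
    defaults are illustrative placeholders, not values taken from the paper.
    """
    winners = []
    # Stage 1: cheap, approximate distance on every pair.
    for primary in cluster(names, lambda a, b: fast_dist(a, b) <= primary_cutoff):
        # Stage 2: expensive, accurate ANI only within each primary group.
        secondary = cluster(primary, lambda a, b: slow_ani(a, b) >= ani_cutoff)
        # Keep the best-scoring genome from each replicate set.
        winners.extend(max(group, key=score) for group in secondary)
    return winners


# Toy data (hypothetical): A and B are near-identical, C is unrelated.
fast_table = {frozenset(("A", "B")): 0.02,
              frozenset(("A", "C")): 0.30,
              frozenset(("B", "C")): 0.31}
ani_table = {frozenset(("A", "B")): 0.995}
quality = {"A": 80.0, "B": 95.0, "C": 90.0}
slow_calls = []


def toy_slow_ani(a, b):
    slow_calls.append((a, b))  # record how often the slow step runs
    return ani_table[frozenset((a, b))]


winners = dereplicate(["A", "B", "C"],
                      lambda a, b: fast_table[frozenset((a, b))],
                      toy_slow_ani,
                      quality.get)
print(winners, len(slow_calls))
```

In this toy run the slow comparison is invoked for one pair rather than all three, because the unrelated genome C never shares a primary group with A or B.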

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Assembly and de-replication with dRep results in more and higher-quality genome bins as compared to co-assembly. (a) A complete Escherichia coli genome was subset 10 times in increments of 10% (10%, 20%, 30%, etc.). Subsets were compared to each other in a pairwise manner (100 total comparisons) using three algorithms: ANIm, MASH and gANI. For each pair of subsets, the alignment coverage between the two genomes as determined by MUMmer is shown on the x axis (aligned length / average genome length), and the ANI reported from each algorithm is shown on the y axis. ANIm and gANI are accurate when genomes are incomplete, but MASH is only accurate when genomes are essentially complete. (b) Using previously reported algorithm runtimes, we estimated the time required to de-replicate genome sets of various sizes. gANI exhibits a sharp exponential climb, limiting its use on larger genome sets; MASH and dRep do not. (c) De-replication of bins from individual assemblies and co-assembly (dRep assembly method) resulted in more bins (⩾75% complete, ⩽5% contaminated) than co-assembly alone. (d and e) Examples of genome relatedness figures generated by dRep. The red dotted line is the value of the lowest ANI resulting from a self-vs-self alignment of each genome in the cluster.
Figure 2
Strain heterogeneity reduces genome assembly quality and causes fragmentation in areas of extensive population-level variation. (a) Compared to individual assembly, co-assembly resulted in many duplicate and triplicate single copy genes. (b-d) The Staphylococcus aureus bin obtained from co-assembly is more fragmented than that from an individual assembly. (b) Scaffolds from both bins are aligned to a complete reference genome (2.7 Mbp). (c) Scaffolds from the co-assembly are aligned to a single scaffold (shown in gray in b) from the individual assembly. (d) Reads from all samples aligned to a gap in the alignment in c. Reads mapped to the area where co-assembly failed to recover a genome sequence (highlighted in blue) show signs of population-level strain variation. Gray boxes represent reads, and colored lines represent discrepancies between reads and reference sequence.
