
Review
Philos Trans R Soc Lond B Biol Sci. 2006 Nov 29;361(1475):1929-40.
doi: 10.1098/rstb.2006.1920.

The bacterial species definition in the genomic era

Konstantinos T Konstantinidis et al. Philos Trans R Soc Lond B Biol Sci.

Abstract

The bacterial species definition, despite its eminent practical significance for identification, diagnosis, quarantine and diversity surveys, remains a very difficult issue to advance. Genomics now offers novel insights into intra-species diversity and the potential for emergence of a more soundly based system. Although we share the excitement, we argue that it is premature for a universal change to the definition because current knowledge is based on too few phylogenetic groups and too few samples of natural populations. Our analysis of five important bacterial groups suggests, however, that more stringent standards for species may be justifiable when a solid understanding of gene content and ecological distinctiveness becomes available. Our analysis also reveals what is actually encompassed in a species according to the current standards, in terms of whole-genome sequence and gene-content diversity, and shows that this does not correspond to coherent clusters for the environmental Burkholderia and Shewanella genera examined. In contrast, the obligate pathogens, which have a very restricted ecological niche, do exhibit clusters. Therefore, the idea of biologically meaningful clusters of diversity that applies to most eukaryotes may not be universally applicable in the microbial world, or if such clusters exist, they may be found at different levels of distinction.


Figures

Figure 1
Genetic diversity within five important bacterial groups. Every data point represents a whole-genome comparison between two genomes and shows the total DNA sequence that differs between the two genomes (y-axes) plotted against the evolutionary distance between the genomes, measured as ANI (x-axes). (a) shows data for all pairs of genomes considered, whereas (b) shows only pairs that share at least 94% ANI. (c) shows the differing DNA sequence as a percentage of the whole genome of one of the two genomes in the pair. Pairs of genomes from the same group are denoted by the same colour. See the graph key for group annotation by colour and for the average genome size within each group. ANI values were calculated as described previously (Konstantinidis & Tiedje 2005). The grey bars mark approximately 95% ANI, which corresponds to the 70% DDH standard for species. Note the formation of ANI-based genetic clusters within the Streptococcaceae, Staphylococcaceae and Enterobacteriaceae groups (denoted by dashed circles), and the lack of clusters (i.e. a genetic continuum) within the Burkholderiaceae and Shewanellaceae groups over the same range of ANI values. The clusters within the Streptococcaceae and Staphylococcaceae groups encompass genomes that show small gene-content differences (e.g. the circle of S. aureus versus S. aureus comparisons) and are diagnosable by specific gene-content signatures; e.g. the S. aureus genomes share a total of approximately 1 Mb of DNA sequence that is not conserved in the genomes of their closest sequenced relatives, S. agalactiae or S. epidermidis. In contrast, gene-content signatures are less clear in the Enterobacteriaceae group; e.g. the DNA shared by all E. coli genomes and not found in Salmonella spp. genomes is less than 200 kb. The clustering seen for the Shewanellaceae group in the range of approximately 70–75% ANI is attributable to a biased selection of sequenced genomes, chosen to represent the most distantly related lineages of the group (i.e. seven different species are represented by these data points). Sampling within several of these lineages reveals a continuum of genetic diversity for the Shewanellaceae group (represented by data points between 80 and 97% ANI).
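To make the x-axis quantity and the species boundary in Figure 1 concrete, the sketch below computes ANI as the mean identity of conserved fragment matches and applies the approximately 95% ANI (about 70% DDH) cut-off. This is an illustration under stated assumptions, not the published procedure of Konstantinidis & Tiedje (2005): the input format (a list of identity/aligned-length tuples for reciprocally best-matched 1 kb fragments) and the function names are hypothetical, and the conservation cut-offs are placeholders borrowed from the Figure 4 caption.

```python
from statistics import mean

# Placeholder cut-offs borrowed from the Figure 4 caption (>=50% nucleotide
# identity over >=700 of 1000 bases); the published ANI method may differ.
MIN_IDENTITY = 50.0
MIN_ALIGNED_BASES = 700
SPECIES_ANI = 95.0  # approximate ANI equivalent of the 70% DDH standard


def average_nucleotide_identity(matches):
    """Mean percent identity over conserved fragment matches.

    `matches` holds (percent_identity, aligned_bases) tuples, one per
    reciprocally best-matched 1 kb fragment (a hypothetical input format).
    """
    identities = [
        identity
        for identity, aligned in matches
        if identity >= MIN_IDENTITY and aligned >= MIN_ALIGNED_BASES
    ]
    if not identities:
        raise ValueError("no conserved fragments; ANI is undefined")
    return mean(identities)


def within_species_threshold(ani, threshold=SPECIES_ANI):
    """True when a genome pair meets the ~95% ANI (~70% DDH) species boundary."""
    return ani >= threshold


if __name__ == "__main__":
    # Toy matches: the last two fail the identity and alignment-length cut-offs.
    toy = [(98.2, 1000), (96.5, 950), (94.1, 880), (45.0, 1000), (99.0, 300)]
    ani = average_nucleotide_identity(toy)
    print(f"ANI = {ani:.2f}%, within threshold: {within_species_threshold(ani)}")
    # ANI = 96.27%, within threshold: True
```

Only the fragments that pass both cut-offs contribute, so weakly similar or partially aligned regions do not drag the average down; this is why ANI measures identity of the shared fraction of the genomes, while the y-axes of Figure 1 capture the non-shared fraction separately.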
Figure 2
Gene-content differences among Enterobacteriaceae genomes. Bars represent the average gene-content differences between pairs of Enterobacteriaceae genomes that show a given range of ANI values (x-axis). Black bars represent the total number of genes, on average, that differ (primary y-axis), while grey bars represent what fraction of the former genes is hypothetical, phage, transposase or plasmid genes (secondary y-axis). Error bars represent deviation from the mean, and not standard deviation. Only pairs of genomes (four pairs in total) with considerable gene-content differences were included in the analysis for the higher than 99% ANI category (i.e. pairs of genomes with higher than 99% ANI but very small gene-content differences were not included), while a random selection of six pairs of genomes was included for the remaining three categories of ANI values.
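The binning behind Figure 2 can be sketched as grouping genome pairs by ANI range, then averaging the number of differing genes and the share of those genes that are hypothetical, phage, transposase or plasmid genes. Everything here is an assumption for illustration: the `GenomePair` record and function names are hypothetical, only the ">99" category is named in the caption (the other bin edges are placeholders), the mobile-like fraction is pooled within each bin rather than averaged per pair, and error bars are not computed.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class GenomePair:
    ani: float              # average nucleotide identity between the two genomes, %
    differing_genes: int    # genes found in only one genome of the pair
    mobile_like_genes: int  # of those: hypothetical, phage, transposase or plasmid genes


# Hypothetical ANI categories as (exclusive lower bound, label), checked in order.
CATEGORIES = [(99.0, ">99"), (97.0, "97-99"), (95.0, "95-97"), (float("-inf"), "<95")]


def ani_category(ani):
    """Return the label of the first category whose lower bound ANI exceeds."""
    for lower, label in CATEGORIES:
        if ani > lower:
            return label
    raise ValueError(f"invalid ANI value: {ani}")


def summarize_by_ani(pairs):
    """Map each ANI category to (mean differing genes, pooled mobile-like fraction)."""
    groups = defaultdict(list)
    for pair in pairs:
        groups[ani_category(pair.ani)].append(pair)
    summary = {}
    for label, members in groups.items():
        total = sum(p.differing_genes for p in members)
        mobile = sum(p.mobile_like_genes for p in members)
        summary[label] = (
            mean(p.differing_genes for p in members),
            mobile / total if total else 0.0,
        )
    return summary


if __name__ == "__main__":
    toy_pairs = [
        GenomePair(99.5, 300, 240),
        GenomePair(99.3, 500, 350),
        GenomePair(96.0, 900, 450),
        GenomePair(95.5, 1100, 440),
    ]
    for label, (avg, frac) in summarize_by_ani(toy_pairs).items():
        print(label, avg, round(frac, 3))
```

Pooling counts within a bin weights pairs with many differing genes more heavily; averaging per-pair fractions would be the alternative if each pair should count equally.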
Figure 3
Ecological and genetic diversity within bacterial species. Bacterial genomes of the same species according to the 70% DDH criterion, which corresponds to 95% ANI, may show a very broad range of gene-content (and probably gene expression) differences between the two thick, curved, dashed lines. Genomes that, presumably, share a more overlapping ecological niche tend to show smaller gene-content differences than do genomes that are ecologically more differentiated. The former genomes also tend to be smaller in genome size. Accordingly, genomes that show at least 98–99% ANI, or are less identical at the nucleotide level but share a closely overlapping niche, tend to show small gene-content differences.
Figure 4
(a) Genetic diversity within the 20 sequenced E. coli/Shigella spp. genomes. Bars show the total DNA sequence (y-axis) that remains conserved in all genomes (black bars), in all but one (dark grey) and in all but two (light grey) as more genomes are included in the analysis (x-axis). (b) The total unique DNA sequence in all genomes and (c) the amount of DNA sequence that remains specific to one genome as the number of sequenced genomes increases are also shown. Therefore, (a) represents the total conserved core sequence for the group, (b) shows the total gene diversity for the group and (c) shows the amount of strain-specific (novel) DNA with increased coverage of the group by genomic sequences. Bars represent the mean of 20 different random combinations of genomes, and error bars represent one standard deviation from the mean. The bars are therefore representative of the ‘average’ strain for the group, which has a genome size of approximately 5.1 Mb and shows a maximum of 97–98% ANI to any other strain of the group. The relatively large error bars in (c) are attributable to strains with either a very small number of novel genes (because another nearly identical genome, e.g. one with higher than 99% ANI, is in the dataset, as for genomes of the E. coli O157 and S. flexneri lineages) or a large number of novel genes (usually for genomes showing lower than 97% ANI to all other genomes in the dataset). The analysis was based on reciprocally best-matched conserved 1 kb fragments, identified using the Blast algorithm (Altschul et al. 1997); a fragment was called conserved when it showed at least 50% nucleotide identity over an alignable region of at least 700 of its 1000 bases. Comparable trends were observed when the analysis was restricted to complete genomes and/or coding sequences only. Therefore, the draft status of several of the genomic sequences used in the analysis is expected to have only a minor impact on the trends shown.
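The accumulation curves of Figure 4 can be sketched by adding genomes one at a time in random order and, after each addition, counting conserved (core), near-core, total (pan) and newly strain-specific fragments, then taking the mean and standard deviation over random orders. This is a simplified illustration, not the authors' pipeline: each genome is assumed to be already reduced to a set of hypothetical fragment-family IDs (the BLAST reciprocal-best-match step is out of scope), "all but one" is interpreted here as present in at least k - 1 of the k genomes, and (c) is interpreted as fragments of the newest genome absent from all others so far.

```python
import random
from collections import Counter
from statistics import mean, stdev


def accumulation_curves(genomes, n_permutations=20, seed=0):
    """Core, near-core, pan and strain-specific fragment counts as genomes are added.

    `genomes` is a list of sets of fragment-family IDs (hypothetical input:
    each ID stands for one reciprocally best-matched ~1 kb fragment family).
    Returns a list indexed by k - 1 (k = number of genomes included) of dicts
    mapping each measure to (mean, standard deviation) across random orders.
    """
    if n_permutations < 2:
        raise ValueError("need at least two random orders to estimate a standard deviation")
    rng = random.Random(seed)
    n = len(genomes)
    measures = ("core", "all_but_one", "pan", "specific")
    samples = [{m: [] for m in measures} for _ in range(n)]
    for _ in range(n_permutations):
        order = rng.sample(genomes, n)  # one random ordering of all genomes
        counts = Counter()              # fragment ID -> number of genomes carrying it
        for k, genome in enumerate(order, start=1):
            counts.update(genome)
            s = samples[k - 1]
            # conserved in all k genomes included so far
            s["core"].append(sum(1 for c in counts.values() if c == k))
            # conserved in at least k - 1 of the k genomes (all, or all but one)
            s["all_but_one"].append(sum(1 for c in counts.values() if c >= k - 1))
            # every fragment family seen in any genome so far
            s["pan"].append(len(counts))
            # fragments of the newest genome found in none of the others so far
            s["specific"].append(sum(1 for f in genome if counts[f] == 1))
    return [{m: (mean(v), stdev(v)) for m, v in s.items()} for s in samples]


if __name__ == "__main__":
    toy = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 6}]
    for k, row in enumerate(accumulation_curves(toy, n_permutations=5), start=1):
        print(k, {m: round(mu, 2) for m, (mu, sd) in row.items()})
```

Multiplying fragment counts by the fragment length (1 kb) gives an approximate DNA amount comparable to the y-axes. Averaging over random orders is what makes the curves describe an "average" strain rather than one particular sequencing order.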
Figure 5
Whole-genome-based evaluation of the MLST method for the Burkholderiaceae group. The whole-genome tree was constructed from the concatenated alignment of 2589 genes shared by all seven Burkholderia genomes, using the dnaML program of the Phylip package (Felsenstein 2004). The MLST tree is based on the concatenated alignments of the seven full-length genes frequently used in MLST applications in the Burkholderia group (i.e. recA, gyrB, lepA, phaC, trpB, gltB and atpD; Baldwin et al. 2005) and was built with the same method as the whole-genome tree.
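Both trees in Figure 5 start from a concatenated (supermatrix) alignment. The sketch below shows that concatenation step for the seven MLST genes and writes the result in strict sequential PHYLIP format (species names padded or truncated to 10 characters), the kind of input the Phylip programs read. It assumes the per-gene alignments already exist; the input layout (gene to taxon to aligned sequence), the taxon names and the helper names are hypothetical, and the alignment and dnaML steps themselves are not shown.

```python
MLST_GENES = ["recA", "gyrB", "lepA", "phaC", "trpB", "gltB", "atpD"]


def concatenate(alignments, genes=MLST_GENES):
    """Join per-gene alignments into one sequence per taxon, in a fixed gene order.

    `alignments` maps gene name -> {taxon name -> aligned sequence}
    (a hypothetical layout); every gene must cover the same taxa.
    """
    taxa = sorted(alignments[genes[0]])
    parts = {t: [] for t in taxa}
    for gene in genes:
        aln = alignments[gene]
        if set(aln) != set(taxa):
            raise ValueError(f"{gene}: taxon set differs from {genes[0]}")
        if len({len(seq) for seq in aln.values()}) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        for t in taxa:
            parts[t].append(aln[t])
    return {t: "".join(seqs) for t, seqs in parts.items()}


def to_phylip(concat):
    """Format a concatenated alignment as strict sequential PHYLIP text."""
    lengths = {len(seq) for seq in concat.values()}
    if len(lengths) != 1:
        raise ValueError("concatenated sequences differ in length")
    names = [name[:10] for name in concat]
    if len(set(names)) != len(names):
        raise ValueError("taxon names collide after truncation to 10 characters")
    (n_chars,) = lengths
    lines = [f"{len(concat)} {n_chars}"]
    for name, seq in zip(names, concat.values()):
        lines.append(f"{name:<10}{seq}")  # name field is exactly 10 characters
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    toy = {
        "g1": {"B_cenocepacia": "ACGT", "B_mallei": "ACGA"},
        "g2": {"B_cenocepacia": "TT-", "B_mallei": "TTA"},
    }
    print(to_phylip(concatenate(toy, genes=["g1", "g2"])), end="")
```

Checking the taxon sets and per-gene lengths before joining guards against silently shifted columns, which would otherwise corrupt every downstream site in the supermatrix.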

References

    1. Alland D, et al. Modeling bacterial evolution with comparative-genome-based marker systems: application to Mycobacterium tuberculosis evolution and pathogenesis. J. Bacteriol. 2003;185:3392–3399. doi:10.1128/JB.185.11.3392-3399.2003
    2. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped Blast and PSI-Blast: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. doi:10.1093/nar/25.17.3389
    3. Baldwin A, et al. Multilocus sequence typing scheme that provides both species and strain differentiation for the Burkholderia cepacia complex. J. Clin. Microbiol. 2005;43:4665–4673. doi:10.1128/JCM.43.9.4665-4673.2005
    4. Boyd EF, Brüssow H. Common themes among bacteriophage-encoded virulence factors and diversity among the bacteriophages involved. Trends Microbiol. 2002;10:521–529. doi:10.1016/S0966-842X(02)02459-9
    5. Brenner D, Staley J, Krieg N. Classification of prokaryotic organisms and the concept of bacterial speciation. In: Bergey's manual of systematic bacteriology. Springer; New York, NY: 2000.
