In Silico Genome-Genome Hybridization to Classify and Identify Prokaryotes

Abstract

Although identifying and classifying Bacteria and Archaea (prokaryotes) into natural units is a daunting, time-honored task, the biomedical and biotechnology communities need these organisms to be defined in a standard way. Identities are what biological researchers use to link, store, and share various kinds of biological metadata for specific organisms, and they allow predictions to be made regarding the traits of close relatives. There are no rules, methods, or publicly available knowledge bases for officially classifying and identifying existing or novel prokaryotes. Currently, many microbiologists rely on the polyphasic approach, whereby two prokaryotes are given the same identity if they share a certain number of phenotypic and genotypic traits; if no suitable pairing can be made, the organism may be considered novel. Pragmatically defining something as complicated as an organism is labor-intensive, time-consuming, often wasteful, and potentially subjective. We therefore argue that modern whole-genome sequences and computational tools, combined with decades-old methodology and species concepts, are all that is required to classify prokaryotes into biologically meaningful groups useful for identifying known and novel organisms. We show that there exist natural cohesive forces that cluster bacterial genomes by similarity, as opposed to a continuous gradient of relatedness across prokaryotic domains. This method will be useful for accurately identifying hard-to-cultivate prokaryotes, including pathogens, when genome sequencing becomes routine in hospital settings.