Go directly to analytical routines Download data tables
Instructions for using the analytical routines in the taxonomy browser

Vetted sequences

Outline Release 4.0 - This dataset consists of annotation information associated with 5519 high-quality rDNA sequences that were used to create the prokaryotic taxonomy described by Garrity, Lilburn and Bell in The Revised Road Map to the Manual, which appears in Volume 2 of Bergey's Manual of Systematic Bacteriology, 2nd Edition (Springer). The resulting taxonomy is also detailed in The Outline of Procaryotic Taxa, Release 4.0.

Outline Release 5.1 - This dataset extends the number of sequences to 7673. It includes validly named Archaea and Bacteria, as well as a number of sequences from yet-to-be-cultivated taxa and from the environment that likely represent novel lineages (Springer, DOI:10.1007/bergeysoutline).

Data structure

  • rdp.id - The "old" RDP identifier used in Release 8.0 and earlier. These are short, unique identifiers that were keyed to the name of the organism at the time of assignment. However, these may not agree with the current name if the organism from which it was derived was the subject of a recent change in taxonomy and nomenclature.
  • genbank - The GenBank/EMBL/DDBJ accession number.
  • RDP.new - The new RDP accession number (Release 9.0 and later) for the aligned 16S rDNA sequence. Archaeal sequences are not yet included in the autoaligned dataset currently used by the RDP-II.
  • species through domain - These are the names currently assigned to the sequence in Releases 4.0 and 5.1; they reflect the most recent changes (e.g., resolution of synonymies).
  • seq - The sequence of appearance of a given species in the "flattened" taxonomy, in which each sequence is assigned a precise position. This value changes with each release.
  • taxon.seq - The sequence of appearance of a given taxon in the hierarchical classification. This value changes with each release.
  • status - This indicates whether a sequence maps to a type strain, a validly named strain, an invalidly named strain, an unnamed strain or clone, or a benchmark. Benchmark sequences are those used in the generation of our PCA models; they serve as well-established reference points.
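
The field list above can be sketched as a small record type plus a reader. This is only an illustration under stated assumptions: the text does not say how the downloadable tables are delimited, so a tab-delimited file with a header row is assumed; the rank columns between species and domain are not enumerated above, so only those two are modelled; and the sample rows, identifiers and status labels are made-up placeholders, not real records.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class VettedSequence:
    """One row of a vetted-sequence table (field names follow the list above)."""
    rdp_id: str      # "old" RDP identifier (Release 8.0 and earlier)
    genbank: str     # GenBank/EMBL/DDBJ accession number
    rdp_new: str     # new RDP accession number (Release 9.0 and later)
    species: str
    domain: str
    seq: int         # position in the "flattened" taxonomy
    taxon_seq: int   # position of the taxon in the hierarchical classification
    status: str


def read_vetted(handle, delimiter="\t"):
    """Parse a delimited table with a header row into VettedSequence records.

    The delimiter and the presence of a header row are assumptions.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    return [
        VettedSequence(
            rdp_id=row["rdp.id"],
            genbank=row["genbank"],
            rdp_new=row["RDP.new"],
            species=row["species"],
            domain=row["domain"],
            seq=int(row["seq"]),
            taxon_seq=int(row["taxon.seq"]),
            status=row["status"],
        )
        for row in reader
    ]


# Placeholder rows for illustration only; none of these values are real data.
SAMPLE = (
    "rdp.id\tgenbank\tRDP.new\tspecies\tdomain\tseq\ttaxon.seq\tstatus\n"
    "ex.two\tPLACEHOLDER2\tS0000002\tExample species B\tBacteria\t2\t1\ttype\n"
    "ex.one\tPLACEHOLDER1\tS0000001\tExample species A\tBacteria\t1\t1\tbenchmark\n"
)

records = read_vetted(io.StringIO(SAMPLE))

# Sorting on seq recovers the order of appearance in the flattened taxonomy.
flattened = sorted(records, key=lambda r: r.seq)

# Selecting benchmark rows (the label "benchmark" is a hypothetical encoding).
benchmarks = [r for r in records if r.status == "benchmark"]
```

Because seq and taxon.seq change with each release, any ordering derived this way applies only to the release the table came from; the accession columns are the safer keys for joining across releases.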

Release 4.0 vetted sequences
Release 4.0 taxonomic hierarchy
Release 5.1 vetted sequences
Release 5.1 taxonomic hierarchy

The SOSCC algorithm

These links provide access to the datasets, pseudocode, source code and other supplementary materials described in Self-organizing and Self-correcting biological classifications by G.M. Garrity and T.G. Lilburn (submitted to Bioinformatics).

SOSCC pseudocode
SOSCC script (Requires S-Plus)
SOSCC function source code (Requires S-Plus)
Gammaproteobacteria distance matrix

© 2003-2005 Michigan State University Board of Trustees, All Rights Reserved