Outline Release 4.0 - This dataset consists of annotation information associated
with 5519 high-quality rDNA sequences that were used in creating
the prokaryotic taxonomy described by Garrity, Lilburn and Bell in The
Revised Road Map to the Manual, which appears in Volume 2 of Bergey's
Manual of Systematic Bacteriology, 2nd Edition (Springer).
The resulting taxonomy is also detailed in The Outline of Procaryotic
Taxa, Release 4.0.
Outline Release 5.1 - This dataset extends the number
of sequences to 7673 and includes validly named Archaea and Bacteria,
as well as a number of sequences from yet-to-be-cultivated taxa and from the
environment that likely represent novel lineages (Springer, DOI:10.1007/bergeysoutline).
Data structure
- rdp.id - The "old" RDP identifier used
in Release 8.0 and earlier. These are short, unique identifiers that
were keyed to the name of the organism at the time of assignment.
However, an identifier may not agree with the current name if the organism
from which the sequence was derived was the subject of a recent change in
taxonomy and nomenclature.
- genbank - The GenBank/EMBL/DDBJ accession number.
- RDP.new - The new RDP accession number (Release
9.0 and later) for the aligned 16S rDNA sequence. Archaeal sequences
are not yet included in the autoaligned dataset currently used by
the RDP-II.
- species through domain - These are the names that
are currently assigned to the sequence in Releases 4.0 and 5.1 and
reflect the most recent changes (e.g., resolution of synonymies).
- seq - The sequence of appearance of a given species
in the "flattened" taxonomy, in which each sequence is
assigned a precise position. This value changes with each release.
- taxon.seq - The sequence of appearance of a given
taxon in the hierarchical classification. This value changes with
each release.
- status - This indicates whether a sequence maps
to a type strain, a validly named strain, an invalidly named strain,
an unnamed strain or clone, or a benchmark. Benchmark sequences are
those that are used in the generation of our PCA models and serve
as well-established reference points.
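The field list above can be read as a record layout. The following is a minimal sketch of loading such records in Python, not the format of the distributed files: it assumes a tab-delimited table with a header row whose column names match the field names above, and it assumes the rank columns run contiguously from "species" to "domain". The intermediate "genus" column, the SequenceRecord and parse_records names, and every value in SAMPLE are invented for illustration.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SequenceRecord:
    rdp_id: str                # "old" RDP identifier (Release 8.0 and earlier)
    genbank: str               # GenBank/EMBL/DDBJ accession number
    rdp_new: Optional[str]     # new RDP accession; None if the column is blank (assumption)
    lineage: Dict[str, str]    # names for every column from "species" through "domain"
    seq: int                   # position in the "flattened" taxonomy
    taxon_seq: int             # position of the taxon in the hierarchy
    status: str                # e.g. type strain, benchmark


def parse_records(text: str, delimiter: str = "\t") -> List[SequenceRecord]:
    """Parse a delimited table (assumed layout) and return records ordered by seq."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    header = list(reader.fieldnames or [])
    # Assumption: rank columns are contiguous, starting at "species" and ending at "domain".
    rank_columns = header[header.index("species"):header.index("domain") + 1]
    records = []
    for row in reader:
        records.append(SequenceRecord(
            rdp_id=row["rdp.id"],
            genbank=row["genbank"],
            rdp_new=row["RDP.new"] or None,
            lineage={rank: row[rank] for rank in rank_columns},
            seq=int(row["seq"]),
            taxon_seq=int(row["taxon.seq"]),
            status=row["status"],
        ))
    # Sorting by seq reproduces the order of appearance in the flattened taxonomy.
    return sorted(records, key=lambda r: r.seq)


# Invented placeholder rows; none of these values come from the real datasets.
SAMPLE = "\n".join("\t".join(fields) for fields in [
    ["rdp.id", "genbank", "RDP.new", "species", "genus", "domain",
     "seq", "taxon.seq", "status"],
    ["id.b", "ACC0002", "S0002", "Genus-b species-b", "Genus-b", "Bacteria",
     "2", "1", "type strain"],
    ["id.a", "ACC0001", "", "Genus-a species-a", "Genus-a", "Archaea",
     "1", "1", "benchmark"],
])

if __name__ == "__main__":
    for record in parse_records(SAMPLE):
        print(record.seq, record.rdp_id, record.lineage["domain"], record.status)
```

Because seq and taxon.seq change with each release, a loader like this should treat them as ordering keys within one release only; the accession fields are the safer choice for matching records across Releases 4.0 and 5.1.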
Release 4.0 vetted sequences
Release 4.0 taxonomic hierarchy
Release 5.1 vetted sequences
Release 5.1 taxonomic hierarchy
The SOSCC algorithm
These links provide access to the datasets, pseudocode,
source code and other supplementary materials described in Self-organizing
and Self-correcting biological classifications by G.M.
Garrity and T.G. Lilburn (submitted to Bioinformatics).
SOSCC pseudocode
SOSCC script (Requires S-Plus)
SOSCC function source code (Requires S-Plus)
Gammaproteobacteria distance matrix