Description of statistical methodology:
Principal components analysis - The global
PCA plots present a view of the two prokaryotic domains and are generated
from the complete data set using all 223 benchmarks. To maintain a consistent
perspective, taxa are selected at the phylum or class level and overlaid
back onto the base plot. The identity of a highlighted point can be
viewed by placing the cursor directly over it. In all cases, the global
plots show the first two principal components, which together account
for more than 85% of the total variance within the data set.
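
The projection and overlay described above can be sketched as follows. This is a minimal illustration, not the code used to generate the plots: it assumes the data set is laid out as a matrix with one row per sequence and one column per benchmark, and the function names `pca_first_two` and `overlay_mask`, the synthetic data, and the taxon labels are all hypothetical.

```python
import numpy as np


def pca_first_two(data):
    """Project the rows of `data` onto the first two principal components.

    Returns (scores, explained): `scores` has shape (n_rows, 2) and
    `explained` holds the fraction of total variance captured by each
    of the two components.
    """
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)  # remove the per-column mean
    # Rows of `axes` are the principal axes, ordered by decreasing variance.
    _, singular_values, axes = np.linalg.svd(centered, full_matrices=False)
    variance = singular_values ** 2
    explained = variance[:2] / variance.sum()
    scores = centered @ axes[:2].T
    return scores, explained


def overlay_mask(labels, taxon):
    """Boolean mask selecting the rows that belong to `taxon`."""
    return np.array([label == taxon for label in labels], dtype=bool)


if __name__ == "__main__":
    # Synthetic stand-in for the real data: 60 rows, 223 benchmark columns.
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(60, 2))
    loadings = rng.normal(size=(2, 223))
    data = latent @ loadings + 0.1 * rng.normal(size=(60, 223))
    labels = ["Firmicutes"] * 30 + ["Actinobacteria"] * 30

    scores, explained = pca_first_two(data)
    highlighted = scores[overlay_mask(labels, "Firmicutes")]
    print("variance explained by PC1 + PC2:", explained.sum())
    print("points to overlay on the base plot:", highlighted.shape[0])
```

Computing the projection once from the complete matrix and then selecting rows by taxon keeps every overlay in the same coordinate system, which is what gives the plots a consistent perspective.
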
The analysis - One of the major problems plaguing
the use of 16S rDNA for deterministic purposes is the lack of a carefully
vetted set of sequences whose taxonomic annotation has been
reviewed and updated. Our analyses began with a set of 6635 sequences
(> 1399 nt, < 4% ambiguities) that had been reported as coming
from type strains or from strains of validly named species. This collection
is referred to as the "unresolved" set because a number
of taxonomic and nomenclatural errors remained within it. The "resolved" set
is a subset of 6377 sequences for which we could confirm identity and
taxonomic placement. Within this subset remain some likely placement
errors that are indicative of misnamed species. These are predominantly
within the phyla Firmicutes and Actinobacteria.
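
The length and ambiguity criteria quoted above (> 1399 nt, < 4% ambiguities) can be expressed as a simple filter. This is a sketch under stated assumptions rather than the procedure actually used: it assumes unaligned sequences without gap characters and treats any character other than A, C, G, T or U as an ambiguity. The function names are hypothetical.

```python
UNAMBIGUOUS = set("ACGTU")  # assumption: everything else counts as ambiguous


def ambiguity_fraction(seq):
    """Fraction of positions in `seq` that are not A, C, G, T or U."""
    seq = seq.upper()
    if not seq:
        return 1.0  # treat an empty sequence as fully ambiguous
    ambiguous = sum(1 for base in seq if base not in UNAMBIGUOUS)
    return ambiguous / len(seq)


def passes_filter(seq, min_length=1400, max_ambiguity=0.04):
    """True if `seq` is longer than 1399 nt and has < 4% ambiguities."""
    return len(seq) >= min_length and ambiguity_fraction(seq) < max_ambiguity


if __name__ == "__main__":
    candidates = {
        "clean_full_length": "ACGT" * 350,               # 1400 nt, no ambiguities
        "too_short": "ACGT" * 300,                       # 1200 nt
        "too_ambiguous": "N" * 100 + "ACGT" * 350,       # about 6.7% ambiguous
    }
    for name, seq in candidates.items():
        print(name, passes_filter(seq))
```

A filter of this kind only enforces sequence quality. Confirming identity and taxonomic placement, which separates the "resolved" set from the "unresolved" set, is a review step that it does not capture.
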