Instructions for using the analytical routines in the taxonomy browser

 

Description of statistical methodology:

Principal components analysis - The global PCA plots present a view of the two prokaryotic domains and are generated from the complete data set using all 223 benchmarks. To maintain a consistent perspective, taxa are selected at the phylum or class level and overlaid onto the base plot. The identity of a highlighted point can be viewed by placing the cursor directly over it. In all cases, the global plots show the first two principal components, which account for more than 85% of the total variance within the data set.
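As a rough illustration of the overlay idea described above, the following sketch computes a two-component PCA once from a complete matrix and then highlights a subset of rows on the same coordinates. It is not the routine used to produce the plots on this site. The data are randomly generated stand-ins, and the names pca_first_two, data, and taxon_mask are hypothetical.

```python
import numpy as np


def pca_first_two(data):
    """Return scores on the first two principal components and the
    fraction of total variance each of those components explains."""
    # Center each column (benchmark) on its mean.
    centered = data - data.mean(axis=0)
    # SVD of the centered matrix; rows of vt are the principal axes.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    scores = centered @ vt[:2].T
    return scores, explained[:2]


# Hypothetical stand-in data: 500 sequences measured against 223 benchmarks.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
loadings = rng.normal(size=(2, 223))
data = latent @ loadings + 0.1 * rng.normal(size=(500, 223))

# The base plot coordinates are computed once from the complete data set.
scores, explained = pca_first_two(data)

# Hypothetical taxon selection: the highlighted points are a subset of the
# same coordinates, so the perspective does not change between selections.
taxon_mask = rng.random(500) < 0.1
highlighted = scores[taxon_mask]

print("variance explained by PC1 and PC2:", explained.sum())
```

Because the selection only indexes into coordinates that already exist, choosing a different phylum or class never moves the base points.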

 

The analysis - One of the major problems plaguing the use of 16S rDNA for deterministic purposes is the lack of a carefully vetted set of sequences for which the taxonomic annotation has been reviewed and updated. Our analyses began with a set of 6635 sequences (> 1399 nt, < 4% ambiguities) that had been reported as coming from type strains or from strains of validly named species. This is identified as the "unresolved" set, as a number of taxonomic and nomenclatural errors remained within it. The "resolved" set is a subset of 6377 sequences for which we could confirm identity and taxonomic placement. Some likely placement errors, indicative of misnamed species, remain within this subset, predominantly within the phyla Firmicutes and Actinobacteria.
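The length and ambiguity thresholds quoted above can be sketched as a simple filter. This is only an illustration under stated assumptions: an ambiguity is taken to be any character other than A, C, G, T, or U, and the ambiguity percentage is taken relative to the full sequence length. The page does not say how ambiguities were actually counted, and the function names are hypothetical.

```python
# Assumption: any symbol outside this set counts as an ambiguity.
UNAMBIGUOUS = set("ACGTU")


def ambiguity_fraction(seq):
    """Fraction of positions in seq that are not A, C, G, T, or U."""
    seq = seq.upper()
    if not seq:
        return 1.0  # treat an empty sequence as fully ambiguous
    ambiguous = sum(1 for base in seq if base not in UNAMBIGUOUS)
    return ambiguous / len(seq)


def passes_filter(seq, min_length=1399, max_ambiguity=0.04):
    """True if seq is longer than min_length and below max_ambiguity."""
    return len(seq) > min_length and ambiguity_fraction(seq) < max_ambiguity


# Usage with made-up sequences (not real 16S data).
candidates = {
    "long_clean": "ACGT" * 350,            # 1400 nt, no ambiguities
    "too_short": "ACGT" * 300,             # 1200 nt
    "too_ambiguous": "A" * 1300 + "N" * 100,
}
kept = [name for name, seq in candidates.items() if passes_filter(seq)]
print(kept)
```

Both comparisons are strict, matching the "> 1399" and "< 4%" wording in the text.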


© 2003-2005 Michigan State University Board of Trustees, All Rights Reserved