Instructions for using the analytical routines in the taxonomy browser

Description of EDA plot types

Global PCA plots - The global PCA plots present a view of the two prokaryotic domains and are generated from the complete data set using all 223 benchmarks. To maintain a consistent perspective, taxa are selected at the phylum or class level and overlaid onto the base plot. The identity of a highlighted point can be viewed by placing the cursor directly over it. In all cases, global plots show the first two principal components, which together account for more than 85% of the total variance within the data set.
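
As a rough illustration of how such a base plot and its variance figure could be computed, the following sketch runs a PCA by singular value decomposition on synthetic data. The matrix layout (one row per sequence, one column per benchmark), the function name `pca`, and the random stand-in data are assumptions made for this example; they are not the routines or data behind the browser.

```python
import numpy as np

def pca(data, n_components=2):
    """Return (scores, explained_variance_ratio) for row-wise observations."""
    X = np.asarray(data, dtype=float)
    # Center each column so the components describe variance, not the mean.
    Xc = X - X.mean(axis=0)
    # SVD of the centered matrix: the rows of Vt are the principal axes.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Squared singular values are proportional to the component variances.
    variance = s ** 2
    ratio = variance / variance.sum()
    scores = Xc @ Vt[:n_components].T
    return scores, ratio[:n_components]

# Synthetic stand-in (not real data): 50 "sequences" x 6 "benchmarks",
# built from two strong latent directions plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2)) * np.array([10.0, 4.0])
mixing = rng.normal(size=(2, 6))
data = latent @ mixing + rng.normal(scale=0.1, size=(50, 6))

scores, ratio = pca(data, n_components=2)

# Hypothetical taxon labels; highlighted points are simply a row selection
# drawn over the base plot of all scores.
labels = np.array(["Firmicutes"] * 20 + ["Actinobacteria"] * 30)
highlight = scores[labels == "Firmicutes"]

print("share of variance in PC1 + PC2:", float(ratio.sum()))
```

Because the highlighted points are selected from the same score matrix as the base plot, every overlay shares one coordinate system, which is what keeps the perspective consistent from taxon to taxon.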

Phylum- and class-level PCA plots - In some instances, adequate resolution of species is not possible within the global PCA plots. Reasons for this include overlap of points within a given 2-D space, point occlusion, or insufficient variability within the 16S rDNA to provide meaningful separation using the global coordinate system. In such cases, it is useful to recompute the principal components for a subset of sequences and benchmarks, selected on the basis of taxonomic affiliation. Separation and visualization of subgroups are enhanced by "rotating" the plots, which is accomplished by plotting the various two-way combinations of the first, second and third principal components.
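
A minimal sketch of this idea, assuming the same row-per-sequence, column-per-benchmark layout as above: the components are recomputed on a chosen subset of rows and columns, and the three two-way views (PC1 vs PC2, PC1 vs PC3, PC2 vs PC3) are then enumerated. The function names, the label array, and the benchmark indices are hypothetical and chosen only for illustration.

```python
from itertools import combinations

import numpy as np

def subset_pca(data, row_mask, col_idx, n_components=3):
    """Recompute principal component scores on selected rows and columns."""
    X = np.asarray(data, dtype=float)[np.asarray(row_mask)][:, col_idx]
    Xc = X - X.mean(axis=0)                    # center within the subset only
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def pc_pairs(scores):
    """Yield ((i, j), xy) for each two-way combination of components (1-based)."""
    for i, j in combinations(range(scores.shape[1]), 2):
        yield (i + 1, j + 1), scores[:, [i, j]]

# Synthetic stand-in data and hypothetical taxon labels (not real data).
rng = np.random.default_rng(1)
data = rng.normal(size=(40, 8))
labels = np.array(["Firmicutes"] * 15 + ["Actinobacteria"] * 25)
benchmarks = [0, 2, 3, 5, 7]                   # hypothetical benchmark subset

scores = subset_pca(data, labels == "Firmicutes", benchmarks)
views = dict(pc_pairs(scores))
for pair, xy in views.items():
    print("PC%d vs PC%d:" % pair, xy.shape)
```

Centering and decomposing within the subset gives axes that describe variation inside the group rather than variation across both domains, which is why points that overlap in the global plot can separate here.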

Screeplots - A screeplot plots the eigenvalues against their indices. It breaks visually into a steady downward slope and a gradual tailing away, analogous to a mountain and the scree that makes up the alluvial fan at its base. The breakpoint in the downward slope indicates the division between the "important" principal components and the less important ones that make up the scree.
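
The breakpoint is normally judged by eye. As one possible way to locate it numerically, the sketch below picks the point that lies farthest below the straight line joining the first and last eigenvalues, after rescaling both axes to the range 0 to 1. This heuristic, the function name `scree_elbow`, and the example eigenvalues are assumptions for illustration, not the method used to produce the plots.

```python
def scree_elbow(eigenvalues):
    """Return the 1-based index of the elbow in a scree plot.

    Heuristic: rescale indices and eigenvalues to [0, 1], then pick the
    point farthest below the chord from the first to the last point.
    """
    vals = sorted(eigenvalues, reverse=True)
    n = len(vals)
    if n < 3:
        raise ValueError("need at least three eigenvalues to locate an elbow")
    span = vals[0] - vals[-1]
    if span == 0:
        raise ValueError("all eigenvalues are equal; there is no elbow")
    best_index, best_gap = 1, -1.0
    for k, value in enumerate(vals):
        x = k / (n - 1)                 # rescaled index
        y = (value - vals[-1]) / span   # rescaled eigenvalue
        gap = 1.0 - (x + y)             # depth below the chord x + y = 1
        if gap > best_gap:
            best_index, best_gap = k + 1, gap
    return best_index

# Made-up eigenvalues with a clear "mountain" followed by scree.
eigenvalues = [120, 45, 6, 3, 2, 1.5, 1]
elbow = scree_elbow(eigenvalues)
retained = elbow - 1                    # components before the scree begins
share = sum(eigenvalues[:retained]) / sum(eigenvalues)
print(elbow, retained, round(share, 3))  # 3 2 0.924
```

In this made-up example the scree begins at the third eigenvalue, so the first two components would be treated as the important ones.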

Dynamic heatmaps - Non-optimized, dynamic heatmaps of subsets of the data used in the PCA were generated to help explain the positioning of individual points in some plots. Note that the scale and coloration change from one heatmap to the next.
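
One way such a change in scale can arise is if each heatmap maps its colors to the minimum and maximum of its own subset. The sketch below assumes that behavior purely for illustration (the source does not state how the color scale is chosen) and uses made-up values and a hypothetical helper, `scale_to_unit`.

```python
def scale_to_unit(matrix):
    """Map each value to [0, 1] using this matrix's own minimum and maximum."""
    flat = [value for row in matrix for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant matrix has no range; map everything to the low end.
        return [[0.0 for _ in row] for row in matrix]
    return [[(value - lo) / (hi - lo) for value in row] for row in matrix]

# Two made-up subsets that both contain the raw value 0.10.
subset_a = [[0.02, 0.10], [0.06, 0.10]]
subset_b = [[0.10, 0.30], [0.20, 0.50]]

scaled_a = scale_to_unit(subset_a)
scaled_b = scale_to_unit(subset_b)

# The same raw value sits at opposite ends of the two color scales.
print(scaled_a[0][1], scaled_b[0][0])  # 1.0 0.0
```

Under this assumption a given color cannot be compared between heatmaps; only the legend of each individual heatmap gives its meaning.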

The analysis - One of the major problems plaguing the use of 16S rDNA for deterministic purposes is the lack of a carefully vetted set of sequences whose taxonomic annotation has been reviewed and updated. Our analyses began with a set of 6635 sequences (> 1399 nts, < 4% ambiguities) that had been reported as coming from type strains or from strains of validly named species. This is identified as the "unresolved" set, as a number of taxonomic and nomenclatural errors remained within it. The "resolved" set is a subset of 6377 sequences for which we could confirm identity and taxonomic placement. Some likely placement errors, indicative of misnamed species, remain within this subset, predominantly within the phyla Firmicutes and Actinobacteria.
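
The length and ambiguity criteria can be expressed as a simple filter. The sketch below assumes ungapped sequences and treats any character other than A, C, G, T or U as an ambiguity; both are assumptions, since the source does not define how ambiguities were counted. The function name and test sequences are hypothetical.

```python
UNAMBIGUOUS = set("ACGTU")

def passes_filter(seq, min_length=1400, max_ambiguity=0.04):
    """Return True if a sequence is longer than 1399 nt with < 4% ambiguities.

    Assumes an ungapped sequence; any character outside A, C, G, T, U is
    counted as an ambiguity (an assumption made for this sketch).
    """
    seq = seq.upper()
    if len(seq) < min_length:           # "> 1399 nts" means at least 1400
        return False
    ambiguous = sum(1 for base in seq if base not in UNAMBIGUOUS)
    return ambiguous / len(seq) < max_ambiguity

# Made-up sequences exercising each criterion.
long_clean = "ACGT" * 350                    # 1400 nt, no ambiguities
too_short = "ACGT" * 349 + "ACG"             # 1399 nt
too_ambiguous = "N" * 56 + "ACGT" * 336      # 1400 nt, exactly 4% N
borderline = "N" * 55 + "A" + "ACGT" * 336   # 1400 nt, just under 4% N

for name, seq in [("long_clean", long_clean), ("too_short", too_short),
                  ("too_ambiguous", too_ambiguous), ("borderline", borderline)]:
    print(name, len(seq), passes_filter(seq))
```

A filter of this kind only enforces sequence quality; confirming identity and taxonomic placement, which separates the "resolved" from the "unresolved" set, is a curation step that it does not capture.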

© 2003-2005 Michigan State University Board of Trustees, All Rights Reserved