NECSI Research Projects


Supporting Text

Influence clustering

To order the columns of Fig. 2, we first calculated the effective influences of a particular gene group on all other gene groups, across all time steps. These values were concatenated into 12 vectors, each with 6 × 12 = 72 values. The set of these 12 output vectors was clustered with the correlation coefficient as the distance metric. This method generates a tree in which the most highly correlated vectors are placed on adjacent branches. Once two vectors were joined, that subtree was represented in further distance calculations by a vector that is the numerical average of its two constituent vectors. The order of the columns is the leaf order of the generated tree. Similarly, rows were ordered by making a 72-entry vector of the effects of all 12 gene groups on a particular gene group across all time steps.
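The ordering procedure described above can be sketched as a small agglomerative loop. This is a minimal illustration under stated assumptions, not the code used for Fig. 2: the function names `pearson` and `cluster_order` are hypothetical, the most highly correlated pair is merged at each step, and ties and the left/right orientation of each merged subtree are resolved arbitrarily (the text does not specify them). It also assumes that no vector is constant, because the correlation coefficient is undefined in that case.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    # Assumption: neither vector is constant, so the norms are nonzero.
    return cov / (norm_u * norm_v)


def cluster_order(vectors):
    """Return the leaf order of a tree built by repeatedly merging the
    two most highly correlated nodes.

    Each node is a pair (leaf indices in tree order, representative vector).
    A merged subtree is represented by the plain average of the vectors of
    the two nodes that were joined.
    """
    nodes = [([i], list(v)) for i, v in enumerate(vectors)]
    while len(nodes) > 1:
        best = None  # (correlation, index i, index j)
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                r = pearson(nodes[i][1], nodes[j][1])
                if best is None or r > best[0]:
                    best = (r, i, j)
        _, i, j = best
        leaves_i, vec_i = nodes[i]
        leaves_j, vec_j = nodes[j]
        merged = (
            leaves_i + leaves_j,
            [(a + b) / 2 for a, b in zip(vec_i, vec_j)],
        )
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0][0]
```

Called on the 12 concatenated 72-value vectors, `cluster_order` would return a permutation of the indices 0 to 11 that could serve as a column order; applying it to the corresponding row vectors would give a row order in the same way.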


To test whether these results are biased by the signal in just a few observations, we performed two tests. First, an analysis of the convergence of the least-squares calculation with increasing data revealed that using 33 observations to solve for the mutual influences of 12 gene groups (ostensibly 275% of the required data) was sufficient to yield stable results. Fig. 3 shows the correlation coefficient between the coefficients calculated using all 33 observations and those calculated from data sets made smaller by the random deletion of data. The distribution of points is restricted to values near 1 when ≈ 27 of 33 observations are included in the analysis; all analyses done with > 26 observations were highly correlated with the results generated using the total data set. Considerably more observations are required for convergence (≈ 27) than for mathematical solubility (12). We believe that this follows from the sensitivity of the linear algebraic solution to the noise inherent in microarray data (1).
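The random-deletion test can be illustrated with a short NumPy sketch. It rests on assumptions that the text does not spell out: the influences are taken to be the coefficient matrix of a linear model fitted by ordinary least squares, with one observation per row of `X` (predictor expression levels) and `Y` (response expression levels). The names `fit_influences` and `convergence_curve`, the number of trials, and the seed are all hypothetical.

```python
import numpy as np


def fit_influences(X, Y):
    """Least-squares coefficients C minimizing ||X @ C - Y||.

    X has shape (observations, groups); Y has shape (observations, groups).
    """
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef


def convergence_curve(X, Y, sizes, n_trials=20, seed=0):
    """For each subset size, correlate coefficients fitted on a random
    subset of observations with coefficients fitted on all observations.

    Returns a dict mapping subset size to a list of correlation coefficients.
    """
    rng = np.random.default_rng(seed)
    full = fit_influences(X, Y).ravel()
    results = {}
    for k in sizes:
        correlations = []
        for _ in range(n_trials):
            # Keep k observations chosen at random, i.e. delete the rest.
            keep = rng.choice(X.shape[0], size=k, replace=False)
            subset = fit_influences(X[keep], Y[keep]).ravel()
            correlations.append(np.corrcoef(full, subset)[0, 1])
        results[k] = correlations
    return results
```

With 33 observations and 12 gene groups, calling `convergence_curve(X, Y, sizes=range(12, 34))` and plotting each list against its subset size would produce a scatter of the kind described for Fig. 3.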

Furthermore, the set of coefficients generated using the average of the bootstrap replicates was highly correlated with the values derived using the least-squares fit (e.g., for the 2-h transition, r² = 0.98 between these two methods).
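A comparison of this kind could be set up as in the sketch below. The bootstrap procedure is not described in this text, so the scheme here is an assumption: observations are resampled with replacement, a least-squares fit is computed for each replicate, and the replicate coefficients are averaged. The function names, the replicate count, and the seed are hypothetical.

```python
import numpy as np


def lstsq_coefficients(X, Y):
    """Ordinary least-squares coefficients for the linear model X @ C = Y."""
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef


def bootstrap_mean_coefficients(X, Y, n_replicates=200, seed=0):
    """Average the least-squares coefficients over bootstrap replicates.

    Assumption: each replicate resamples the observations (rows) with
    replacement, keeping the original number of observations.
    """
    rng = np.random.default_rng(seed)
    n_obs = X.shape[0]
    total = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_replicates):
        idx = rng.integers(0, n_obs, size=n_obs)
        total += lstsq_coefficients(X[idx], Y[idx])
    return total / n_replicates


def r_squared(a, b):
    """Squared Pearson correlation between two coefficient sets."""
    r = np.corrcoef(np.ravel(a), np.ravel(b))[0, 1]
    return r ** 2
```

Computing `r_squared(lstsq_coefficients(X, Y), bootstrap_mean_coefficients(X, Y))` for the data of one transition would give a single agreement score comparable in form to the r² quoted above.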

1. Chen, Y., Kamat, V., Dougherty, E. R., Bittner, M. L., Meltzer, P. S. & Trent, J. M. (2002) Bioinformatics 18, 1207-1215.

Back to Dynamics of cellular level function and regulation derived from murine expression array data page

Copyright © 2000-2005 New England Complex Systems Institute. All rights reserved.