**Data pertaining to the article ‘Broad patterns of gene expression
revealed by clustering of tumor and normal colon tissues probed by oligonucleotide
arrays’**

U. Alon, N. Barkai, D. A. Notterman, K. Gish, S. Ybarra, D. Mack, and A. J. Levine

Proc. Natl. Acad. Sci. USA, Vol. 96, Issue 12, 6745-6750, June 8, 1999

The matrix __I2000__ contains the expression of the 2000
genes with the highest minimal intensity across the 62 tissues. The genes are listed in order
of descending minimal intensity. Each entry in I2000 is a gene intensity computed from the
~20 feature pairs that correspond to the gene on the chip, using the filtering
process described in the ‘materials and methods’ section. The data is otherwise
unprocessed (for example, it has not been normalized by the mean intensity of each
experiment).
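The two properties mentioned above (ordering by minimal intensity, and the absence of normalization) can be illustrated with a minimal Python sketch. It assumes the matrix is held as a list of rows with one gene per row and one tissue per column, which this description does not state explicitly. The toy values and function names are invented for illustration, and dividing by the column mean is only one possible normalization, not necessarily the one used in the article.

```python
def min_intensities(matrix):
    """Return the minimal intensity of each gene (row) across all tissues."""
    return [min(row) for row in matrix]


def is_descending(values):
    """True if every value is >= the value that follows it."""
    return all(a >= b for a, b in zip(values, values[1:]))


def normalize_by_experiment_mean(matrix):
    """Divide every entry by the mean intensity of its experiment (column)."""
    n_genes = len(matrix)
    col_means = [sum(col) / n_genes for col in zip(*matrix)]
    return [[x / m for x, m in zip(row, col_means)] for row in matrix]


# Toy stand-in for I2000: 3 genes (rows) x 2 tissues (columns), invented values.
toy = [
    [900.0, 1100.0],
    [600.0, 400.0],
    [300.0, 500.0],
]

mins = min_intensities(toy)                    # per-gene minimal intensity
ordered = is_descending(mins)                  # rows should already be sorted this way
normalized = normalize_by_experiment_mean(toy) # each column now has mean 1
```

After this normalization every column has mean 1, so intensities from different experiments can be compared on a common scale.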

The file ‘__names__’ contains the EST number and
description of each of the 2000 genes, in an order that corresponds to the order in I2000.
Note that some ESTs are repeated, which means that they are tiled several times on the
chip with different choices of feature sequences. The descriptions UMGAP, HSAC07 and I
correspond to control RNAs spiked with each experiment.
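A minimal Python sketch of how repeated ESTs and control rows might be located follows. It assumes the gene list has already been read into (EST number, description) pairs and that a control description matches one of the three names exactly. The on-disk layout of the names file is not described here, and the toy entries and function names are invented for illustration.

```python
from collections import Counter

# Descriptions that mark control RNAs, as listed in the text above.
CONTROL_DESCRIPTIONS = {"UMGAP", "HSAC07", "I"}


def repeated_ests(entries):
    """Map each EST number that occurs more than once to its count."""
    counts = Counter(est for est, _ in entries)
    return {est: n for est, n in counts.items() if n > 1}


def control_rows(entries):
    """Return the 0-based row indices whose description names a control RNA."""
    return [
        i for i, (_, desc) in enumerate(entries)
        if desc.strip() in CONTROL_DESCRIPTIONS
    ]


# Invented placeholder entries, not taken from the real names file.
toy_names = [
    ("EST_A", "hypothetical gene one"),
    ("EST_B", "UMGAP"),
    ("EST_A", "hypothetical gene one"),
    ("EST_C", "hypothetical gene two"),
]

repeats = repeated_ests(toy_names)
controls = control_rows(toy_names)
```

Because the rows of the gene list follow the order of I2000, the indices returned by `control_rows` can be used to drop the matching rows of the matrix before clustering.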

The identity of the 62 tissues is given in file __tissues__.
The numbers correspond to patients; a positive sign indicates a normal tissue, and a negative
sign indicates a tumor tissue.
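The sign convention can be decoded as in the following Python sketch. It assumes each entry of the tissues file has been read as a signed integer; the toy labels and the function name are invented for illustration.

```python
def parse_tissue(label):
    """Split a signed tissue number into (patient number, tissue kind)."""
    value = int(label)
    if value == 0:
        raise ValueError("a tissue label must be a nonzero signed number")
    kind = "normal" if value > 0 else "tumor"
    return abs(value), kind


# Invented labels: patients 1 and 2 have both tissue kinds, patient 3 only a tumor.
toy_tissues = [-1, 1, -2, 2, -3]

parsed = [parse_tissue(t) for t in toy_tissues]
n_tumor = sum(1 for _, kind in parsed if kind == "tumor")
n_normal = sum(1 for _, kind in parsed if kind == "normal")
```

Keeping the patient number alongside the tissue kind makes it possible to pair the tumor and normal samples that come from the same patient.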

The clustering algorithm (MATLAB version 5.1) is given in file __cluster__.
This program accepts a matrix of input data, where each object to be clustered is
represented by a column. It outputs three variables: Ord, Num and BetaVal. Ord contains
the post-clustering order of objects along the binary tree. Additional information about
the binary tree is found in Num and BetaVal. Num contains the sizes of the clusters at
each splitting: the second row of Num has two nonzero entries corresponding to
the sizes of the two clusters in the first division in the tree, the third row has 4
entries, etc. The matrix BetaVal contains the β values of the
cluster splits (see ‘materials and methods’).
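One way the outputs might be combined is sketched below in Python: the cluster sizes in a row of Num are used to cut Ord into consecutive groups. This rests on assumptions not confirmed by the description above: that the rows of Num are zero-padded, that the first row holds the total number of objects, and that the nonzero sizes in a row appear in the same left-to-right order as the objects in Ord. The toy values and function names are invented for illustration.

```python
def cluster_sizes(num_row):
    """Return the nonzero cluster sizes stored in one row of Num."""
    return [int(n) for n in num_row if n != 0]


def split_by_sizes(order, sizes):
    """Cut the post-clustering order into consecutive groups of the given sizes."""
    if sum(sizes) != len(order):
        raise ValueError("cluster sizes do not add up to the number of objects")
    groups, start = [], 0
    for size in sizes:
        groups.append(order[start:start + size])
        start += size
    return groups


# Invented example with 5 objects; indices are 1-based, as MATLAB would output them.
toy_ord = [3, 1, 4, 2, 5]
toy_num = [
    [5, 0, 0, 0],  # assumed: the whole set before any split
    [2, 3, 0, 0],  # first division of the tree
    [1, 1, 2, 1],  # second level of divisions
]

first_split = split_by_sizes(toy_ord, cluster_sizes(toy_num[1]))
second_split = split_by_sizes(toy_ord, cluster_sizes(toy_num[2]))
```

If the sizes in a row do not sum to the number of objects, the assumed layout does not hold for that row, and `split_by_sizes` raises an error rather than returning misleading groups.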

For questions about the clustering algorithm, please contact Uri Alon (urialon@weizmann.ac.il, updated 17/7/2000).