Machine Learning Journal, 52(1-2):91-118, 2003. Published: 2002.12.31
Stefano Monti, Pablo Tamayo, Jill Mesirov, and Todd Golub
In this paper we present a new methodology of class discovery and clustering validation tailored to the task of analyzing gene expression data. The method can best be thought of as an analysis approach, to guide and assist in the use of any of a wide range of available clustering algorithms. We call the new methodology consensus clustering, and in conjunction with resampling techniques, it provides for a method to represent the consensus across multiple runs of a clustering algorithm and to assess the stability of the discovered clusters. The method can also be used to represent the consensus over multiple runs of a clustering algorithm with random restart (such as K-means, model-based Bayesian clustering, SOM, etc.), so as to account for its sensitivity to the initial conditions. Finally, it provides for a visualization tool to inspect cluster number, membership, and boundaries. We present the results of our experiments on both simulated data and real gene expression data aimed at evaluating the effectiveness of the methodology in discovering biologically meaningful clusters.
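To make the resampling idea in the abstract concrete, here is a minimal sketch and not the authors' implementation. It assumes consensus is summarized as a matrix whose entry (i, j) is the fraction of resampled runs in which items i and j landed in the same cluster, counted over the runs where both were sampled. The function names `kmeans` and `consensus_matrix`, the 80% subsampling rate, and the use of plain K-means with random restarts are all illustrative assumptions; any clustering algorithm could stand in for the base clusterer.

```python
import random
from itertools import combinations


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's K-means on tuples of floats; returns one label per point.

    Hypothetical helper for this sketch: initial centers are k random points,
    so repeated calls act as random restarts.
    """
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; keep it if it has none.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def consensus_matrix(points, k, n_resamples=50, frac=0.8, seed=0):
    """Fraction of resampled runs in which each pair of items co-clusters.

    Assumed definition: together[i][j] / sampled[i][j], where sampled counts
    the runs containing both i and j. Pairs never sampled together get 0.0.
    """
    rng = random.Random(seed)
    n = len(points)
    size = max(k, round(frac * n))
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(n_resamples):
        idx = rng.sample(range(n), size)          # subsample without replacement
        labels = kmeans([points[i] for i in idx], k, rng)
        for (a, la), (b, lb) in combinations(zip(idx, labels), 2):
            sampled[a][b] += 1
            sampled[b][a] += 1
            if la == lb:
                together[a][b] += 1
                together[b][a] += 1
    return [
        [1.0 if i == j
         else (together[i][j] / sampled[i][j] if sampled[i][j] else 0.0)
         for j in range(n)]
        for i in range(n)
    ]


# Toy usage: two tight, well-separated groups of one-dimensional points.
demo = [(0.0,), (0.1,), (0.2,), (0.3,), (0.4,),
        (10.0,), (10.1,), (10.2,), (10.3,), (10.4,)]
matrix = consensus_matrix(demo, k=2)
```

Under this definition, entries near 1 or 0 suggest a stable assignment for that pair, while intermediate values suggest the pair's co-membership depends on the particular subsample or restart. Reordering the rows and columns by cluster and drawing the matrix as a heat map is one way to inspect cluster number, membership, and boundaries visually.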
Description | Link/Filename |
---|---|
Technical Report | consensus4pdflatex.pdf |
Leukemia data | ALB_ALT_AML.1000genes.res |
Leukemia class template | ALB_ALT_AML.cls |
Novartis multi-tissue data | Novartis_BPLC.top1000.gct |
Novartis multi-tissue class template | Novartis_BPLC.cls |
St. Jude Leukemia data | leukemia.top1000.gct |
St. Jude Leukemia class template | leukemia.cls |
Lung cancer data | LungA_1000genes.gct |
Lung cancer class template | LungA_local.cls |
CNS tumors data | brain_morpho.1000genes.res |
CNS tumors class template | brain_morpho.cls |
Normal tissues data | cGCM_9_15000_nml_90.top100.res |
Normal tissues class template | cGCM_9_15000_nml_90.cls |
Uniform1 | uniform1.gct.gz |
Gaussian1 | gaussian1.gct.gz |
Gaussian3 | gaussian3.gct.gz |
Gaussian3 class template | gaussian3.cls |
Gaussian4 | gaussian4.gct.gz |
Gaussian4 class template | gaussian4.cls |
Gaussian5.delta2 | gaussian5.delta2.gct.gz |
Gaussian5.delta3 | gaussian5.delta3.gct.gz |
Gaussian5 class template | gaussian5.cls |
Simulated6 | artificial_dataset1.gct.gz |
Simulated6 class template | artificial_dataset1.cls |