Consensus Clustering: A resampling-based method for class discovery and visualization of gene expression microarray data

Machine Learning Journal, 52(1-2):91-118, 2003. Published: 2002.12.31

Stefano Monti, Pablo Tamayo, Jill Mesirov, and Todd Golub



In this paper we present a new methodology of class discovery and clustering validation tailored to the task of analyzing gene expression data. The method can best be thought of as an analysis approach, to guide and assist in the use of any of a wide range of available clustering algorithms. We call the new methodology consensus clustering, and in conjunction with resampling techniques, it provides for a method to represent the consensus across multiple runs of a clustering algorithm and to assess the stability of the discovered clusters. The method can also be used to represent the consensus over multiple runs of a clustering algorithm with random restart (such as K-means, model-based Bayesian clustering, SOM, etc.), so as to account for its sensitivity to the initial conditions. Finally, it provides for a visualization tool to inspect cluster number, membership, and boundaries. We present the results of our experiments on both simulated data and real gene expression data aimed at evaluating the effectiveness of the methodology in discovering biologically meaningful clusters.
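The resampling idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`kmeans`, `consensus_matrix`), the choice of K-means as the base algorithm, the 80% subsampling fraction, the 50 runs, and the toy two-cluster data are all assumptions made for illustration. Each entry of the resulting matrix is the fraction of runs in which two items were clustered together, out of the runs in which both were sampled.

```python
import random
from itertools import combinations


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means with a random start; returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def consensus_matrix(points, k, runs=50, frac=0.8, seed=0):
    """Fraction of runs in which each pair of items shares a cluster,
    counted only over the runs where both items were subsampled."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]  # times i and j were co-clustered
    sampled = [[0] * n for _ in range(n)]   # times i and j were both sampled
    size = max(2, round(frac * n))
    for _ in range(runs):
        idx = rng.sample(range(n), size)    # subsample items without replacement
        labels = kmeans([points[i] for i in idx], k, rng)
        for (a, la), (b, lb) in combinations(zip(idx, labels), 2):
            sampled[a][b] += 1
            sampled[b][a] += 1
            if la == lb:
                together[a][b] += 1
                together[b][a] += 1
    return [
        [1.0 if i == j else (together[i][j] / sampled[i][j] if sampled[i][j] else 0.0)
         for j in range(n)]
        for i in range(n)
    ]


# Toy data (an assumption for this sketch): two well-separated 2-D clusters.
data_rng = random.Random(1)
blob_a = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(10)]
blob_b = [(data_rng.gauss(10, 0.5), data_rng.gauss(10, 0.5)) for _ in range(10)]
points = blob_a + blob_b

M = consensus_matrix(points, k=2)

# Summarize: average consensus for same-cluster pairs vs. cross-cluster pairs.
n = len(points)
within = [M[i][j] for i, j in combinations(range(n), 2) if (i < 10) == (j < 10)]
between = [M[i][j] for i, j in combinations(range(n), 2) if (i < 10) != (j < 10)]
avg_within = sum(within) / len(within)
avg_between = sum(between) / len(between)
print(f"mean within-cluster consensus:  {avg_within:.3f}")
print(f"mean between-cluster consensus: {avg_between:.3f}")
```

When the chosen number of clusters matches the structure in the data, same-cluster pairs should have consensus values close to 1 and cross-cluster pairs close to 0. Repeating the computation for several values of `k` and comparing how cleanly the entries separate is one way to use such a matrix to judge cluster number and stability.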

Keywords: Cancer classification, Clustering, Microarray data, Unsupervised learning

Supplemental Data

Description                           Link/Filename
Technical Report                      consensus4pdflatex.pdf
Leukemia data                         ALB_ALT_AML.1000genes.res
Leukemia class template               ALB_ALT_AML.cls
Novartis multi-tissue data            Novartis_BPLC.top1000.gct
Novartis multi-tissue class template  Novartis_BPLC.cls
St. Jude Leukemia data                leukemia.top1000.gct
St. Jude Leukemia class template      leukemia.cls
Lung cancer data                      LungA_1000genes.gct
Lung cancer class template            LungA_local.cls
CNS tumors data                       brain_morpho.1000genes.res
CNS tumors class template             brain_morpho.cls
Normal tissues data                   cGCM_9_15000_nml_90.top100.res
Normal tissues class template         cGCM_9_15000_nml_90.cls
Uniform1                              uniform1.gct.gz
Gaussian1                             gaussian1.gct.gz
Gaussian3                             gaussian3.gct.gz
Gaussian3 class template              gaussian3.cls
Gaussian4                             gaussian4.gct.gz
Gaussian4 class template              gaussian4.cls
Gaussian5.delta2                      gaussian5.delta2.gct.gz
Gaussian5.delta3                      gaussian5.delta3.gct.gz
Gaussian5 class template              gaussian5.cls
Simulated6                            artificial_dataset1.gct.gz
Simulated6 class template             artificial_dataset1.cls