Metagene projection for cross-platform, cross-species characterization of global transcriptional states

Proc. Natl. Acad. Sci. USA, 104: 5959-5964. Published: 2007.04.02

Pablo Tamayo, Daniel Scanfeld, Benjamin L. Ebert, Michael A. Gillette, Charles W. M. Roberts, and Jill P. Mesirov.



The high dimensionality of global transcription profiles, the expression level of 20,000 genes in a much smaller number of samples, presents challenges that affect the sensitivity and general applicability of analysis results. In principle, it would be better to describe the data in terms of a small number of metagenes, positive linear combinations of genes, which could reduce noise while still capturing the invariant biological features of the data. Here we describe how to accomplish such a reduction in dimension by a metagene projection methodology, which can greatly reduce the number of features used to characterize microarray data. We show, in applications to the analysis of leukemia, lung cancer, and central nervous system tumor data sets, how this approach can help assess and interpret similarities and differences between independent data sets, enable cross-platform and cross-species analysis, improve clustering and class prediction, and provide a computational means for detecting and removing sample contamination.
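The idea of metagenes as positive linear combinations of genes can be illustrated with a small sketch. The abstract does not name the factorization algorithm, so the example below assumes a non-negative matrix factorization A ≈ W H fitted with multiplicative updates, and it projects new samples by re-fitting H while holding W fixed. Both choices are assumptions made for illustration and may differ from the published procedure; the function names (`nmf`, `project`, and the helpers) and the toy expression values are hypothetical. The R code in the supplemental data remains the reference implementation.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def update_h(a, w, h, eps=1e-9):
    """One multiplicative update of H for A ~ W H (squared-error objective)."""
    wt = transpose(w)
    num = matmul(wt, a)
    den = matmul(matmul(wt, w), h)
    return [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(len(h[0]))]
            for i in range(len(h))]

def update_w(a, w, h, eps=1e-9):
    """One multiplicative update of W for A ~ W H (squared-error objective)."""
    ht = transpose(h)
    num = matmul(a, ht)
    den = matmul(w, matmul(h, ht))
    return [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(len(w[0]))]
            for i in range(len(w))]

def normalize(w, h):
    """Rescale so each metagene (column of W) sums to 1; H absorbs the scale."""
    sums = [sum(col) for col in zip(*w)]
    w = [[x / s for x, s in zip(row, sums)] for row in w]
    h = [[x * s for x in row] for row, s in zip(h, sums)]
    return w, h

def nmf(a, k, iters=500, seed=0):
    """Factor a genes-by-samples matrix into W (genes x k) and H (k x samples)."""
    rng = random.Random(seed)
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(len(a))]
    h = [[rng.random() + 0.1 for _ in range(len(a[0]))] for _ in range(k)]
    for _ in range(iters):
        h = update_h(a, w, h)
        w = update_w(a, w, h)
    return normalize(w, h)

def project(a_new, w, iters=500, seed=1):
    """Describe new samples in the metagene space of a fixed W (assumed approach)."""
    rng = random.Random(seed)
    h = [[rng.random() + 0.1 for _ in range(len(a_new[0]))] for _ in range(len(w[0]))]
    for _ in range(iters):
        h = update_h(a_new, w, h)
    return h

def frobenius_error(a, w, h):
    """Frobenius norm of the residual A - W H."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for ra, rb in zip(a, wh) for x, y in zip(ra, rb)) ** 0.5

# Toy model data set: 6 genes (rows) x 4 samples (columns), two sample classes.
model = [
    [8.0, 6.0, 1.0, 1.0],
    [6.0, 8.0, 1.0, 1.0],
    [7.0, 7.0, 1.0, 1.0],
    [1.0, 1.0, 8.0, 6.0],
    [1.0, 1.0, 6.0, 8.0],
    [1.0, 1.0, 7.0, 7.0],
]
# One new sample (a single column) that resembles the first class.
new_samples = [[7.5], [6.5], [7.0], [1.0], [1.0], [1.0]]

w, h = nmf(model, k=2)
h_new = project(new_samples, w)
print("reconstruction error:", frobenius_error(model, w, h))
print("metagene profile of the new sample:", [row[0] for row in h_new])
```

Each column of `h` replaces a six-gene profile with two metagene amplitudes, and `h_new` places the new sample in the same two-dimensional space, where it can be compared with the model samples. A real analysis would use a numerical library and thousands of genes; plain Python lists are used here only to keep the sketch self-contained.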

Keywords: cancer, dimension reduction, expression analysis, noise reduction, sample contamination

Mp method

Supplemental Data

Description                                                  Link/Filename
Readme file with instructions about how to run the code     readme.txt
Leukemia 1 example: R code and datasets
Leukemia 2 example: R code and datasets
Lung example: R code and datasets