Metagenes and molecular pattern discovery using matrix factorization

Proc. Natl. Acad. Sci. USA 2004 101: 4164-4169. Published: 2004.03.22

Jean-Philippe Brunet, Pablo Tamayo, Todd Golub, Jill Mesirov


The ability to generate large amounts of genomic information using DNA microarrays provides an opportunity to extract from these data previously unrecognized biological structure and meaning. The challenge, however, is that existing unsupervised clustering methods are often non-robust, and lack the ability to discover subtle, context-dependent biological patterns. We describe here the use of Non-negative Matrix Factorization (NMF), an algorithm based on decomposition-by-parts, and we demonstrate its ability to recover meaningful biological information from cancer-related microarray data without supervision. Coupled with a novel model selection mechanism, NMF is an efficient method for identification of distinct molecular patterns and provides a powerful method for class discovery. NMF appears to have higher resolution than other methods such as hierarchical clustering or self-organizing maps, and to be less sensitive to a priori selection of genes. Rather than separating gene clusters based on distance computation, NMF detects context-dependent patterns of gene expression in complex biological systems. This ability, similar to semantic polysemy in text, provides a general method for robust molecular pattern discovery.
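As a rough illustration of the factorization and consensus-based model selection mentioned above, the sketch below factors a small non-negative genes-by-samples matrix into metagenes and per-sample metagene coefficients, assigns each sample to its dominant metagene, and averages those assignments over random restarts. It is not the authors' nmf.m or nmfconsensus.m. It assumes Lee and Seung's multiplicative updates for the squared-error objective, a made-up 4-gene by 6-sample toy matrix, and hypothetical helper names (matmul, transpose, nmf, assign_clusters, consensus_matrix).

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Approximate a non-negative genes x samples matrix V by W H.

    W (genes x k) holds k metagenes; H (k x samples) holds each sample's
    metagene coefficients. The update rule (multiplicative, squared error)
    is an assumption of this sketch, not taken from the paper's code.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h


def assign_clusters(h):
    """Assign each sample to the metagene with the largest coefficient."""
    k, m = len(h), len(h[0])
    return [max(range(k), key=lambda a: h[a][j]) for j in range(m)]


def consensus_matrix(v, k, runs=10):
    """Fraction of random restarts in which two samples share a cluster."""
    m = len(v[0])
    total = [[0.0] * m for _ in range(m)]
    for seed in range(runs):
        _, h = nmf(v, k, seed=seed)
        labels = assign_clusters(h)
        for i in range(m):
            for j in range(m):
                total[i][j] += 1.0 if labels[i] == labels[j] else 0.0
    return [[x / runs for x in row] for row in total]


# Toy data (made up): genes 0-1 are high in samples 0-2, genes 2-3 in samples 3-5.
v = [
    [5.0, 6.0, 5.5, 0.1, 0.2, 0.1],
    [4.0, 5.0, 4.5, 0.2, 0.1, 0.1],
    [0.1, 0.2, 0.1, 6.0, 5.0, 5.5],
    [0.2, 0.1, 0.1, 5.0, 4.0, 4.5],
]
w, h = nmf(v, k=2)
labels = assign_clusters(h)
consensus = consensus_matrix(v, k=2)
print(labels)
```

Consensus entries near 0 or 1 suggest that the sample groupings are stable across restarts for the chosen k, whereas many intermediate values suggest an unstable choice. Plain Python lists keep the sketch dependency-free; a real analysis on microarray-sized data would use a numerical library instead.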

Keywords: molecular pattern recognition, matrix factorization, unsupervised learning

NMF website

Supplemental Data

Description                                          Link/Filename
ALL-AML gene expression data                         ALL_AML_data.txt
ALL-AML samples                                      ALL_AML_samples.txt
ALL-AML genes                                        ALL_AML_genes.txt
Medulloblastomas gene expression data                Medulloblastoma_data.txt
Medulloblastomas samples                             Medulloblastomas_samples.txt
Medulloblastomas genes                               Medulloblastoma_genes.txt
Matlab M-file for NMF                                nmf.m
Matlab M-file for reordering NMF consensus matrices  nmforderconsensus.m
Supplemental information                             NMF_final_supplement.pdf
Matlab M-file for NMF (model selection)              nmfconsensus.m
Some papers making use of NMF codes (as of 8/07)     NMF_code_used_8_07.doc
NMF codes FAQ                                        NMF_codes_FAQ.doc