Proc. Natl. Acad. Sci. USA 2004 101: 4164-4169. Published: 2004.03.22
Jean-Philippe Brunet, Pablo Tamayo, Todd Golub, Jill Mesirov
The ability to generate large amounts of genomic information using DNA microarrays provides an opportunity to extract from these data previously unrecognized biological structure and meaning. The challenge, however, is that existing unsupervised clustering methods are often non-robust, and lack the ability to discover subtle, context-dependent biological patterns. We describe here the use of Non-negative Matrix Factorization (NMF), an algorithm based on decomposition-by-parts, and we demonstrate its ability to recover meaningful biological information from cancer-related microarray data without supervision. Coupled with a novel model selection mechanism, NMF is an efficient method for identification of distinct molecular patterns and provides a powerful method for class discovery. NMF appears to have higher resolution than other methods such as hierarchical clustering or self-organizing maps, and to be less sensitive to a priori selection of genes. Rather than separating gene clusters based on distance computation, NMF detects context-dependent patterns of gene expression in complex biological systems. This ability, similar to semantic polysemy in text, provides a general method for robust molecular pattern discovery.
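The abstract describes NMF only at a high level. A minimal plain-Python sketch of the factorization might look like the following. A nonnegative genes × samples matrix V is approximated as W H, where the k columns of W play the role of metagenes and H holds each sample's metagene coefficients, and each sample is then assigned to the metagene with the largest coefficient. The sketch assumes Lee and Seung's multiplicative update rules for the divergence cost. It is not the distributed nmf.m, and the function names (`nmf`, `assign_clusters`), the random initialization, the iteration count, and the final column rescaling are hypothetical choices made for this illustration.

```python
import random


def matmul(A, B):
    """Plain-Python product of two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def nmf(V, k, iters=1000, seed=0, eps=1e-12):
    """Factor a nonnegative n x m matrix V as V ~ W H, W is n x k, H is k x m.

    Assumes Lee-Seung multiplicative updates for the divergence cost; they keep
    W and H nonnegative because they only multiply by nonnegative ratios.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]

    def ratio(W, H):
        # R_ij = V_ij / (W H)_ij, the quantity shared by both update rules.
        WH = matmul(W, H)
        return [[V[i][j] / (WH[i][j] + eps) for j in range(m)] for i in range(n)]

    for _ in range(iters):
        R = ratio(W, H)
        col_w = [sum(W[i][a] for i in range(n)) for a in range(k)]
        # H_aj <- H_aj * sum_i W_ia R_ij / sum_i W_ia
        H = [[H[a][j] * sum(W[i][a] * R[i][j] for i in range(n)) / (col_w[a] + eps)
              for j in range(m)] for a in range(k)]
        R = ratio(W, H)
        row_h = [sum(H[a]) for a in range(k)]
        # W_ia <- W_ia * sum_j R_ij H_aj / sum_j H_aj
        W = [[W[i][a] * sum(R[i][j] * H[a][j] for j in range(m)) / (row_h[a] + eps)
              for a in range(k)] for i in range(n)]

    # Rescale so each metagene (column of W) sums to 1; W H is essentially unchanged.
    scale = [sum(W[i][a] for i in range(n)) + eps for a in range(k)]
    W = [[W[i][a] / scale[a] for a in range(k)] for i in range(n)]
    H = [[H[a][j] * scale[a] for j in range(m)] for a in range(k)]
    return W, H


def assign_clusters(H):
    """Label each sample (column of H) with the metagene of largest coefficient."""
    k, m = len(H), len(H[0])
    return [max(range(k), key=lambda a: H[a][j]) for j in range(m)]


# Toy data (hypothetical): genes 0-1 are active in samples 0-2, genes 2-3 in samples 3-5.
V = [
    [5.0, 4.0, 6.0, 0.0, 0.0, 0.0],
    [2.5, 2.0, 3.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 4.0, 5.0, 3.0],
    [0.0, 0.0, 0.0, 8.0, 10.0, 6.0],
]
W, H = nmf(V, k=2)
print(assign_clusters(H))
```

On this toy matrix one would expect samples 0-2 and samples 3-5 to receive two different labels; which group gets label 0 depends on the random start, since NMF factors are only defined up to permutation and scaling.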
Description | Filename |
---|---|
ALL-AML gene expression data | ALL_AML_data.txt |
ALL-AML samples | ALL_AML_samples.txt |
ALL-AML genes | ALL_AML_genes.txt |
Medulloblastoma gene expression data | Medulloblastoma_data.txt |
Medulloblastoma samples | Medulloblastomas_samples.txt |
Medulloblastoma genes | Medulloblastoma_genes.txt |
Matlab M-file for NMF | nmf.m |
Matlab M-file for NMF (model selection) | nmfconsensus.m |
Matlab M-file for reordering NMF consensus matrices | nmforderconsensus.m |
Supplemental information | NMF_final_supplement.pdf |
Papers using the NMF code (as of 8/07) | NMF_code_used_8_07.doc |
NMF code FAQ | NMF_codes_FAQ.doc |
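For the model-selection mechanism mentioned in the abstract, the following sketch assumes a consensus-clustering scheme. Each of several NMF runs at a fixed k yields one labeling of the samples, the runs' connectivity matrices (1 where two samples share a label, 0 otherwise) are averaged into a consensus matrix, and the cophenetic correlation between the distances 1 - consensus and the cophenetic distances of an average-linkage tree built on them summarizes how stable the clustering is. The input format (a list of label vectors) and the function names are hypothetical; this is not the code in nmfconsensus.m or nmforderconsensus.m.

```python
import math


def consensus_matrix(runs):
    """Average the connectivity matrices of several labelings of the same samples.

    Entry (i, j) is the fraction of runs in which samples i and j share a label.
    """
    n = len(runs[0])
    counts = [[0] * n for _ in range(n)]
    for labels in runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(runs) for c in row] for row in counts]


def cophenetic_distances(D):
    """Average-linkage (UPGMA) clustering of the distance matrix D.

    Returns a matrix whose (i, j) entry is the linkage distance at which
    samples i and j first end up in the same cluster.
    """
    n = len(D)
    clusters = {i: [i] for i in range(n)}
    coph = [[0.0] * n for _ in range(n)]
    next_id = n
    while len(clusters) > 1:
        keys = list(clusters)
        best = None
        for x in range(len(keys)):
            for y in range(x + 1, len(keys)):
                a, b = clusters[keys[x]], clusters[keys[y]]
                # Average linkage: mean distance over all cross-cluster pairs.
                d = sum(D[i][j] for i in a for j in b) / (len(a) * len(b))
                if best is None or d < best[0]:
                    best = (d, keys[x], keys[y])
        d, p, q = best
        for i in clusters[p]:
            for j in clusters[q]:
                coph[i][j] = coph[j][i] = d
        clusters[next_id] = clusters.pop(p) + clusters.pop(q)
        next_id += 1
    return coph


def cophenetic_correlation(C):
    """Pearson correlation between 1 - C and its average-linkage cophenetic distances."""
    n = len(C)
    D = [[1.0 - c for c in row] for row in C]
    coph = cophenetic_distances(D)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    x = [D[i][j] for i, j in pairs]
    y = [coph[i][j] for i, j in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 1.0  # all distances equal; returning 1.0 is a convention chosen here
    return sxy / math.sqrt(sxx * syy)


# Hypothetical labelings from four NMF runs on six samples.
stable = [[0, 0, 0, 1, 1, 1]] * 4
mixed = [[0, 0, 0, 1, 1, 1]] * 3 + [[0, 0, 1, 1, 1, 0]]
print(cophenetic_correlation(consensus_matrix(stable)))  # 1.0, up to rounding
print(cophenetic_correlation(consensus_matrix(mixed)))   # strictly below 1.0
```

Under this scheme one would compute the coefficient for each candidate number of metagenes k and, as a heuristic, favor values of k before it begins to drop, since a falling coefficient indicates that sample assignments are no longer reproducible across runs.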