Interpreting patterns of gene expression with self-organizing maps

Proc. Natl. Acad. Sci. USA 96:2907-2912. Published: 1999.03.15

Pablo Tamayo, Donna Slonim, Jill Mesirov, Qing Zhu, Sutisak Kitareewan, Ethan Dmitrovsky, Eric S. Lander, and Todd R. Golub.


Array technologies have made it straightforward to monitor simultaneously the expression pattern of thousands of genes. The challenge now is to interpret such massive data sets. The first step is to extract the fundamental patterns of gene expression inherent in the data. This paper describes the application of self-organizing maps, a type of mathematical cluster analysis that is particularly well suited for recognizing and classifying features in complex, multidimensional data. The method has been implemented in a publicly available computer package, GENECLUSTER, that performs the analytical calculations and provides easy data visualization. To illustrate the value of such analysis, the approach is applied to hematopoietic differentiation in four well-studied models (HL-60, U937, Jurkat, and NB4 cells). Expression patterns of some 6,000 human genes were assayed, and an online database was created. GENECLUSTER was used to organize the genes into biologically relevant clusters that suggest novel hypotheses about hematopoietic differentiation, for example, highlighting certain genes and pathways involved in "differentiation therapy" used in the treatment of acute promyelocytic leukemia.
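
The abstract names self-organizing maps without spelling out the procedure. As a rough illustration of the general idea only (not GENECLUSTER's actual implementation), the Python sketch below fits a small grid of reference vectors to expression profiles. The function names, the per-gene normalization, the linear learning-rate decay, and the shrinking neighborhood radius are all illustrative assumptions rather than details taken from this page.

```python
import math
import random


def normalize(profile):
    """Rescale one profile to mean 0 and unit variance (assumed preprocessing)."""
    mean = sum(profile) / len(profile)
    sd = math.sqrt(sum((v - mean) ** 2 for v in profile) / len(profile))
    return [(v - mean) / sd for v in profile] if sd > 0 else [0.0] * len(profile)


def nearest_node(nodes, x):
    """Index of the node whose reference vector is closest to x (squared Euclidean)."""
    return min(
        range(len(nodes)),
        key=lambda k: sum((w - xi) ** 2 for w, xi in zip(nodes[k], x)),
    )


def train_som(profiles, rows, cols, iterations=2000, alpha0=0.1, seed=0):
    """Fit a rows x cols map to equal-length profiles; returns the node vectors."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Start each node at a randomly chosen profile (one common initialization).
    nodes = [list(rng.choice(profiles)) for _ in grid]
    radius0 = max(rows, cols) / 2.0
    for t in range(iterations):
        frac = t / iterations
        alpha = alpha0 * (1.0 - frac)               # learning rate decays to 0
        radius = max(radius0 * (1.0 - frac), 0.5)   # neighborhood shrinks over time
        x = rng.choice(profiles)
        br, bc = grid[nearest_node(nodes, x)]
        for k, (r, c) in enumerate(grid):
            # Move the winning node and its grid neighbors toward the sample.
            if math.hypot(r - br, c - bc) <= radius:
                nodes[k] = [w + alpha * (xi - w) for w, xi in zip(nodes[k], x)]
    return nodes
```

After training, each gene can be assigned to a cluster by calling `nearest_node(nodes, normalize(profile))`; genes that share a node, or sit on neighboring nodes, have similar expression patterns across the sampled conditions.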

Keywords: SOM, clustering, gene expression, hematopoietic differentiation, self-organizing map


Supplemental Data

Description                                      Link/Filename
Experimental protocol (.html)                    protocol.html
Dataset description                              Datasets_description.txt
Dataset data_set_HL60 (Excel)                    data_set_HL60.tsv
Dataset data_set_HL60_U937_NB4_Jurkat (text)     data_set_HL60_U937_NB4_Jurkat.txt
Dataset data_set_HL60_U937_NB4_Jurkat (Excel)    data_set_HL60_U937_NB4_Jurkat.tsv