SIAM Review, vol. 45, number 4, pp. 706-723 (2003).
Copyright © 2003 by Society for Industrial and Applied Mathematics. Published: 2002.12.31
Ryan Rifkin, Sayan Mukherjee, Pablo Tamayo, Sridhar Ramaswamy, Chen-Hsiang Yeang, Michael Angelo, Michael Reich, Tomaso Poggio, Eric S. Lander, Todd R. Golub and Jill P. Mesirov.
Modern cancer treatment relies upon clinical judgment and microscopic tissue examination to classify tumors according to anatomical site of origin. This approach is effective but subjective, and it varies even among experienced clinicians and pathologists. Recently, gene expression data generated with DNA microarrays have been used to build molecular cancer classifiers. Previous work from our group and others demonstrated methods for solving pairwise classification problems using such global gene expression patterns. However, classification across multiple primary tumor classes poses new methodological and computational challenges. In this paper we describe a computational methodology for multi-class prediction that combines class-specific (one-vs-all) binary Support Vector Machines. We apply this methodology to the diagnosis of multiple common adult malignancies using DNA microarray data from a collection of 198 tumor samples spanning 14 of the most common tumor types. Overall classification accuracy is 78%, far exceeding the accuracy expected from random classification. On a large subset of the samples (80%), the algorithm attains 90% accuracy. The methodology described in this paper both demonstrates that accurate gene expression-based multi-class cancer diagnosis is possible and highlights some of the analytic challenges inherent in applying such strategies to biomedical research.
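The one-vs-all scheme named in the abstract trains one binary classifier per tumor class (that class against all others). The usual way to combine them is to assign a sample to the class whose classifier returns the largest real-valued output. The sketch below illustrates that combination rule on toy data. It is not the authors' implementation. Assumptions: a linear kernel; Pegasos-style stochastic subgradient training swapped in for a standard SVM solver; the bias folded in as a constant feature (so it is regularized too); and hypothetical function names `train_linear_svm`, `train_one_vs_all` and `predict_one_vs_all`.

```python
import random
from typing import Dict, List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 200,
                     seed: int = 0) -> List[float]:
    """Linear SVM (hinge loss + L2 penalty) trained by Pegasos-style
    stochastic subgradient descent (an assumed solver, not the paper's).
    Labels in y must be +1 or -1. A constant 1.0 feature is appended so the
    last weight acts as a bias; as a simplification it is regularized too."""
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(Xb)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                      # decreasing step size
            violated = y[i] * dot(w, Xb[i]) < 1.0      # margin check with current w
            w = [(1.0 - eta * lam) * wj for wj in w]   # subgradient of the L2 term
            if violated:                               # subgradient of the hinge term
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def train_one_vs_all(X, labels, **kwargs) -> Dict[str, List[float]]:
    """Train one binary SVM per class: that class is +1, every other class is -1."""
    return {
        c: train_linear_svm(X, [1 if lab == c else -1 for lab in labels], **kwargs)
        for c in sorted(set(labels))
    }


def predict_one_vs_all(models: Dict[str, List[float]], x: Sequence[float]) -> str:
    """Assign x to the class whose binary SVM gives the largest output."""
    xb = list(x) + [1.0]
    scores = {c: dot(w, xb) for c, w in models.items()}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Made-up 2-D points standing in for expression profiles (illustration only).
    X = [(0, 5), (1, 6), (-1, 6), (0, 7),
         (5, 0), (6, 1), (6, -1), (7, 0),
         (-5, -5), (-6, -5), (-5, -6), (-6, -6)]
    labels = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
    models = train_one_vs_all(X, labels)
    for query in [(0, 6), (6, 0), (-5.5, -5.5)]:
        print(query, "->", predict_one_vs_all(models, query))
```

With real data, each row of `X` would be one sample's expression profile and each label its tumor class. The argmax step is what turns the separate one-vs-all outputs (14 of them in the setting above) into a single diagnosis. A dedicated SVM solver would normally replace the toy trainer.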
| Resource | File or link |
|---|---|
| Paper (PDF) | multiclass.siam_final_March_12_2003.pdf |
| Datasets (same as this paper) | http://www-genome.wi.mit.edu/cgi-bin/cancer/publications/pub_paper.cgi?mode=view&paper_id=61 |