Computational and Mathematical Methods in Medicine
Volume 2013 (2013), Article ID 693901, 8 pages
http://dx.doi.org/10.1155/2013/693901
Research Article

Cancer Outlier Analysis Based on Mixture Modeling of Gene Expression Data

1 Department of Statistical Science, School of Multidisciplinary Sciences, The Graduate University for Advanced Studies, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan
2 Clinical Trial Coordination Office, Shizuoka Cancer Center, 1007 Shimonagakubo, Nagaizumi-cho, Sunto-gun, Shizuoka 411-8777, Japan
3 Asia-Pacific Statistical Sciences, Lilly Research Laboratories Development Center of Excellence Asia Pacific, Eli Lilly Japan K.K., Sannomiya Plaza Building, 7-1-5 Isogamidori, Chuo-ku, Kobe, Hyogo 651-0086, Japan
4 Department of Data Science, The Institute of Statistical Mathematics, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan

Received 30 January 2013; Accepted 23 March 2013

Academic Editor: Shinto Eguchi

Copyright © 2013 Keita Mori et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Molecular heterogeneity of cancer, partially caused by various chromosomal aberrations or gene mutations, can yield substantial heterogeneity in the gene expression profiles of cancer samples. To detect cancer-related genes that are active only in a subset of cancer samples (cancer outliers), several methods have been proposed in the context of multiple testing. Such cancer outlier analyses generally suffer from a serious lack of power compared with the standard multiple testing setting, in which genes are assumed to be activated in common across all cancer samples. In this paper, we consider information sharing across genes and cancer samples via parametric normal mixture modeling of the gene expression levels of cancer samples across genes, after standardization using the reference normal sample data. A gene-based statistic for gene selection is developed on the basis of the posterior probability of being a cancer outlier for each cancer sample. Our method showed some improvement in efficiency, even under settings with misspecified, heavy-tailed t-distributions. An application to a real dataset from hematologic malignancies is provided.
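
The general idea summarized above (standardize against normal samples, fit a normal mixture across genes, score each gene from per-sample posterior probabilities) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a two-component mixture whose null component is fixed at N(0, 1), fits it by a plain EM algorithm, and uses the sum of posterior outlier probabilities as an illustrative gene-level score. The simulated data and all function names (`standardize`, `fit_mixture`, `outlier_posterior`) are hypothetical.

```python
import math
import random
import statistics

random.seed(0)

# Hypothetical simulated data: 200 genes, 20 normal and 20 cancer samples.
# The first 10 genes are over-expressed (+5 SD) in only 5 of 20 cancer samples.
N_GENES, N_NORMAL, N_CANCER = 200, 20, 20
normal = [[random.gauss(0, 1) for _ in range(N_NORMAL)] for _ in range(N_GENES)]
cancer = [[random.gauss(0, 1) for _ in range(N_CANCER)] for _ in range(N_GENES)]
for g in range(10):
    for j in range(5):
        cancer[g][j] += 5.0


def standardize(cancer_row, normal_row):
    """Standardize one gene's cancer values by the normal-sample mean and SD."""
    mu = statistics.mean(normal_row)
    sd = statistics.stdev(normal_row)
    return [(x - mu) / sd for x in cancer_row]


def normal_pdf(z, mean, sd):
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def outlier_posterior(z, pi0, m, s):
    """Posterior probability that z comes from the outlier component N(m, s^2)."""
    f0 = pi0 * normal_pdf(z, 0.0, 1.0)      # null component, fixed (assumption)
    f1 = (1.0 - pi0) * normal_pdf(z, m, s)  # outlier component
    return f1 / (f0 + f1)


def fit_mixture(zs, n_iter=100):
    """EM for pi0 * N(0, 1) + (1 - pi0) * N(m, s^2) on pooled standardized values."""
    pi0, m, s = 0.9, 3.0, 1.0  # arbitrary starting values
    for _ in range(n_iter):
        # E-step: posterior outlier probability for every value
        post = [outlier_posterior(z, pi0, m, s) for z in zs]
        # M-step: update the mixing proportion and outlier-component parameters
        w = sum(post)
        pi0 = 1.0 - w / len(zs)
        m = sum(p * z for p, z in zip(post, zs)) / w
        var = sum(p * (z - m) ** 2 for p, z in zip(post, zs)) / w
        s = max(math.sqrt(var), 0.1)  # floor to keep the component from collapsing
    return pi0, m, s


# Share information across genes by pooling all standardized cancer values.
z = [standardize(c, n) for c, n in zip(cancer, normal)]
pooled = [v for row in z for v in row]
pi0, m, s = fit_mixture(pooled)

# Illustrative gene-level score: sum of posterior outlier probabilities.
scores = [sum(outlier_posterior(v, pi0, m, s) for v in row) for row in z]
top = sorted(range(N_GENES), key=lambda g: -scores[g])[:10]
print("Top-ranked genes:", top)
```

Pooling the standardized values before fitting is what lets every gene contribute to the estimated mixture, which is one simple way to realize information sharing across genes. Genes would then be ranked by `scores`, with a threshold chosen by whatever error-rate criterion the analysis uses.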