Model-based clustering in gene expression microarrays: An application to breast cancer data

J. C. Mar, G. J. McLachlan

Research output: Contribution to journal › Article › peer-review

7 Scopus citations


In microarray studies, clustering techniques are often applied to derive meaningful insights from the data. In the past, hierarchical methods have been the primary clustering tool employed for this task. These hierarchical algorithms have mainly been applied heuristically to cluster analysis problems. Further, a major limitation of these methods is their inability to determine the number of clusters. Thus there is a need for a model-based approach to these clustering problems. To this end, McLachlan et al. [7] developed a mixture model-based algorithm (EMMIX-GENE) for the clustering of tissue samples. To further investigate the EMMIX-GENE procedure as a model-based approach, we present a case study involving the application of EMMIX-GENE to the breast cancer data studied recently by van 't Veer et al. [10]. Our analysis considers the problem of clustering the tissue samples on the basis of the genes, which is a non-standard problem because the number of genes greatly exceeds the number of tissue samples. We demonstrate how EMMIX-GENE can be useful in reducing the initial set of genes to a more computationally manageable size. The results of this analysis also emphasise the difficulty of separating two tissue groups on the basis of a particular subset of genes. These results also shed light on why supervised methods have such a high misallocation error rate for the breast cancer data.
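To make the contrast with heuristic hierarchical methods concrete, the following is a minimal sketch of how a mixture model lets the data speak to the number of groups, and how that idea could be used to screen genes. It is not the EMMIX-GENE procedure: the abstract does not describe that algorithm's internals, so the use of univariate normal components, the BIC comparison, the variance floor, and all function names (`fit_one_component`, `fit_two_components`, `prefers_two_components`) are assumptions made purely for illustration.

```python
import math
from statistics import NormalDist


def normal_logpdf(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def fit_one_component(xs):
    """Maximised log-likelihood of a single normal fitted to xs."""
    n = len(xs)
    mean = sum(xs) / n
    var = max(sum((x - mean) ** 2 for x in xs) / n, 1e-12)
    return sum(normal_logpdf(x, mean, var) for x in xs)


def fit_two_components(xs, n_iter=200):
    """Log-likelihood of a two-component normal mixture fitted by EM."""
    n = len(xs)
    mean = sum(xs) / n
    total_var = max(sum((x - mean) ** 2 for x in xs) / n, 1e-12)
    # Assumed guard: stop a component collapsing onto a single point.
    var_floor = 0.01 * total_var
    # Initialise by splitting the sorted values at the median.
    ordered = sorted(xs)
    lower, upper = ordered[: n // 2], ordered[n // 2:]
    weight = 0.5
    means = [sum(lower) / len(lower), sum(upper) / len(upper)]
    variances = [total_var, total_var]

    def joint_logs(x):
        # Log of (mixing proportion * component density) for each component.
        return (
            math.log(weight) + normal_logpdf(x, means[0], variances[0]),
            math.log(1.0 - weight) + normal_logpdf(x, means[1], variances[1]),
        )

    for _ in range(n_iter):
        # E-step: posterior probability that each value belongs to component 0.
        resp = []
        for x in xs:
            a, b = joint_logs(x)
            m = max(a, b)
            ea, eb = math.exp(a - m), math.exp(b - m)
            resp.append(ea / (ea + eb))
        # M-step: update proportion, means and variances from the weights.
        n0 = min(max(sum(resp), 1e-9), n - 1e-9)
        n1 = n - n0
        weight = n0 / n
        means = [
            sum(r * x for r, x in zip(resp, xs)) / n0,
            sum((1.0 - r) * x for r, x in zip(resp, xs)) / n1,
        ]
        variances = [
            max(sum(r * (x - means[0]) ** 2 for r, x in zip(resp, xs)) / n0, var_floor),
            max(sum((1.0 - r) * (x - means[1]) ** 2 for r, x in zip(resp, xs)) / n1, var_floor),
        ]

    total = 0.0
    for x in xs:
        a, b = joint_logs(x)
        m = max(a, b)
        total += m + math.log(math.exp(a - m) + math.exp(b - m))
    return total


def prefers_two_components(xs):
    """True if BIC favours two components over one for this gene."""
    n = len(xs)
    bic_one = -2.0 * fit_one_component(xs) + 2 * math.log(n)  # mean, variance
    bic_two = -2.0 * fit_two_components(xs) + 5 * math.log(n)  # 2 means, 2 variances, 1 proportion
    return bic_two < bic_one


# Synthetic, deterministic "expression profiles" across 40 tissue samples.
z = [NormalDist().inv_cdf((i + 0.5) / 20) for i in range(20)]
bimodal_gene = [-2.0 + 0.5 * v for v in z] + [2.0 + 0.5 * v for v in z]
unimodal_gene = [NormalDist().inv_cdf((i + 0.5) / 40) for i in range(40)]
```

Applied gene by gene, a rule such as `prefers_two_components` would retain only genes whose expression across tissues shows evidence of more than one group, which is one plausible way to shrink a gene set far larger than the number of tissue samples. The synthetic profiles above are invented for the sketch and are not drawn from the breast cancer data.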

Original language: English (US)
Pages (from-to): 579-592
Number of pages: 14
Journal: International Journal of Software Engineering and Knowledge Engineering
Issue number: 6
State: Published - Dec 2003
Externally published: Yes


  • Cluster analysis
  • Microarray
  • Mixture modelling

ASJC Scopus subject areas

  • Software
  • Computer Networks and Communications
  • Computer Graphics and Computer-Aided Design
  • Artificial Intelligence

