JISE


Journal of Information Science and Engineering, Vol. 20 No. 4, pp. 665-677

An Efficient Approach to Identifying and Validating Clusters in Multivariate Datasets with Applications in Gene Expression Analysis

VINCENT SHIN-MU TSENG AND CHING-PING KAO
Department of Computer Science and Information Engineering
National Cheng Kung University
Tainan, 701 Taiwan
E-mail: tsengsm@mail.ncku.edu.tw
E-mail: zeno@dmlab.csie.ncku.edu.tw

Gene expression data analysis has become an important topic in bioinformatics due to its wide application in the biomedical industry. Data mining methods, especially clustering techniques, play an essential role in the effective analysis of gene expression data. Various kinds of clustering methods have been proposed, yet they do not satisfy the requirements of high efficiency, high quality and automation in the mining of gene expression data. In this paper, we propose an efficient and automatic clustering approach that is suitable for gene expression analysis. The proposed approach primarily employs similarity-matrix based clustering techniques, complemented by new heuristics for reducing the computation cost. In particular, a novel validation technique is incorporated for evaluating the quality of the discovered gene expression patterns. Empirical evaluation on different gene expression datasets shows that the proposed approach performs better than other methods in terms of efficiency, clustering quality and automation.

Keywords: data mining, clustering, gene expression, microarray, validation technique
