JISE




Journal of Information Science and Engineering, Vol. 39 No. 4, pp. 951-973


Heuristic Feature Selection with Classification Efficiency Using Soft Cluster Analysis for Biological Datasets


HUNG-YI LIN+ AND RONG-CHANG CHEN
Department of Distribution Management
National Taichung University of Science and Technology
Taichung, 404 Taiwan
E-mail: {linhy; rcchens}@nutc.edu.tw


This paper investigates the complex relations among the input and output variables of multi-class classification problems and proposes a new variable selection model that maximizes discrimination while minimizing the size of the selected feature subset. For molecular datasets with a very large number of input variables, the proposed heuristic algorithm can identify the essential factors of a classification problem. Our model makes three contributions to multi-class classification tasks. The first is feature discretization using fuzzy clustering analysis to improve feature discrimination. The second is multivariate analysis to investigate information relevance and redundancy. The third is a novel heuristic feature selection algorithm that is effective while avoiding the overfitting problem. Experimental results show that our model achieves a significant improvement in discrimination on microarray classification problems.


Keywords: feature discretization, fuzzy c-means, feature selection, feature evaluation, discrimination power
