JISE




Journal of Information Science and Engineering, Vol. 30 No. 1, pp. 1-23


Clustering Non-Ordered Discrete Data


ALOK WATVE¹, SAKTI PRAMANIK¹, SUNGWON JUNG², BUMJOON JO², SUNIL KUMAR³ AND SHAMIK SURAL³
¹Department of Computer Science and Engineering
Michigan State University
MI 48824-1226, USA
²Department of Computer Science and Engineering
Sogang University
Seoul 121-742, Korea
³School of Information Technology
Indian Institute of Technology
Kharagpur 721302, India


    Clustering in continuous vector data spaces is a well-studied problem. In recent years, a significant amount of research has addressed the clustering of categorical data. However, most of this work deals with market-basket type transaction data and is not specifically optimized for high-dimensional vectors. Our focus in this paper is on efficiently clustering high-dimensional vectors in non-ordered discrete data spaces (NDDS). We define several necessary geometrical concepts in NDDS, which form the basis of our clustering algorithm, and employ several new heuristics that exploit the characteristics of vectors in NDDS. Experimental results on large synthetic datasets demonstrate that the proposed approach is effective in terms of cluster quality, robustness, and running time. We have also applied our clustering algorithm to real datasets, with promising results.


Keywords: clustering, data mining, categorical data, non-ordered discrete data, vector data
