JISE


Journal of Information Science and Engineering, Vol. 31 No. 1, pp. 207-228

Approximate Clustering of Time-Series Datasets using k-Modes Partitioning

SAEED AGHABOZORGI AND TEH YING WAH
Department of Information System
Faculty of Computer Science and Information Technology
University of Malaya
Kuala Lumpur, Malaysia
E-mail: {saeed; tehyw}@um.edu.my

Data in many systems, such as those in finance, healthcare, and business, are stored as time series, and interest in time series mining in these areas has surged accordingly. Clustering is performed as a pre-processing or exploratory step in many data mining tasks. Time series data sets are often too large to fit in main memory for clustering, and dimensionality reduction is a common solution in this case. However, reduction comes at a relatively high cost: the information discarded in the process leads to low-quality clustering. In this paper, we propose a new approach for improving the accuracy of approximate clustering of dimensionality-reduced time series by means of discretization. A new distance measure is first introduced. Thereafter, the partitional algorithm that best matches the representation method is proposed.

Keywords: data mining, clustering, time series, approximation, distance measure, dimensionality reduction
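
The pipeline outlined in the abstract (reduce dimensionality, discretize, then cluster with a partitional algorithm) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes piecewise aggregate approximation with a four-symbol alphabet for the discretization step, and it uses plain Hamming distance as a stand-in for the new distance measure, which the abstract does not specify. The function names `paa`, `discretize`, `hamming`, and `k_modes` are hypothetical.

```python
import bisect
import random
from collections import Counter

# Assumption: quartile breakpoints of the standard normal distribution,
# giving a four-symbol alphabet "abcd".
BREAKPOINTS = [-0.67, 0.0, 0.67]

def paa(series, n_segments):
    """Piecewise aggregate approximation: mean of each segment.
    Requires len(series) >= n_segments."""
    n = len(series)
    means = []
    for i in range(n_segments):
        seg = series[i * n // n_segments:(i + 1) * n // n_segments]
        means.append(sum(seg) / len(seg))
    return means

def discretize(series, n_segments):
    """Z-normalize, reduce with PAA, and map each segment mean to a symbol."""
    mean = sum(series) / len(series)
    std = (sum((x - mean) ** 2 for x in series) / len(series)) ** 0.5 or 1.0
    z = [(x - mean) / std for x in series]
    return tuple("abcd"[bisect.bisect(BREAKPOINTS, v)] for v in paa(z, n_segments))

def hamming(a, b):
    """Number of positions at which two equal-length words differ.
    Stand-in only; the paper proposes its own distance measure."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(words, k, n_iter=20, seed=0):
    """Basic k-modes: assign each word to the nearest mode, then recompute
    each mode as the most frequent symbol per position.
    Requires at least k distinct words."""
    rng = random.Random(seed)
    modes = rng.sample(sorted(set(words)), k)
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: hamming(w, modes[j])) for w in words]
        new_modes = []
        for j in range(k):
            members = [w for w, lab in zip(words, labels) if lab == j]
            if not members:
                new_modes.append(modes[j])  # keep the old mode for an empty cluster
                continue
            new_modes.append(tuple(Counter(col).most_common(1)[0][0]
                                   for col in zip(*members)))
        if new_modes == modes:
            break
        modes = new_modes
    return labels, modes

# Toy data: two rising and two falling series of length 16.
rising = [float(x) for x in range(16)]
falling = rising[::-1]
series = [rising, [2 * x + 1 for x in rising], falling, [3 * x for x in falling]]
words = [discretize(s, 4) for s in series]
labels, modes = k_modes(words, k=2)
```

On this toy input the two rising series share one cluster label and the two falling series share the other. Because the series are reduced to short symbolic words before clustering, only those words need to be held in memory, which is the motivation the abstract gives for clustering an approximation.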
