JISE




Journal of Information Science and Engineering, Vol. 36 No. 3, pp. 671-685


A Method Non-Deterministic and Computationally Viable for Detecting Outliers in Large Datasets


ALBERTO FERNÁNDEZ OLIVA1, FRANCISCO MACIÁ PÉREZ2,
JOSÉ VICENTE BERNÁ MARTINEZ2 AND MIGUEL ALFONSO ABREU ORTEGA3
1Department of Computer Science
University of Havana
Havana, 10100 Cuba
E-mail: afdez@matcom.uh.cu

2Department of Information Technology and Computer Science
University of Alicante
Alicante, 03690 Spain
E-mail: {pmacia; jvberna}@dtic.ua.es

3Georgia Institute of Technology
Atlanta, GA 30332 USA
E-mail: mabreu@gatech.edu


This paper presents an outlier detection method based on the Variable Precision Rough Set Model (VPRSM). The method generalizes the standard set inclusion relation that underlies the Rough Sets Basic Model (RSBM). The main contribution of this research is improved detection quality, because this generalization allows classification even when there is some degree of uncertainty. A computationally viable algorithm for large volumes of data is derived from the proposed method. Experiments performed in a real scenario, together with a comparison against the RSBM-based method, demonstrate the efficiency of both the method and the algorithm in diverse contexts that involve large volumes of data.
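
To illustrate what generalizing set inclusion can mean, the following minimal Python sketch uses the textbook variable precision formulation: the relative classification error c(X, Y) = 1 - |X ∩ Y| / |X| and an admissible error threshold beta. It is an illustration under stated assumptions, not the algorithm of this paper; the function names, the `key` grouping function, and the toy data are hypothetical.

```python
from collections import defaultdict


def inclusion_error(x, y):
    """Relative classification error c(X, Y) of including X in Y.

    Returns 0.0 when X is fully contained in Y (or X is empty) and
    1.0 when X and Y are disjoint.
    """
    x, y = set(x), set(y)
    if not x:
        return 0.0
    return 1.0 - len(x & y) / len(x)


def beta_lower_approximation(objects, key, target, beta):
    """Union of the equivalence classes included in `target` with error <= beta.

    `key` (hypothetical) maps each object to its equivalence class label.
    With beta = 0 this reduces to the standard rough set lower approximation.
    """
    classes = defaultdict(set)
    for obj in objects:
        classes[key(obj)].add(obj)

    target = set(target)
    lower = set()
    for members in classes.values():
        if inclusion_error(members, target) <= beta:
            lower |= members
    return lower


# Toy data: two equivalence classes, {0..4} and {5..9}.
target = {0, 1, 2, 3, 7}
strict = beta_lower_approximation(range(10), lambda v: v // 5, target, 0.0)
relaxed = beta_lower_approximation(range(10), lambda v: v // 5, target, 0.25)
```

In this toy case the strict relation (beta = 0) accepts neither class, while beta = 0.25 accepts the class {0..4}, four of whose five members lie in the target set. This tolerance is the kind of uncertainty the generalized relation is meant to absorb.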


Keywords: outliers, rough sets (RS), RS basic model (RSBM), variable precision rough set model (VPRSM), data set, data mining
