
Journal of Information Science and Engineering, Vol. 37 No. 5, pp. 1011-1023

DDCO - Diversified Data Characteristic-based Oversampling for Imbalance Classification Problems

Department of Computer Science and Engineering
Koneru Lakshmaiah Education Foundation
Hyderabad, 500075 India
E-mail: gillala.rekha@klh.edu.in; vkrishnareddy@kluniversity.in

Although several techniques have been designed to handle class imbalance at the data pre-processing level, they still suffer from over-generalization caused by noisy minority samples and by the overlapping region around class boundaries. In this study, an improved minority sample generation technique called Diversified Data Characteristic-based Oversampling (DDCO) is proposed, based on the instance characteristics of each dimension in the data space. To cope with over-generalization and class overlap, DDCO places the newly generated synthetic samples in the minority region without any penetration into the majority space: the data characteristics of each dimension are used to keep every new sample within that region. The performance of the proposed model has been evaluated on 14 imbalanced datasets and compared with state-of-the-art methods such as SMOTE, Borderline-SMOTE, ADASYN, and MWMOTE, using AUC and F-Measure as the performance measures. The results indicate significant improvement over these state-of-the-art methods.
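The abstract does not spell out DDCO's generation rule, so the following is only a hypothetical sketch of the general idea it describes, not the authors' method. It assumes SMOTE-style interpolation between a minority sample and one of its minority neighbors. Each coordinate is then clipped to a per-dimension window derived from the minority class (mean ± `k_std` standard deviations, kept inside the observed range; this window is an assumption). A candidate is rejected when its nearest existing sample belongs to the majority class, a crude stand-in for "no penetration into the majority space". All names and parameters (`bounded_oversample`, `k_std`, `max_tries`, and so on) are illustrative.

```python
import random
import statistics
from typing import List, Optional, Sequence, Tuple

Point = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dimension_bounds(minority: Sequence[Sequence[float]],
                     k_std: float) -> List[Tuple[float, float]]:
    """Hypothetical per-dimension window: mean +/- k_std * std, clipped to the observed min/max."""
    bounds = []
    for d in range(len(minority[0])):
        column = [x[d] for x in minority]
        mu, sd = statistics.fmean(column), statistics.pstdev(column)
        bounds.append((max(min(column), mu - k_std * sd),
                       min(max(column), mu + k_std * sd)))
    return bounds


def minority_neighbors(i: int, minority: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k nearest other minority samples (brute-force search)."""
    ranked = sorted((squared_distance(minority[i], minority[j]), j)
                    for j in range(len(minority)) if j != i)
    return [j for _, j in ranked[:k]]


def nearest_is_minority(p: Sequence[float],
                        minority: Sequence[Sequence[float]],
                        majority: Sequence[Sequence[float]]) -> bool:
    """True if the single nearest existing sample to p is a minority sample."""
    d_min = min(squared_distance(p, m) for m in minority)
    d_maj = min((squared_distance(p, m) for m in majority), default=float("inf"))
    return d_min < d_maj


def bounded_oversample(minority: Sequence[Sequence[float]],
                       majority: Sequence[Sequence[float]],
                       n_new: int, k_neighbors: int = 5, k_std: float = 1.0,
                       max_tries: int = 100, seed: Optional[int] = None) -> List[Point]:
    """Generate up to n_new synthetic minority points (hypothetical, not DDCO itself)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    bounds = dimension_bounds(minority, k_std)
    k = min(k_neighbors, len(minority) - 1)
    synthetic: List[Point] = []
    tries = 0
    # Cap attempts so a crowded boundary cannot cause an endless loop.
    while len(synthetic) < n_new and tries < n_new * max_tries:
        tries += 1
        i = rng.randrange(len(minority))
        j = rng.choice(minority_neighbors(i, minority, k))
        gap = rng.random()
        # Interpolate, then clip every coordinate to its per-dimension window.
        candidate = [min(max(a + gap * (b - a), lo), hi)
                     for a, b, (lo, hi) in zip(minority[i], minority[j], bounds)]
        # Reject candidates whose nearest neighbor is a majority sample.
        if nearest_is_minority(candidate, minority, majority):
            synthetic.append(candidate)
    return synthetic


minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
majority = [[3.0, 3.0], [3.2, 2.8], [2.9, 3.1], [3.1, 3.3], [2.8, 2.9]]
print(bounded_oversample(minority, majority, n_new=3, seed=0))
```

Clipping coordinates independently confines samples to an axis-aligned box, which is simple but ignores correlations between dimensions. The brute-force distance checks cost O(n) per query, so a practical version would use a spatial index.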

Keywords: class imbalance problems, data pre-processing, data sampling methods, oversampling, synthetic sample generation
