
Journal of Information Science and Engineering, Vol. 37 No. 5, pp. 1039-1051

Ensemble Case based Reasoning Imputation in Breast Cancer Classification

¹Software Project Management Research Team
ENSIAS, Mohammed V University in Rabat
Rabat, 10112 Morocco

²MSDA, Mohammed VI Polytechnic University
Ben Guerir, 43150 Morocco
E-mail: imanechlioui@gmail.com; ali.idri@um5.ac.ma;
ibtissam_abnane@um5s.net.ma; mahmoud.ezzat@um6p.ma

Missing data (MD) is a common problem in breast cancer classification, so handling MD is essential before building any breast cancer classifier. This paper investigates the impact of ensemble Case-Based Reasoning (CBR) imputation on breast cancer classification. We evaluated the influence of CBR imputation with parameter tuning and of ensemble CBR (E-CBR) imputation, under three missingness mechanisms (MCAR: missing completely at random, MAR: missing at random, and NMAR: not missing at random) and nine MD percentages (10% to 90%), on the accuracy of five classifiers: Decision Tree (DT), Random Forest (RF), K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP), over two Wisconsin breast cancer datasets. All experiments were implemented with the Weka 3.8 Java API, and SPSS v20 was used for the statistical tests. The findings confirm that E-CBR yields better results than CBR for all five classifiers. The MD percentage negatively affects classifier performance: as it increases, accuracy decreases regardless of the MD mechanism and imputation technique. RF with E-CBR outperformed all other (MD technique, classifier) combinations, with accuracies of 89.72% under MCAR, 87.08% under MAR, and 86.84% under NMAR.

Keywords: breast cancer, ensemble, CBR imputation, missing data, classification
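The abstract names three missingness mechanisms and a CBR-based imputer without spelling out their mechanics. The Python sketch below (standard library only; it is not the Weka Java code the authors used) illustrates one common way to inject MCAR, MAR and NMAR missingness at a chosen percentage, and a simple CBR imputer that retrieves the k most similar cases with the value observed and reuses the mean of their values. The ensemble step averages several CBR members that differ only in k; this is an illustrative assumption, since the abstract does not state how the E-CBR members are configured or combined. The function names (`inject_missing`, `distance`, `cbr_estimate`, `ecbr_impute`) and the "largest values go missing" rule used for MAR and NMAR are hypothetical.

```python
import math
import random
from statistics import mean


def inject_missing(rows, col, pct, mechanism="MCAR", driver=None, seed=0):
    """Return a copy of `rows` with round(pct * len(rows)) values of column `col` set to None.

    Assumed simulation rules (one common choice, not necessarily the paper's):
      MCAR: rows chosen uniformly at random, independent of any value.
      MAR:  rows with the largest values of another, observed column `driver`.
      NMAR: rows with the largest values of `col` itself.
    """
    out = [list(r) for r in rows]
    n_missing = round(pct * len(rows))
    idx = range(len(rows))
    if mechanism == "MCAR":
        chosen = random.Random(seed).sample(idx, n_missing)
    elif mechanism == "MAR":
        if driver is None or driver == col:
            raise ValueError("MAR needs a different observed column as `driver`")
        chosen = sorted(idx, key=lambda i: rows[i][driver], reverse=True)[:n_missing]
    elif mechanism == "NMAR":
        chosen = sorted(idx, key=lambda i: rows[i][col], reverse=True)[:n_missing]
    else:
        raise ValueError(f"unknown mechanism: {mechanism!r}")
    for i in chosen:
        out[i][col] = None
    return out


def distance(a, b, skip):
    """Euclidean distance over attributes observed in both cases, ignoring column `skip`."""
    shared = [(x, y) for j, (x, y) in enumerate(zip(a, b))
              if j != skip and x is not None and y is not None]
    if not shared:
        return math.inf  # no attribute in common: treat as maximally dissimilar
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def cbr_estimate(rows, i, col, k):
    """One CBR member: retrieve the k cases most similar to case i among those
    with `col` observed, then reuse their values by averaging.
    Raises statistics.StatisticsError if no other case has `col` observed."""
    candidates = [r for j, r in enumerate(rows) if j != i and r[col] is not None]
    candidates.sort(key=lambda r: distance(rows[i], r, col))
    return mean(r[col] for r in candidates[:k])


def ecbr_impute(rows, col, ks=(1, 3, 5)):
    """Ensemble CBR sketch: one CBR member per k in `ks` (an assumed configuration);
    the ensemble estimate is the mean of the members' estimates. Estimates are
    computed from the original rows, so imputed values never feed back in."""
    out = [list(r) for r in rows]
    for i, r in enumerate(rows):
        if r[col] is None:
            out[i][col] = mean(cbr_estimate(rows, i, col, k) for k in ks)
    return out
```

For example, with `rows = [[1.0, 2.0, 10.0], [1.1, 2.1, 11.0], [5.0, 5.0, 50.0], [5.1, 5.2, 52.0], [1.0, 2.0, None]]`, `ecbr_impute(rows, 2, ks=(1, 2))` fills the missing value with 10.25: the k=1 member returns 10.0, the k=2 member returns 10.5, and the ensemble averages the two. Attributes are not rescaled in this sketch, so features measured on different scales would need normalization before the distance is meaningful.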
