Rimbun Siringoringo, Indra Kelana Jaya


Imbalanced data classification is a crucial problem in machine learning and data mining. Class imbalance degrades classification results: minority-class instances are often misclassified as the majority class. Conventional machine learning algorithms are not designed to handle imbalanced data, so their performance on such data is generally suboptimal. In this study, SMOTEBagging, an ensemble learning method, was applied to classify 11 imbalanced datasets, and its performance was compared with three conventional classification algorithms: SVM, k-NN, and C4.5. Under 5-fold cross-validation, SMOTEBagging achieved a higher AUC than the three conventional algorithms on 10 of the 11 datasets. The mean AUC values, from lowest to highest, were 0.638 for SVM, 0.742 for k-NN, 0.770 for C4.5, and 0.895 for SMOTEBagging. A Friedman test showed that the AUC of SMOTEBagging differed significantly from that of the three conventional methods (SVM, k-NN, and C4.5).
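The abstract names the method and the evaluation protocol but not their configuration. The following is a minimal, dependency-free Python sketch of the general SMOTEBagging idea: each bag bootstraps the training data and uses SMOTE to oversample the minority class until the bag is balanced, one base model is fit per bag, and the models' scores are averaged. It is evaluated with stratified 5-fold cross-validation and AUC. Assumptions not taken from the paper: binary labels with 1 as the minority class; a k-NN scorer as the base learner (the base classifier used in the study is not stated here); a plain bootstrap of the minority class in every bag, whereas published SMOTEBagging variants differ in how they resample it; and all function names, which are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, rng=random):
    """SMOTE: create synthetic minority points by interpolating a randomly chosen
    minority point towards one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(x, p))[:k]
        if not neighbours:  # only one distinct point: nothing to interpolate with
            synthetic.append(list(x))
            continue
        nn = rng.choice(neighbours)
        gap = rng.random()  # position along the segment x -> nn, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


def smote_bagging(X, y, n_bags=10, seed=0):
    """Build balanced training bags; label 1 is assumed to be the minority class."""
    rng = random.Random(seed)
    majority = [x for x, t in zip(X, y) if t == 0]
    minority = [x for x, t in zip(X, y) if t == 1]
    bags = []
    for _ in range(n_bags):
        maj = [rng.choice(majority) for _ in majority]  # bootstrap the majority class
        mino = [rng.choice(minority) for _ in minority]  # bootstrap the minority class
        mino += smote(mino, len(maj) - len(mino), rng=rng)  # top up with SMOTE to balance
        bags.append([(x, 0) for x in maj] + [(x, 1) for x in mino])
    return bags


def knn_score(bag, query, k=3):
    """Hypothetical base learner: share of positives among the k nearest bag points."""
    nearest = sorted(bag, key=lambda xy: math.dist(xy[0], query))[:k]
    return sum(t for _, t in nearest) / len(nearest)


def ensemble_score(bags, query):
    """Average the base learners' scores over all bags."""
    return sum(knn_score(bag, query) for bag in bags) / len(bags)


def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(y, n_folds=5, seed=0):
    """Split indices into folds that keep the class ratio roughly constant."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for label in (0, 1):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % n_folds].append(i)
    return folds


def cross_validated_auc(X, y, n_folds=5, n_bags=10, seed=0):
    """Mean test-fold AUC of the SMOTEBagging sketch under k-fold cross-validation."""
    aucs = []
    for test_idx in stratified_folds(y, n_folds, seed):
        held_out = set(test_idx)
        X_tr = [x for i, x in enumerate(X) if i not in held_out]
        y_tr = [t for i, t in enumerate(y) if i not in held_out]
        bags = smote_bagging(X_tr, y_tr, n_bags, seed)
        scores = [ensemble_score(bags, X[i]) for i in test_idx]
        aucs.append(auc(scores, [y[i] for i in test_idx]))
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    # Toy data, not from the paper: 90 majority points near (0, 0), 10 minority near (2, 2).
    rng = random.Random(42)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)]
    X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(10)]
    y = [0] * 90 + [1] * 10
    print(f"5-fold mean AUC: {cross_validated_auc(X, y):.3f}")
```

The k-NN scorer only keeps the sketch self-contained; any classifier that outputs a score can take its place. Each test fold needs at least one minority example for AUC to be defined, which stratification guarantees when the minority class has at least `n_folds` examples.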
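The Friedman test mentioned in the abstract ranks the algorithms within each dataset and checks whether their mean ranks differ more than chance would allow. A minimal pure-Python sketch follows, using the standard statistic chi2_F = 12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4), where N is the number of datasets, k the number of algorithms, and R_j the mean rank of algorithm j, compared against a chi-square distribution with k - 1 degrees of freedom. The per-dataset AUC table is not reproduced in this abstract, so the sketch takes it as input; the function names are hypothetical.

```python
import math


def rank_row(row):
    """Rank one dataset's scores: the highest AUC gets rank 1, ties share their mean rank."""
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        for m in range(i, j + 1):  # positions i..j are tied
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1,
    via the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)."""
    y = x / 2
    if df % 2 == 1:
        p, k = math.erfc(math.sqrt(y)), 1
    else:
        p, k = math.exp(-y), 2
    while k < df:
        p += y ** (k / 2) * math.exp(-y) / math.gamma(k / 2 + 1)
        k += 2
    return p


def friedman_test(scores):
    """scores[d][j] is the AUC of algorithm j on dataset d.
    Returns (chi-square statistic, p-value, mean rank of each algorithm)."""
    n, k = len(scores), len(scores[0])
    rank_rows = [rank_row(row) for row in scores]
    mean_ranks = [sum(col) / n for col in zip(*rank_rows)]
    stat = 12 * n / (k * (k + 1)) * (sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4)
    return stat, chi2_sf(stat, k - 1), mean_ranks
```

For this study the input would be an 11 x 4 table (N = 11 datasets; columns SVM, k-NN, C4.5, SMOTEBagging). The test's null hypothesis is that all algorithms perform equivalently; pairwise conclusions are usually drawn afterwards with a post-hoc test.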


Pages 75-81


Department of Information System | Computer Science Faculty | Universitas Pelita Harapan | sistech.medan@uph.edu