ENSEMBLE LEARNING WITH THE SMOTEBAGGING METHOD FOR IMBALANCED DATA CLASSIFICATION

Authors

  • Rimbun Siringoringo, Universitas Methodist Indonesia
  • Indra Kelana Jaya, Universitas Methodist Indonesia

Abstract

Classifying imbalanced data is a crucial problem in machine learning and data mining. Class imbalance degrades classification results because minority-class instances are often misclassified as the majority class. Conventional machine learning algorithms are not designed to handle imbalanced data, so their performance is generally suboptimal. In this study, ensemble learning with the SMOTEBagging method was applied to classify 11 imbalanced datasets. SMOTEBagging was also compared with three conventional classification algorithms: SVM, k-NN, and C4.5. Under a 5-fold cross-validation scheme, SMOTEBagging produced the higher AUC on 10 of the 11 datasets. The mean AUC values, from lowest to highest, were 0.638 for SVM, 0.742 for k-NN, 0.770 for C4.5, and 0.895 for SMOTEBagging. A Friedman test showed that the AUC of SMOTEBagging differed significantly from that of the three conventional methods.
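To make the idea behind SMOTEBagging concrete, the following is a minimal sketch and not the implementation used in the study. It assumes binary labels (1 for the minority class, 0 for the majority class), at least two minority points, a 1-nearest-neighbour base learner, and a fixed resampling rate in which every bag is topped up to full balance. The function names `smote`, `smote_bagging_fit`, and `smote_bagging_predict` are hypothetical, and the toy data is illustrative only.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_new, k, rng):
    """Create n_new synthetic points, each interpolated between a random
    minority point and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: sq_dist(p, base))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def smote_bagging_fit(X, y, n_bags=11, k=3, seed=0):
    """Build n_bags balanced training sets (assumption: label 1 = minority).

    Each bag holds a bootstrap sample of the majority class, a bootstrap
    sample of the minority class, and enough SMOTE points to balance them.
    With a 1-NN base learner, the stored bag itself is the model.
    """
    rng = random.Random(seed)
    minority = [x for x, lab in zip(X, y) if lab == 1]
    majority = [x for x, lab in zip(X, y) if lab == 0]
    bags = []
    for _ in range(n_bags):
        maj = [rng.choice(majority) for _ in majority]
        mino = [rng.choice(minority) for _ in minority]
        mino += smote(minority, len(majority) - len(minority), k, rng)
        bags.append([(p, 0) for p in maj] + [(p, 1) for p in mino])
    return bags


def smote_bagging_predict(bags, x):
    """Majority vote over the 1-NN prediction of every bag."""
    votes = Counter(
        min(bag, key=lambda pl: sq_dist(pl[0], x))[1] for bag in bags
    )
    return votes.most_common(1)[0][0]


# Toy imbalanced data: six majority points near (0, 0), two minority near (5, 5).
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (-0.1, 0.0), (0.0, -0.2), (0.3, 0.3),
     (5.0, 5.0), (5.2, 4.9)]
y = [0, 0, 0, 0, 0, 0, 1, 1]

bags = smote_bagging_fit(X, y, n_bags=5, k=3, seed=1)
print(smote_bagging_predict(bags, (5.1, 5.0)))
print(smote_bagging_predict(bags, (0.0, 0.1)))
```

Because the synthetic points are interpolations between existing minority points, each bag sees a denser minority region without exact duplicates, and the vote across bags reduces the variance introduced by resampling. A stronger base learner, such as a decision tree, could be substituted for the 1-NN rule without changing the structure of the sketch.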

Published

2018-07-16