A Comparison of Imbalanced Data Handling Methods in Software Defect Prediction
Corresponding Author(s): Khadijah Khadijah
Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control,
Vol. 5, No. 3, August 2020
Abstract
Software testing is a crucial process in the software development life cycle that directly affects software quality. However, testing is a tedious and resource-consuming task. It can be conducted more efficiently by focusing the effort on software modules that are prone to defects, which motivates automated software defect prediction. This research implemented the Extreme Learning Machine (ELM) as the classification algorithm because of its simple training process and good generalization performance. Besides the choice of classifier, the most important problem to address is the imbalance between samples of the positive class (prone to defects) and the negative class, which can bias the performance of the classifier. Therefore, this research compared two approaches to handling the imbalance problem: SMOTE (a resampling method) and weighted-ELM (an algorithm-level method). Experimental results using 10-fold cross validation on the NASA MDP dataset show that including imbalance handling when building a software defect prediction model increases the specificity and g-mean of the model. When the imbalance ratio is not very small, SMOTE is better than weighted-ELM. Otherwise, weighted-ELM is better than SMOTE in terms of sensitivity and g-mean, but worse in terms of specificity and accuracy.
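The sketch below is a minimal, hedged illustration of the two imbalance-handling strategies contrasted in the abstract: SMOTE resampling before a plain ELM versus a weighted ELM that reweights classes inside the training objective. It is not the authors' code; it assumes NumPy, scikit-learn, and the imbalanced-learn package, and the number of hidden nodes, the regularization constant C, and the sigmoid activation are illustrative choices rather than the paper's exact configuration.

```python
# Sketch: SMOTE + ELM (resampling-level) vs. weighted ELM (algorithm-level),
# evaluated with 10-fold cross validation and the g-mean metric.
# Assumptions: binary labels y in {0, 1} with 1 = defect-prone, numeric features X.
import numpy as np
from imblearn.over_sampling import SMOTE            # resampling-level handling
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler


def train_elm(X, y, n_hidden=50, C=1.0, sample_weight=None, seed=0):
    """Train a single-hidden-layer ELM with sigmoid activation.

    With sample_weight this becomes a weighted ELM:
    beta = (H^T W H + I/C)^-1 H^T W T, where W is a diagonal weight matrix.
    """
    rng = np.random.default_rng(seed)
    W_in = rng.normal(size=(X.shape[1], n_hidden))   # random input weights
    b = rng.normal(size=n_hidden)                    # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W_in + b)))        # hidden-layer output matrix
    T = np.where(y == 1, 1.0, -1.0)                  # bipolar targets
    if sample_weight is None:
        A = H.T @ H + np.eye(n_hidden) / C
        beta = np.linalg.solve(A, H.T @ T)
    else:
        W = np.diag(sample_weight)
        A = H.T @ W @ H + np.eye(n_hidden) / C
        beta = np.linalg.solve(A, H.T @ W @ T)
    return W_in, b, beta


def predict_elm(model, X):
    W_in, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ W_in + b)))
    return (H @ beta >= 0).astype(int)


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (recall on defects) and specificity."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return np.sqrt(sens * spec)


def evaluate(X, y, use_smote):
    """Cross-validated g-mean for one of the two imbalance-handling strategies."""
    scores = []
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    for tr, te in cv.split(X, y):
        scaler = StandardScaler().fit(X[tr])
        X_tr, X_te = scaler.transform(X[tr]), scaler.transform(X[te])
        y_tr = y[tr]
        if use_smote:
            # Resampling-level: synthesize minority samples, then train a plain ELM.
            X_tr, y_tr = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
            model = train_elm(X_tr, y_tr)
        else:
            # Algorithm-level: weighted ELM, each sample weighted by 1 / class size.
            counts = np.bincount(y_tr)
            model = train_elm(X_tr, y_tr, sample_weight=1.0 / counts[y_tr])
        scores.append(g_mean(y[te], predict_elm(model, X_te)))
    return np.mean(scores)
```

With a NASA MDP subset loaded into a feature matrix X and binary labels y, `evaluate(X, y, use_smote=True)` and `evaluate(X, y, use_smote=False)` would give the cross-validated g-mean of the resampling-level and the algorithm-level approach, respectively, mirroring the comparison described in the abstract.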