This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Improving Software Defect Prediction Using a Combination of Ant Colony Optimization-based Feature Selection and Ensemble Technique
Corresponding Author(s): Windi Eka Yulia Retnani
Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control,
Vol. 9, No. 4, November 2024
Abstract
Software defect prediction plays a vital role in improving software quality and reducing maintenance costs. This study aims to improve software defect prediction by combining Ant Colony Optimization (ACO) for feature selection with ensemble techniques, specifically Gradient Boosting. Three NASA MDP datasets (MC1, KC1, and PC2) were used to evaluate four machine learning algorithms: Random Forest, Support Vector Machine (SVM), Decision Tree, and Naïve Bayes. Preprocessing consisted of handling class imbalance with SMOTE and converting categorical data into numerical representations. The results indicate that integrating ACO and Gradient Boosting significantly improves the accuracy of all four algorithms; Random Forest achieved the highest accuracy, 99%, on the MC1 dataset. These findings suggest that combining ACO-based feature selection with ensemble techniques can effectively boost the performance of software defect prediction models, offering a robust approach to early detection of potential defects and contributing to improved software reliability and efficiency.
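To make the ACO-based feature selection step more concrete, the following is a minimal sketch of one common way to set it up as a wrapper method. It is not the authors' implementation: the function names (`aco_select`, `nearest_centroid_accuracy`), the parameters (`n_ants`, `n_iter`, `rho`, `penalty`), the inclusion rule and the pheromone update are all illustrative assumptions, and a simple nearest-centroid classifier stands in for the Random Forest, SVM, Decision Tree and Naïve Bayes models named in the abstract. SMOTE and Gradient Boosting are not covered here.

```python
import random


def nearest_centroid_accuracy(X, y, features):
    """Accuracy of a nearest-centroid classifier restricted to `features`.

    Stand-in fitness function (assumption). To keep the sketch short it is
    scored on the same rows it was fitted on; a real study would use
    cross-validation or a held-out split.
    """
    if not features:
        return 0.0
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    correct = 0
    for x, t in zip(X, y):
        point = [x[f] for f in features]
        pred = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += pred == t
    return correct / len(y)


def aco_select(X, y, n_ants=10, n_iter=20, rho=0.2, penalty=0.01, seed=0):
    """Return (best_subset, best_score) found by a simple ant colony search.

    Each feature carries a pheromone value tau. An ant includes a feature
    with probability tau / (1 + tau), so higher pheromone means the feature
    is picked more often. All hyperparameters are hypothetical defaults.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    pheromone = [1.0] * n_features  # start at 50% inclusion probability
    best_subset, best_score = [], float("-inf")

    for _ in range(n_iter):
        for _ in range(n_ants):
            # Each ant builds a candidate subset feature by feature.
            subset = [
                f for f in range(n_features)
                if rng.random() < pheromone[f] / (1.0 + pheromone[f])
            ]
            # Fitness: accuracy minus a small cost per selected feature.
            score = nearest_centroid_accuracy(X, y, subset) - penalty * len(subset)
            if score > best_score:
                best_subset, best_score = subset, score

        # Evaporation (with a floor so no feature is ruled out for good) ...
        pheromone = [max(0.05, (1.0 - rho) * p) for p in pheromone]
        # ... then reinforcement of the best subset found so far.
        for f in best_subset:
            pheromone[f] += max(best_score, 0.0)

    return best_subset, best_score
```

In a pipeline like the one the abstract describes, the returned feature indices would be used to reduce the (SMOTE-balanced) training data before fitting the classifiers. The per-feature penalty is one simple way to favour smaller subsets; other ACO variants use heuristic desirability terms or graph-based tours instead.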
References
N. Hidayati, J. Suntoro, and G. G. Setiaji, “Perbandingan Algoritma Klasifikasi untuk Prediksi Cacat Software dengan Pendekatan CRISP-DM,” Jurnal Sains dan Informatika, vol. 7, no. 2, pp. 117–126, Nov. 2021. https://doi.org/10.34128/jsi.v7i2.313
Y. Liu, W. Zhang, G. Qin, and J. Zhao, “A comparative study on the effect of data imbalance on software defect prediction,” in Procedia Computer Science, Elsevier B.V., 2022, pp. 1603–1616. https://doi.org/10.1016/j.procs.2022.11.349
N. Grattan, D. Alencar da Costa, and N. Stanger, “The need for more informative defect prediction: A systematic literature review,” Jul. 01, 2024, Elsevier B.V. https://doi.org/10.1016/j.infsof.2024.107456
A. Saifudin, D. Romi, and S. Wahono, “Pendekatan Level Data untuk Menangani Ketidakseimbangan Kelas pada Prediksi Cacat Software,” Journal of Software Engineering, vol. 1, no. 2, 2015.
A. Hardoni, D. P. Rini, and S. Sukemi, “Integrasi SMOTE pada Naive Bayes dan Logistic Regression Berbasis Particle Swarm Optimization untuk Prediksi Cacat Perangkat Lunak,” JURNAL MEDIA INFORMATIKA BUDIDARMA, vol. 5, no. 1, p. 233, Jan. 2021. http://dx.doi.org/10.30865/mib.v5i1.2616
V. Chauhan, C. Arora, H. Khalajzadeh, and J. Grundy, “How do software practitioners perceive human-centric defects?,” Inf Softw Technol, vol. 176, Dec. 2024. https://doi.org/10.1016/j.infsof.2024.107549
A. Briciu, G. Czibula, and M. Lupea, “A study on the relevance of semantic features extracted using BERT-based language models for enhancing the performance of software defect classifiers,” Procedia Comput Sci, vol. 225, pp. 1601–1610, 2023. https://doi.org/10.1016/j.procs.2023.10.149
A. S. Dyer et al., “Applied machine learning model comparison: Predicting offshore platform integrity with gradient boosting algorithms and neural networks,” Marine Structures, vol. 83, May 2022. https://doi.org/10.1016/j.marstruc.2021.103152
X. Dong, Y. Liang, S. Miyamoto, and S. Yamaguchi, “Ensemble learning based software defect prediction,” Journal of Engineering Research, Nov. 2023. https://doi.org/10.1016/j.jer.2023.10.038
A. Khalid, G. Badshah, N. Ayub, M. Shiraz, and M. Ghouse, “Software Defect Prediction Analysis Using Machine Learning Techniques,” Sustainability, vol. 15, no. 6, p. 5517, Mar. 2023. https://doi.org/10.3390/su15065517
Y. Al-Smadi, M. Eshtay, A. Al-Qerem, S. Nashwan, O. Ouda, and A. A. Abd El-Aziz, “Reliable prediction of software defects using Shapley interpretable machine learning models,” Egyptian Informatics Journal, vol. 24, no. 3, Sep. 2023. https://doi.org/10.1016/j.eij.2023.05.011
S. Manakkadu and S. Dutta, “Ant Colony Optimization based Support Vector Machine for Improved Classification of Unbalanced Datasets,” in Procedia Computer Science, Elsevier B.V., 2024, pp. 586–593. https://doi.org/10.1016/j.procs.2024.05.143
G. Gumelar, Q. Ain, R. Marsuciati, S. Agustanti Bambang, A. Sunyoto, and M. Syukri Mustafa, “Kombinasi Algoritma Sampling dengan Algoritma Klasifikasi untuk Meningkatkan Performa Klasifikasi Dataset Imbalance,” 2021.
A. John, I. F. Bin Isnin, S. H. H. Madni, and F. B. Muchtar, “Enhanced intrusion detection model based on principal component analysis and variable ensemble machine learning algorithm,” Intelligent Systems with Applications, vol. 24, Dec. 2024. https://doi.org/10.1016/j.iswa.2024.200442
M. Athoillah and R. K. Putri, “Handwritten Arabic Numeral Character Recognition Using Multi Kernel Support Vector Machine,” Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control, pp. 99–106, Mar. 2019. https://doi.org/10.22219/kinetik.v4i2.724
I. Muslim Karo Karo and Hendryana, “Klasifikasi Penderita Diabetes Menggunakan Algoritma Machine Learning dan Z-Score,” Jurnal Teknologi Terpadu, vol. 8, no. 2, 2022.
H. Nalatissifa, W. Gata, S. Diantika, and K. Nisa, “Perbandingan Kinerja Algoritma Klasifikasi Naive Bayes, Support Vector Machine (SVM), dan Random Forest untuk Prediksi Ketidakhadiran di Tempat Kerja,” Jurnal Informatika Universitas Pamulang, vol. 5, no. 4, p. 578, Dec. 2021. https://dx.doi.org/10.32493/informatika.v5i4.7575
M. N. Ahmad, Z. Shao, X. Xiao, P. Fu, A. Javed, and I. Ara, “A novel ensemble learning approach to extract urban impervious surface based on machine learning algorithms using SAR and optical data,” International Journal of Applied Earth Observation and Geoinformation, vol. 132, Aug. 2024. https://doi.org/10.1016/j.jag.2024.104013
S. Stradowski and L. Madeyski, “Industrial applications of software defect prediction using machine learning: A business-driven systematic literature review,” Jul. 01, 2023, Elsevier B.V. https://doi.org/10.1016/j.infsof.2023.107192
I. Mehmood et al., “A Novel Approach to Improve Software Defect Prediction Accuracy Using Machine Learning,” IEEE Access, vol. 11, pp. 63579–63597, 2023. https://doi.org/10.1109/ACCESS.2023.3287326
Y. Chachoui, N. Azizi, R. Hotte, and T. Bensebaa, “Enhancing algorithmic assessment in education: Equi-fused-data-based SMOTE for balanced learning,” Computers and Education: Artificial Intelligence, vol. 6, Jun. 2024. https://doi.org/10.1016/j.caeai.2024.100222
K. Khadijah and P. S. Sasongko, “The Comparison of Imbalanced Data Handling Method in Software Defect Prediction,” Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control, pp. 203–210, Aug. 2020. https://doi.org/10.22219/kinetik.v5i3.1049
W. Wu, K. Chen, and E. Tsotsas, “Prediction of rod-like particle mixing in rotary drums by three machine learning methods based on DEM simulation data,” Powder Technol, vol. 448, p. 120307, Dec. 2024. https://doi.org/10.1016/j.powtec.2024.120307
M. Azhari, Z. Situmorang, and R. Rosnelly, “Perbandingan Akurasi, Recall, dan Presisi Klasifikasi pada Algoritma C4.5, Random Forest, SVM dan Naive Bayes,” JURNAL MEDIA INFORMATIKA BUDIDARMA, vol. 5, no. 2, p. 640, Apr. 2021. http://dx.doi.org/10.30865/mib.v5i2.2937
L. Hakim, Z. Sari, A. Rizaldy Aristyo, and S. Pangestu, “Optimizing Android Program Malware Classification Using GridSearchCV Optimized Random Forest,” Computer Network, Computing, Electronics, and Control Journal, vol. 9, no. 2, pp. 173–180, 2024.
P. R. Sihombing and I. F. Yuliati, “Penerapan Metode Machine Learning dalam Klasifikasi Risiko Kejadian Berat Badan Lahir Rendah di Indonesia,” MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer, vol. 20, no. 2, pp. 417–426, May 2021. https://doi.org/10.30812/matrik.v20i2.1174