This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Improving Software Defect Prediction With a Combination of Feature Selection Based On Ant Colony Optimization and Ensemble Technique
Corresponding Author(s): Windi Eka Yulia Retnani
Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control,
Vol. 9, No. 4, November 2024 (Article in Progress)
Abstract
Software defect prediction plays a vital role in enhancing
software quality and minimizing maintenance costs. This
study aims to improve software defect prediction by
employing a combination of Ant Colony Optimization (ACO)
for feature selection and ensemble techniques, particularly
Gradient Boosting. The research uses three NASA MDP
datasets (MC1, KC1, and PC2) to evaluate the performance
of four machine learning algorithms: Random Forest,
Support Vector Machine (SVM), Decision Tree, and Naïve
Bayes. Data preprocessing handled class imbalance with the
Synthetic Minority Over-sampling Technique (SMOTE) and
encoded categorical data as numerical values. The results
indicate that the integration of ACO and Gradient Boosting
significantly enhances the accuracy of all four algorithms.
Notably, the Random Forest algorithm achieved the
highest accuracy of 99% on the MC1 dataset. The findings
suggest that combining ACO-based feature selection with
ensemble techniques can effectively boost the
performance of software defect prediction models, offering
a robust approach for early detection of potential software
defects and contributing to improved software reliability and
efficiency.
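
As an illustration of the feature selection step described above, the following is a minimal sketch of a binary Ant Colony Optimization search over feature subsets. It is not the authors' implementation: the function name aco_select_features, its parameters (n_ants, n_iterations, evaporation), the pheromone bounds, and the toy scoring function are all assumptions made for this example. In a study like this one, the scoring function would more plausibly be the validation accuracy of a classifier trained on the candidate subset.

```python
import random


def aco_select_features(n_features, score_subset, n_ants=10, n_iterations=20,
                        evaporation=0.2, seed=0):
    """Search for a good feature subset with a simple binary ACO (illustrative).

    score_subset is a hypothetical callable: it takes a tuple of feature
    indices and returns a float, where higher means a better subset.
    """
    rng = random.Random(seed)
    # One pheromone value per feature, used directly as an inclusion probability.
    pheromone = [0.5] * n_features
    best_subset, best_score = (), float("-inf")

    for _ in range(n_iterations):
        for _ in range(n_ants):
            # Each ant builds a subset by including feature i with
            # probability pheromone[i].
            subset = tuple(i for i in range(n_features)
                           if rng.random() < pheromone[i])
            if not subset:
                # Avoid scoring an empty subset: fall back to one random feature.
                subset = (rng.randrange(n_features),)
            score = score_subset(subset)
            if score > best_score:
                best_subset, best_score = subset, score

        # Evaporate all trails, then reinforce features in the best subset so far.
        chosen = set(best_subset)
        for i in range(n_features):
            deposit = 1.0 if i in chosen else 0.0
            pheromone[i] = (1 - evaporation) * pheromone[i] + evaporation * deposit
            # Bound the pheromone so that every feature can still be explored.
            pheromone[i] = min(0.95, max(0.05, pheromone[i]))

    return best_subset, best_score


# Toy scoring function (an assumption for this sketch): features 0, 2 and 5
# are treated as informative, and every selected feature carries a small cost.
INFORMATIVE = {0, 2, 5}


def toy_score(subset):
    hits = len(INFORMATIVE.intersection(subset))
    return hits - 0.1 * len(subset)


best, score = aco_select_features(8, toy_score)
print("selected features:", best, "score:", round(score, 2))
```

The search is seeded, so repeated runs with the same arguments return the same subset. Replacing toy_score with a function that trains and validates a classifier on the selected columns would turn this into a wrapper-style selector, at the cost of one model fit per ant per iteration.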