
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
A Metaheuristic wrapper approach to feature selection with genetic algorithm for enhancing XGBoost classification in diabetes prediction
Corresponding Author(s): Nur Alamsyah
Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control,
Vol. 10, No. 4, November 2025
Abstract
This study addressed the problem of selecting the most relevant features to improve the accuracy of diabetes classification from health indicator data. The research focused on a binary classification task based on the Behavioral Risk Factor Surveillance System (BRFSS) dataset, which comprised over seventy thousand records and twenty-one predictive features describing individual health behaviors and conditions. A metaheuristic wrapper approach was developed in which a Genetic Algorithm searched over candidate feature subsets and an XGBoost classifier evaluated the predictive quality of each subset. The fitness of a subset was defined as the mean classification accuracy obtained through cross-validation. In addition to feature selection, the hyperparameters of the XGBoost model were tuned with a Bayesian optimization search strategy to further improve performance. The proposed method identified a best-performing subset of fourteen features that contributed most to the prediction of diabetes. The final model, combining the selected features and tuned hyperparameters, achieved an accuracy of 0.753, outperforming baseline models trained on all features and models using features selected by deterministic methods. These results support the effectiveness of combining evolutionary feature selection with model tuning to build efficient and interpretable predictive models for medical data classification, and the approach offers a practical way to manage high-dimensional data in chronic disease prediction.
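The wrapper loop described in the abstract can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the authors' implementation. To keep it runnable with the Python standard library alone, XGBoost is swapped for a simple nearest-centroid classifier, and the BRFSS data is replaced by a hypothetical synthetic dataset (make_toy_data) in which only the first three of eight features carry class signal. The GA settings (population size, number of generations, tournament size of 3, one-point crossover, bit-flip mutation rate, elitism of two) and the unstratified contiguous folds are illustrative choices, not values reported in the study. Each chromosome is a 0/1 mask over the features, and its fitness is the mean accuracy over K cross-validation folds: fitness(S) = (Acc_1(S) + ... + Acc_K(S)) / K.

```python
import random


def make_toy_data(n=300, n_features=8, n_informative=3, seed=0):
    """Hypothetical synthetic binary data (a stand-in for BRFSS):
    only the first n_informative columns are shifted by the class label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(1.5 * label if j < n_informative else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y


def nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te, cols):
    """Stand-in classifier (NOT XGBoost): predict the class whose training
    mean, restricted to the selected columns, is closest in squared distance."""
    centroids = {}
    for c in set(y_tr):
        rows = [x for x, t in zip(X_tr, y_tr) if t == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X_te, y_te):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(cols, mu))
                for c, mu in centroids.items()}
        correct += min(dist, key=dist.get) == t
    return correct / len(y_te)


def cv_fitness(mask, X, y, k=5):
    """Fitness of a 0/1 feature mask: mean accuracy over k contiguous folds."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty subset is given the worst possible fitness
    fold = len(y) // k
    scores = []
    for f in range(k):
        lo, hi = f * fold, (f + 1) * fold
        scores.append(nearest_centroid_accuracy(
            X[:lo] + X[hi:], y[:lo] + y[hi:], X[lo:hi], y[lo:hi], cols))
    return sum(scores) / k


def genetic_feature_selection(X, y, pop_size=20, generations=15,
                              p_mut=0.1, seed=42):
    """GA wrapper: evolve feature masks, scoring each one with cv_fitness."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:  # avoid re-training on masks seen before
            cache[key] = cv_fitness(mask, X, y)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        nxt = [ranked[0][:], ranked[1][:]]  # elitism: keep the two best masks
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 3), key=fitness)  # tournament selection
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randint(1, n_feat - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # bit-flip mutation: each gene toggles with probability p_mut
            nxt.append([1 - b if rng.random() < p_mut else b for b in child])
        pop = nxt
    best = max(pop, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    X, y = make_toy_data()
    mask, score = genetic_feature_selection(X, y)
    print("selected columns:", [j for j, b in enumerate(mask) if b])
    print("mean CV accuracy:", round(score, 3))
```

In the setup the abstract describes, cv_fitness would instead train an XGBoost model on the masked columns; the GA loop itself does not depend on which classifier produces the fitness score. Caching fitness per mask avoids retraining on subsets the search revisits, which matters when every evaluation is a full cross-validated XGBoost fit on tens of thousands of records.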