This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Enhancing Accuracy on Chronic-Kidney Disease Detection Using Machine Learning with Technique of Resampling and Missing Value Treatment
Corresponding Author(s) : Muhammad Raihan Wibowo
Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control,
Vol. 8, No. 4, November 2023
Abstract
Chronic kidney disease is one of the deadliest diseases in the world. Identifying it at an early stage is important so that treatment and prevention can begin early. This study used linear interpolation to treat missing values, SMOTE for resampling, and several feature selection methods, such as Pearson’s correlation coefficient and Principal Component Analysis. For classification, Support Vector Machine and Logistic Regression were used to build prediction models for chronic kidney disease based on the dataset from the UCI Machine Learning Repository. To measure model performance, several test scenarios were run and compared with previous research on chronic kidney disease detection, which serves as the benchmark for this study. The best result was obtained from the scenario that combined SMOTE resampling with feature selection using Principal Component Analysis, with average accuracy, precision, and F1-score of 98.8%, 100%, and 98.77%, respectively.
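The abstract names linear interpolation and SMOTE but gives no implementation details. As a minimal sketch, assuming numeric features and a Euclidean nearest-neighbor search, the two preprocessing ideas can be illustrated in plain Python. The function names `interpolate_missing` and `smote`, the rule that leading and trailing gaps copy the nearest observed value, the parameters `k` and `seed`, and the toy data are all illustrative assumptions, not the authors' code.

```python
import math
import random


def interpolate_missing(values):
    """Fill None entries by linear interpolation between the nearest known values.

    Assumption: leading or trailing gaps have only one known neighbor,
    so they simply copy that neighbor.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("column has no observed values")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            # Fraction of the way from the left neighbor to the right neighbor.
            t = (i - left) / (right - left)
            filled[i] = values[left] + t * (values[right] - values[left])
    return filled


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples in the style of SMOTE.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbors.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )
        neighbor = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy data for illustration only, not taken from the UCI dataset.
column = [1.0, None, 3.0, None, None, 6.0, None]
print(interpolate_missing(column))

minority_class = [[1.0, 1.0], [1.5, 1.2], [2.0, 0.8], [1.2, 1.6]]
for point in smote(minority_class, n_new=3, k=2):
    print(point)
```

In practice, resampling of this kind is normally applied only to the training portion of the data, so that synthetic samples do not leak into the evaluation set.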