Issue

Vol. 11, No. 3, August 2026 (Article in Progress)

Issue Published : Jun 4, 2026

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

A Data-Driven Framework Integrating Clustering and Classification for Fair Tuition Grouping (UKT) Prediction

https://doi.org/10.22219/kinetik.v11i3.2578
Windy Chikita Cornia Putri
Universitas Negeri Surabaya
Wiyli Yustanti
Universitas Negeri Surabaya
Ervin Yohannes
Universitas Negeri Surabaya
Yoyok Prastyo
Universitas Negeri Surabaya

Corresponding Author(s) : Windy Chikita Cornia Putri

windychikita@unesa.ac.id

Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control, Vol. 11, No. 3, August 2026 (Article in Progress)
Article Published : Jun 7, 2026

Abstract

This study aims to identify the most effective combination of feature selection techniques and classification algorithms for predicting student tuition groups (Uang Kuliah Tunggal, UKT) from pre-admission data. Three feature selection methods, Exploratory Factor Analysis (EFA), Recursive Feature Elimination (RFE), and Random Forest Feature Importance (RFFI), were each combined with five supervised learning models: Decision Tree, Random Forest, Support Vector Machine (SVM) with an RBF kernel, Naïve Bayes, and K-Nearest Neighbor (KNN). The EFA–SVM (RBF) combination achieved the best performance, with an average accuracy exceeding 98%, and outperformed the other models across most faculties. EFA also yielded the highest Silhouette Score (0.2933), compared with RFE (0.2564) and RFFI (0.2575), indicating a more stable and distinct cluster structure. These findings highlight the critical role of appropriate feature selection in improving classification accuracy and model generalization, particularly when the selected features emphasize socioeconomic variables such as parental income, land area, housing conditions, and basic family facilities. Integrating factor-based dimensionality reduction with non-linear classification algorithms proved effective in developing a more transparent and equitable UKT prediction model. This research contributes to the advancement of data-driven decision support systems in higher education and provides a foundation for future automation of tuition group determination.
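The Silhouette Scores quoted in the abstract measure how well separated the clusters are under each feature set: for every sample, the mean distance to its own cluster (a) is compared with the mean distance to the nearest other cluster (b), and the per-sample value (b - a) / max(a, b) is averaged. The following is only a minimal sketch of that calculation, assuming Euclidean distance. The function name `silhouette_score` and the toy points are hypothetical and are not taken from the study's data or code.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i, members):
        # Mean distance from point i to the listed members, excluding i itself.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        a = mean_dist(i, own)
        b = min(mean_dist(i, m) for other, m in clusters.items() if other != label)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical toy data: two tight groups that are far apart.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_score = silhouette_score(toy_points, [0, 0, 1, 1])  # labels match the groups
bad_score = silhouette_score(toy_points, [0, 1, 0, 1])   # labels cut across the groups
print(good_score, bad_score)
```

A labelling that follows the real grouping scores close to 1, while one that cuts across it scores below 0. Scores in the 0.25 to 0.30 range, like those reported above, sit between these extremes.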

Keywords

Feature Selection; Exploratory Factor Analysis (EFA); Machine Learning; Support Vector Machine (SVM); UKT classification
Putri, W. C. C., Yustanti, W., Yohannes, E., & Prastyo, Y. (2026). A Data-Driven Framework Integrating Clustering and Classification for Fair Tuition Grouping (UKT) Prediction. Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control, 11(3). https://doi.org/10.22219/kinetik.v11i3.2578
KINETIK: Game Technology, Information System, Computer Network, Computing, Electronics, and Control
eISSN : 2503-2267
pISSN : 2503-2259

