
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Integrating Tabular Data and Textual Representations for Clinical Risk Prediction Using Machine Learning and Large Language Models
Corresponding Author(s): La Febry Andira Rose Cynthia
Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control,
Vol. 11, No. 2, May 2026 (Article in Progress)
Abstract
Global health faces serious challenges from the growing number of patients with chronic diseases such as heart failure, diabetes, and cancer. The problem is compounded by the limitations of electronic health record (EHR) systems, which cannot yet fully ensure accurate clinical diagnoses because of potential data-entry errors and delays in symptom identification by medical personnel. In response, this paper integrates medical tabular data with classification approaches based on classical machine learning (ML) and large language models (LLMs) to improve the accuracy of patient diagnosis prediction. It develops and compares the performance of ML models such as XGBoost, SVM, and logistic regression, and of LLMs such as Gemini, LLaMA, and Qwen in fine-tuning, few-shot, and zero-shot scenarios. The results show that Gemini combined with the few-shot approach (250 shots) achieved the highest accuracy, up to 99.8%, in predicting heart failure risk. The main finding is that a narrative text representation of tabular data, processed with an LLM, significantly enhances contextual understanding and classification accuracy, making this approach highly promising for AI-based clinical decision-making.
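To illustrate what a narrative text representation of a tabular record might look like, the following Python sketch serializes one patient row into a sentence and assembles a few-shot prompt from labeled examples. The abstract does not give the authors' templates, feature set, or prompt wording, so the feature names, the phrasing in `FEATURE_TEMPLATES`, the `HIGH`/`LOW` label words, and both function names are hypothetical choices made only for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical feature names and sentence fragments (assumptions for
# illustration; the paper's actual serialization is not specified here).
FEATURE_TEMPLATES = {
    "age": "is {value} years old",
    "ejection_fraction": "has an ejection fraction of {value}%",
    "serum_creatinine": "has a serum creatinine level of {value} mg/dL",
}


def serialize_row(row: Dict[str, object]) -> str:
    """Turn one tabular record into a single narrative sentence."""
    clauses = [
        template.format(value=row[name])
        for name, template in FEATURE_TEMPLATES.items()
        if name in row
    ]
    if not clauses:
        return "No clinical features are recorded for this patient."
    return "The patient " + ", ".join(clauses) + "."


def build_few_shot_prompt(
    examples: List[Tuple[Dict[str, object], int]],
    query: Dict[str, object],
) -> str:
    """Build a prompt from labeled examples followed by the unlabeled query.

    An empty `examples` list yields a zero-shot prompt.
    """
    lines = ["Classify each patient as HIGH or LOW risk of heart failure.", ""]
    for row, label in examples:
        lines.append("Patient: " + serialize_row(row))
        # Assumed label convention: 1 means high risk, anything else low risk.
        lines.append("Risk: " + ("HIGH" if label == 1 else "LOW"))
        lines.append("")
    lines.append("Patient: " + serialize_row(query))
    lines.append("Risk:")
    return "\n".join(lines)


if __name__ == "__main__":
    shots = [
        ({"age": 65, "ejection_fraction": 30, "serum_creatinine": 1.9}, 1),
        ({"age": 48, "ejection_fraction": 60, "serum_creatinine": 0.9}, 0),
    ]
    print(build_few_shot_prompt(shots, {"age": 70, "ejection_fraction": 35}))
```

The resulting string would be sent to an LLM, which completes the final `Risk:` line. Varying the number of labeled examples passed in `examples` corresponds to the different shot counts compared in the study.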