Comparative Evaluation of BM25–FAISS and Small-LLM–GPT in Retrieval-Augmented Generation Concept Map Assessment

Maskur Maskur; Didik Dwi Prasetya; Triyanna Widiyaningtyas; Azlan Mohd Zain

doi:10.22219/kinetik.v11i1.2594

Issue

Vol. 11, No. 1, February 2026

Issue Published : Feb 1, 2026

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Maskur Maskur

Universitas Negeri Malang; Politeknik Negeri Malang

Didik Dwi Prasetya

Universitas Negeri Malang

Triyanna Widiyaningtyas

Universitas Negeri Malang

Azlan Mohd Zain

Universiti Teknologi Malaysia

Corresponding Author(s) : Didik Dwi Prasetya

didikdwi@um.ac.id

Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control, Vol. 11, No. 1, February 2026
Article Published : Feb 1, 2026

Abstract

Concept map-based assessment is a practical way to measure students’ conceptual understanding, but manual scoring suffers from subjectivity, inconsistency, and limited scalability. This study proposes Retrieval-Augmented Generation (RAG) as an AI-based automated assessment solution for educational settings. Its objectives are to compare the effectiveness of two retrieval methods, BM25 and FAISS, and to analyse the trade-off between a large-scale generative model (GPT) and a Small-LLM in assessing concept map propositions. The study takes a quantitative experimental approach, pairing each retriever with each generator in the RAG system. Performance is evaluated with Macro-F1 and Quadratic Weighted Kappa (QWK), which measure agreement with expert judgment, and with the Explanation Relevance Score (ERS), which assesses explanation quality. The FAISS–GPT combination achieves the best performance, with a Macro-F1 of 0.338 and a QWK of 0.146, slightly ahead of the BM25–GPT combination. In contrast, Small-LLM, whether paired with BM25 or FAISS, performs worse, with Macro-F1 values in the range 0.167–0.221 and QWK close to zero. These findings indicate that semantic retrieval plays a vital role in improving the accuracy of automated assessment, and that large-scale generative models are more effective at representing conceptual relationships in depth. The study contributes a comparative analysis of retrievers and generators and introduces ERS as an additional metric for RAG-based automated assessment in education.
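
The abstract reports Macro-F1 and QWK but does not show how they are computed. The following is a minimal sketch of the standard textbook definitions, not the authors' evaluation code. It assumes that expert and system scores are integer ratings on a scale 0..K-1; the function names and the toy ratings are hypothetical and chosen only for illustration.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class with no true and no predicted members contributes 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """QWK for integer ratings in 0..n_classes-1 (assumes n_classes >= 2)."""
    n = len(y_true)
    # Observed confusion matrix: rows are expert ratings, columns are system ratings.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(n_classes))
                 for j in range(n_classes)]
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Quadratic penalty grows with the squared distance between ratings.
            w = (i - j) ** 2 / (n_classes - 1) ** 2
            expected = true_hist[i] * pred_hist[j] / n
            num += w * observed[i][j]
            den += w * expected
    # den is 0 only when both raters use the same single class; treated as
    # perfect agreement here by convention (an assumption of this sketch).
    return 1.0 - num / den if den else 1.0


# Toy ratings (hypothetical, not data from the study).
expert = [0, 1, 2, 2, 1, 0]
system = [0, 1, 2, 1, 1, 0]
print(round(macro_f1(expert, system), 3))
print(round(quadratic_weighted_kappa(expert, system, n_classes=3), 3))
```

Macro-F1 weights every score class equally regardless of how often it occurs, while QWK penalises a disagreement more heavily the further apart the two ratings are. A QWK close to zero, as reported for the Small-LLM configurations, means agreement with the expert is roughly what chance alone would produce.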

Keywords

Retrieval-Augmented Generation; BM25; FAISS; Small-LLM; GPT; Concept Map Assessment

Maskur, M., Prasetya, D. D., Widiyaningtyas, T., & Zain, A. M. (2026). Comparative Evaluation of BM25–FAISS and Small-LLM–GPT in Retrieval-Augmented Generation Concept Map Assessment. Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control, 11(1), 173-182. https://doi.org/10.22219/kinetik.v11i1.2594

Author Biography

Maskur Maskur, Universitas Negeri Malang; Politeknik Negeri Malang

Affiliation 1 : Department of Electrical Engineering and Informatics, Universitas Negeri Malang, Malang, Indonesia
Affiliation 2 : Department of Business Administration, State Polytechnic of Malang, Indonesia
