Comparison of Word2Vec and GloVe performance in Bi-LSTM models for Indonesian news classification

Muhammad Faris Wafda; Husni; Ika Oktavia Suzanti; Firdaus Solihin; Mula'ab; Army Justitia

doi:10.22219/kinetik.v11i3.2608

Issue

Vol. 11, No. 3, August 2026 (Article in Progress)

Issue Published: Jun 4, 2026

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Muhammad Faris Wafda

Universitas Trunojoyo Madura

Husni

Universitas Trunojoyo Madura

Ika Oktavia Suzanti

Firdaus Solihin

Universitas Trunojoyo Madura

Mula'ab

Universitas Trunojoyo Madura

Army Justitia

National Cheng Kung University

Corresponding Author(s): Muhammad Faris Wafda

220411100039@student.trunojoyo.ac.id

Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control, Vol. 11, No. 3, August 2026 (Article in Progress)
Article Published: Jun 7, 2026

Abstract

The explosive growth in the volume of digital news text makes it challenging to classify content automatically and efficiently. For the task of classifying Indonesian-language news, this study compares three word embedding methods, namely Word2Vec with the CBOW and Skip-Gram architectures and GloVe, when used with a Bidirectional Long Short-Term Memory (Bi-LSTM) model. The dataset consists of 6,715 pre-processed news articles from an Indonesian news portal, divided into five categories. The model was trained on 80% of the data using K-Fold Cross Validation (K=5), while the remaining 20% was held out for testing. The experimental results show that the Bi-LSTM model combined with CBOW embeddings performed best, achieving 95.16% accuracy and a 95.15% F1-Score. The Skip-Gram model followed with an accuracy of 93.30% and the fastest computation time. Conversely, the model using pre-trained GloVe embeddings performed worst, with 88.98% accuracy. This result suggests that embeddings trained on the target domain capture local context more effectively. The study concludes that selecting a word embedding method trained specifically on local datasets is an important step toward optimal accuracy in Indonesian news text classification.
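The two Word2Vec architectures compared in the study differ in the prediction task used to learn the vectors: CBOW predicts a center word from its surrounding context, while Skip-Gram predicts each surrounding word from the center word. The minimal sketch below only generates the training examples each architecture sees; it omits the neural training step, negative sampling, and frequent-word subsampling. The window size of 2, the function names, and the example tokens are hypothetical, not taken from the paper.

```python
def cbow_examples(tokens, window=2):
    """CBOW: each example is (context words, center word); context predicts center."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = tokens[lo:i] + tokens[i + 1:hi]  # neighbors on both sides
        if context:
            examples.append((context, center))
    return examples


def skipgram_examples(tokens, window=2):
    """Skip-Gram: each example is (center word, one context word); center predicts neighbor."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical pre-processed Indonesian tokens.
sentence = ["harga", "beras", "naik", "di", "pasar"]
print(cbow_examples(sentence)[2])       # (['harga', 'beras', 'di', 'pasar'], 'naik')
print(skipgram_examples(sentence)[:2])  # [('harga', 'beras'), ('harga', 'naik')]
```

In the gensim library, for instance, this choice is made with the `sg` parameter of `Word2Vec` (`sg=0` for CBOW, `sg=1` for Skip-Gram); the abstract does not state which toolkit the authors used.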
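The evaluation protocol described in the abstract (an 80/20 hold-out split, with 5-fold cross-validation on the 80% portion) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `holdout_then_kfold`, the random seed, and the use of plain (unstratified) shuffling are hypothetical, since the abstract does not say whether folds were stratified by category.

```python
import random


def holdout_then_kfold(n_items, test_ratio=0.2, k=5, seed=42):
    """Split item indices into a held-out test set and k cross-validation folds.

    Returns (folds, test_idx), where folds is a list of (train_idx, val_idx)
    pairs drawn only from the non-test portion.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # hypothetical fixed seed for reproducibility

    n_test = round(n_items * test_ratio)
    test_idx = indices[:n_test]
    dev_idx = indices[n_test:]  # the 80% portion used for cross-validation

    # Spread the remainder so fold sizes differ by at most one item.
    base, extra = divmod(len(dev_idx), k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        val = dev_idx[start:start + size]
        train = dev_idx[:start] + dev_idx[start + size:]
        folds.append((train, val))
        start += size
    return folds, test_idx


folds, test_idx = holdout_then_kfold(6715)
print(len(test_idx))                   # 1343
print([len(val) for _, val in folds])  # [1075, 1075, 1074, 1074, 1074]
```

With 6,715 articles this yields 1,343 test articles and five validation folds of 1,074 or 1,075 articles each. A stratified variant would shuffle and slice each category separately so that every fold keeps the five-category proportions.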
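The two metrics reported in the abstract, accuracy and F1-Score, can be computed from true and predicted category labels as sketched below. The abstract does not specify how the F1-Score was averaged across the five categories; this sketch assumes macro averaging (the unweighted mean of per-class F1), and the labels in the example are hypothetical placeholders.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for three placeholder categories.
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]
print(round(accuracy(y_true, y_pred), 4))  # 0.6667
print(round(macro_f1(y_true, y_pred), 4))  # 0.6556
```

Macro averaging gives every category equal weight regardless of its size; a weighted average would instead weight each per-class F1 by the number of true examples in that class, which can give a different value on imbalanced data.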

Keywords

Text Classification; Bi-LSTM; Word2Vec; GloVe; Indonesian News

Wafda, M. F., Husni, Suzanti, I. O., Solihin, F., Mula’ab, & Justitia, A. (2026). Comparison of Word2Vec and GloVe performance in Bi-LSTM models for Indonesian news classification. Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control, 11(3). https://doi.org/10.22219/kinetik.v11i3.2608
