
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Comparison of Word2Vec and GloVe performance in Bi-LSTM models for Indonesian news classification
Corresponding Author(s): Muhammad Faris Wafda
Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control,
Vol. 11, No. 3, August 2026 (Article in Progress)
Abstract
The rapid growth of textual data from digital news makes automatic, efficient content classification a challenge. This study compares the performance of several word embeddings, specifically Word2Vec with the CBOW and Skip-Gram architectures and GloVe, when applied to a Bidirectional Long Short-Term Memory (Bi-LSTM) model for classifying Indonesian-language news. The dataset consists of 6,715 pre-processed news articles from an Indonesian news portal, divided into five categories. The model was trained on 80% of the data using K-Fold Cross Validation (K=5), while the remaining 20% was held out for testing. The experiments show that the Bi-LSTM model combined with CBOW embeddings performed best, achieving 95.16% accuracy and a 95.15% F1-Score. The Skip-Gram model followed, with an accuracy of 93.30% and the fastest computation time. The model that used pre-trained GloVe embeddings performed worst, with 88.98% accuracy. These results suggest that training embeddings on the target domain is more effective at capturing local context than relying on pre-trained embeddings. The study concludes that choosing a word embedding method trained on a local dataset is an important step toward optimal accuracy in Indonesian news text classification.
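The evaluation protocol described in the abstract (an 80/20 hold-out split followed by 5-fold cross validation on the training portion) can be sketched in plain Python as follows. This is a minimal illustration and not the authors' code: the function names `holdout_split` and `k_fold`, the random seed, and the use of simple shuffling without stratification by category are all assumptions, since the abstract does not specify them.

```python
import random


def holdout_split(n_items, test_fraction=0.2, seed=42):
    """Shuffle indices 0..n_items-1 and split them into (train, test) lists.

    The seed and the unstratified shuffle are assumptions for illustration.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_test = round(n_items * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold(indices, k=5):
    """Yield (fit, validation) index lists, one pair per fold.

    Fold sizes differ by at most one item when len(indices) is not
    divisible by k.
    """
    base, extra = divmod(len(indices), k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        fit = indices[:start] + indices[start + size:]
        start += size
        yield fit, validation


# 6,715 articles, as reported in the abstract.
train_idx, test_idx = holdout_split(6715)
folds = list(k_fold(train_idx, k=5))

print(len(train_idx), len(test_idx))           # 5372 1343
print([len(val) for _, val in folds])          # [1075, 1075, 1074, 1074, 1074]
```

Under these assumptions, each of the five folds fits on roughly 4,300 articles and validates on roughly 1,075, while the 1,343 held-out articles are touched only once, for the final test scores.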