
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Hate Speech Analysis Using IndoBERT in YouTube Comments on the 2024 Indonesian Presidential Debate Video
Corresponding Author(s): Agus Sasmito Aribowo
Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control,
Vol. 11, No. 3, August 2026 (Article in Progress)
Abstract
Hate speech in the digital political space during election campaigns can cause polarization and undermine the quality of public discussion. This study analyzes hate speech in YouTube comments on the five stages of the 2024 Indonesian presidential debate. We used IndoBERT, a Transformer-based language model trained specifically on Indonesian text, to classify comments as hate speech or non-hate speech. The dataset consists of 38,742 comments collected from official debate videos. It was labeled through a combination of manual annotation (20%) and semi-supervised pseudo-labeling (80%). Experimental results show that IndoBERT achieved an average accuracy of 89.7% and a macro F1-score of 0.89 across all stages, outperforming baseline models such as mBERT, SVM, and Random Forest. These findings suggest that IndoBERT captures the linguistic nuances and distinctive rhetoric of Indonesian political discourse more effectively than multilingual or classical models. This study contributes an Indonesian-language political dataset and a comprehensive evaluation of relevant hate speech detection models for further research.
Keywords
hate speech, IndoBERT, 2024 presidential debate, semi-supervised learning
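The abstract mentions two technical ingredients, pseudo-labeling and the macro F1-score, which can be illustrated with a minimal sketch. This is not the authors' implementation: the confidence threshold of 0.9, the function names, and the toy probabilities are all assumptions made for illustration. In practice the probabilities would come from a classifier (such as a fine-tuned IndoBERT model) trained on the manually annotated portion of the data.

```python
from typing import List, Sequence, Tuple

# Label encoding is an assumption for this sketch.
HATE, NON_HATE = 1, 0


def select_pseudo_labels(
    probs: Sequence[float], threshold: float = 0.9
) -> List[Tuple[int, int]]:
    """Pick unlabeled comments the model is confident about.

    probs[i] is the predicted probability that comment i is hate speech.
    Returns (index, pseudo_label) pairs; uncertain comments are skipped.
    The 0.9 threshold is hypothetical, not taken from the paper.
    """
    selected = []
    for i, p in enumerate(probs):
        if p >= threshold:
            selected.append((i, HATE))
        elif p <= 1.0 - threshold:
            selected.append((i, NON_HATE))
    return selected


def macro_f1(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    labels: Sequence[int] = (NON_HATE, HATE),
) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy values standing in for model outputs on unlabeled comments.
toy_probs = [0.97, 0.55, 0.03, 0.91, 0.40]
pseudo = select_pseudo_labels(toy_probs)

# Toy evaluation on four held-out, manually labeled comments.
score = macro_f1([1, 0, 1, 0], [1, 0, 0, 0])
```

Because macro F1 averages the per-class scores without weighting by class size, it penalizes a model that performs well only on the majority class, which is a common reason to report it alongside accuracy for hate speech data.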