This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Maleo Emotion Audio Dataset Indonesia for Emotion Classification
Corresponding Author(s): Sri Mentari Widya Ningrum Permana
Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control,
Vol. 11, No. 2, May 2026
Abstract
The limited availability of Indonesian speech emotion corpora poses a challenge for developing Speech Emotion Recognition (SER) systems, despite growing demand in sectors such as customer service and human-computer interaction. To address this, we developed the Maleo Emotion Audio Corpus, a collection of three-second audio clips sourced from YouTube and labeled with seven emotions (angry, neutral, disgusted, sad, happy, afraid, and surprised). The audio data underwent preprocessing, feature extraction (MFCC, ZCR, energy, spectral roll-off, and spectral flux), and augmentation. The classification model is a 1D Convolutional Neural Network (CNN) with four convolutional layers, adapted to the features of the three-second clips. The model achieved 94.48% accuracy on the test data. Performance was balanced across classes, with F1-scores ranging from 0.87 for 'sad' to 0.98 for 'neutral', indicating that no single class dominated the results. These findings demonstrate that the corpus and model architecture are well suited to recognizing emotions from Indonesian speech in a locally relevant context. The Maleo Emotion collection is available at https://doi.org/10.57967/hf/6144.
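As a rough illustration of the frame-level features named in the abstract, the sketch below computes ZCR, energy, spectral roll-off, and spectral flux with the Python standard library only. It is not the authors' pipeline: the frame length, hop size, 85% roll-off threshold, naive DFT, and all function names are assumptions chosen for readability, MFCC is omitted for brevity, and the input is a synthetic tone rather than a clip from the corpus.

```python
import cmath
import math


def frames(signal, frame_len, hop):
    """Split a list of samples into overlapping frames (no padding)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def magnitude_spectrum(frame):
    """Magnitudes of the first N/2 + 1 DFT bins (naive O(N^2) DFT)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def spectral_rolloff(mag, sample_rate, frame_len, pct=0.85):
    """Lowest bin frequency below which `pct` of the spectral magnitude lies."""
    threshold = pct * sum(mag)
    running = 0.0
    for k, m in enumerate(mag):
        running += m
        if running >= threshold:
            return k * sample_rate / frame_len
    return (len(mag) - 1) * sample_rate / frame_len


def spectral_flux(prev_mag, mag):
    """Euclidean distance between consecutive magnitude spectra."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(prev_mag, mag)))


def extract_features(signal, sample_rate, frame_len=64, hop=32):
    """Return one [zcr, energy, rolloff_hz, flux] row per frame."""
    rows, prev = [], None
    for frame in frames(signal, frame_len, hop):
        mag = magnitude_spectrum(frame)
        # The first frame has no predecessor, so its flux is defined as 0.
        flux = 0.0 if prev is None else spectral_flux(prev, mag)
        rows.append([zero_crossing_rate(frame), energy(frame),
                     spectral_rolloff(mag, sample_rate, frame_len), flux])
        prev = mag
    return rows


# Synthetic stand-in for a clip: a 1 kHz sine sampled at 8 kHz (not real data).
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(256)]
features = extract_features(tone, SAMPLE_RATE)
```

Each row of `features` describes one frame. In a pipeline like the one the abstract outlines, such rows would typically be stacked with MFCCs into a per-clip feature sequence before being passed to a 1D CNN. A practical implementation would replace the naive DFT with an FFT routine.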