
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Maleo Emotion Audio Dataset Indonesia For Emotion Classification
Corresponding Author(s): Sri Mentari Widya Ningrum Permana
Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control,
Vol. 11, No. 2, May 2026 (Article in Progress)
Abstract
The limited availability of Indonesian-language speech emotion datasets poses a challenge for developing Speech Emotion Recognition (SER) systems, even though the need for such systems continues to grow in sectors such as customer service, education, and human-computer interaction. To address this challenge, this study developed the Maleo Emotion Audio Dataset, a collection of three-second audio clips labeled with seven emotion categories: angry, neutral, disgusted, sad, happy, afraid, and surprised. The data was collected from YouTube and then processed through preprocessing, feature extraction, and augmentation stages. The Maleo Emotion Dataset is available at https://huggingface.co/datasets/maleo-ai/maleo-emotion. Five main features were extracted: Zero Crossing Rate, energy, Mel-Frequency Cepstral Coefficients (MFCC), spectral roll-off, and spectral flux. To improve generalization, augmentation techniques including pitch shifting, noise injection, and time stretching were applied. The classification model uses a Convolutional Neural Network (CNN) architecture implemented in TensorFlow. On the test data, the model achieved 94.48% accuracy, with balanced performance across all emotion categories. These results indicate that the dataset and model architecture can effectively recognize emotions in Indonesian speech in a locally relevant context.
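The abstract names the five extracted features but does not say how they are computed. As a rough illustration, the NumPy sketch below computes four of them (Zero Crossing Rate, energy, spectral roll-off, and spectral flux) frame by frame. The frame length (2048 samples), hop (512 samples), 85% roll-off threshold, the normalized-spectrum definition of flux, and the helper names `frame_signal` and `extract_features` are all assumptions for illustration, not the settings used in the study. MFCCs are left out because they need a mel filterbank and a DCT; libraries such as librosa provide them through `librosa.feature.mfcc`.

```python
import numpy as np


def frame_signal(y: np.ndarray, frame_length: int = 2048, hop_length: int = 512) -> np.ndarray:
    """Split a 1-D signal into overlapping frames of shape (n_frames, frame_length).

    The tail is zero-padded so every sample falls inside at least one frame.
    """
    if len(y) < frame_length:
        y = np.pad(y, (0, frame_length - len(y)))
    n_frames = 1 + int(np.ceil((len(y) - frame_length) / hop_length))
    pad = (n_frames - 1) * hop_length + frame_length - len(y)
    y = np.pad(y, (0, pad))
    # Row i holds samples [i * hop_length, i * hop_length + frame_length).
    idx = np.arange(frame_length)[None, :] + hop_length * np.arange(n_frames)[:, None]
    return y[idx]


def extract_features(y: np.ndarray, sr: int, frame_length: int = 2048,
                     hop_length: int = 512, roll_percent: float = 0.85) -> dict:
    """Hypothetical helper: per-frame ZCR, energy, roll-off, and flux.

    Default parameters are illustrative assumptions, not the study's settings.
    """
    frames = frame_signal(y, frame_length, hop_length)

    # Zero Crossing Rate: fraction of adjacent sample pairs whose sign differs.
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

    # Short-time energy: mean squared amplitude of each frame.
    energy = np.mean(frames ** 2, axis=1)

    # Magnitude spectrum of each Hann-windowed frame and the matching bin frequencies.
    window = np.hanning(frame_length)
    mag = np.abs(np.fft.rfft(frames * window, axis=1))
    freqs = np.fft.rfftfreq(frame_length, d=1.0 / sr)

    # Spectral roll-off: first frequency at which the cumulative magnitude
    # reaches roll_percent of the frame's total magnitude.
    cumulative = np.cumsum(mag, axis=1)
    threshold = roll_percent * cumulative[:, -1:]
    rolloff = freqs[np.argmax(cumulative >= threshold, axis=1)]

    # Spectral flux: L2 distance between consecutive normalized spectra
    # (the first frame has no predecessor, so its flux is set to 0).
    norm = mag / (np.sum(mag, axis=1, keepdims=True) + 1e-10)
    flux = np.concatenate([[0.0], np.linalg.norm(np.diff(norm, axis=0), axis=1)])

    return {"zcr": zcr, "energy": energy, "rolloff": rolloff, "flux": flux}
```

For a three-second clip at an assumed 16 kHz sampling rate, this returns 91 frames per feature. Per-frame values would typically be stacked into a 2-D array, or summarized by statistics such as mean and standard deviation, before being passed to a classifier.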
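Of the three augmentation techniques mentioned, noise injection is the simplest to show without extra dependencies. The sketch below adds white Gaussian noise scaled to a chosen signal-to-noise ratio (SNR). The Gaussian noise model, the SNR parameterization, and the function name `add_noise` are assumptions, because the abstract does not state how the noise was generated or at what level.

```python
from typing import Optional

import numpy as np


def add_noise(y: np.ndarray, snr_db: float, rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Return a copy of `y` with white Gaussian noise added at the requested SNR (dB).

    Hypothetical helper: the noise model and SNR target are illustrative
    assumptions, not the study's documented augmentation settings.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Average power of the clean signal.
    signal_power = np.mean(y ** 2)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    # Zero-mean Gaussian noise whose variance equals the target noise power.
    noise = rng.normal(0.0, np.sqrt(noise_power), size=y.shape)
    return y + noise
```

Pitch shifting and time stretching are usually done with phase-vocoder-based tools; assuming librosa is installed, `librosa.effects.pitch_shift(y, sr=sr, n_steps=2)` and `librosa.effects.time_stretch(y, rate=1.1)` are typical calls. Because time stretching changes the number of samples, a stretched clip would need to be trimmed or padded back to three seconds to keep a fixed model input size.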
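The abstract reports 94.48% test accuracy together with "balanced performance across all emotion categories", but the per-class figures are not part of this page. One common way to check such balance is to compare per-class recall and precision derived from a confusion matrix. The NumPy sketch below does this, assuming integer labels indexed in the order the seven emotions are listed in the abstract; that mapping and the helper names `confusion_matrix` and `summarize` are hypothetical, not necessarily the dataset's own label encoding.

```python
import numpy as np

# Hypothetical index order: follows the abstract's listing, not a documented encoding.
EMOTIONS = ["angry", "neutral", "disgusted", "sad", "happy", "afraid", "surprised"]


def confusion_matrix(y_true: np.ndarray, y_pred: np.ndarray, n_classes: int) -> np.ndarray:
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # Increment cm[true, pred] once per sample (handles repeated index pairs correctly).
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


def summarize(y_true, y_pred, labels=EMOTIONS):
    """Return overall accuracy plus per-class recall and precision dictionaries."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = confusion_matrix(y_true, y_pred, len(labels))
    accuracy = np.trace(cm) / cm.sum()
    # Recall: correct predictions divided by the number of true samples per class.
    support = cm.sum(axis=1)
    recall = np.divide(np.diag(cm), support, out=np.zeros(len(labels)), where=support > 0)
    # Precision: correct predictions divided by the number of predictions per class.
    predicted = cm.sum(axis=0)
    precision = np.divide(np.diag(cm), predicted, out=np.zeros(len(labels)), where=predicted > 0)
    return accuracy, dict(zip(labels, recall)), dict(zip(labels, precision))
```

Classes with no test samples (or no predictions) get a recall (or precision) of 0 rather than a division error. A narrow spread of per-class recall values alongside high overall accuracy is what "balanced performance" would usually mean in practice.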