This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Improving Automatic Essay Scoring for Indonesian Language using Simpler Model and Richer Feature
Corresponding Author(s): Rian Adam Rajagede
Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control,
Vol. 6, No. 1, February 2021
Abstract
Automatic essay scoring is a machine learning task in which a model automatically assesses students' essay answers. It is especially valuable when assessment must happen at a large scale, where manual correction by humans becomes slow and error-prone. In 2019, the Ukara dataset was released for automatic essay scoring in the Indonesian language. The best previously published model on this dataset achieves an F1-score of 0.821 using pre-trained fastText sentence embeddings and a stacking ensemble of a neural network and XGBoost. In this study, we propose a simpler classifier, a neural network with a single hidden layer, combined with a richer feature: BERT sentence embeddings. The pre-trained BERT sentence-embedding model extracts more information from sentences yet has a smaller file size than the pre-trained fastText model. Our best model achieves an F1-score of 0.829 on the Ukara dataset, higher than the previous models.
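The pipeline described in the abstract can be made concrete with a minimal sketch: a pre-trained multilingual Sentence-BERT model (the Reimers and Gurevych knowledge-distillation models cover Indonesian) turns each answer into a fixed-size vector, and a single-hidden-layer neural network classifies the answer as correct or incorrect. The encoder name, hidden size, dropout rate, and training settings below are illustrative assumptions, not the paper's reported configuration.

# Minimal sketch (illustrative, not the paper's exact code): BERT sentence
# embeddings feed a single-hidden-layer neural network classifier.
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer

# Assumed pre-trained multilingual Sentence-BERT encoder that covers Indonesian.
encoder = SentenceTransformer("distiluse-base-multilingual-cased-v2")

# Hypothetical student answers with binary labels (1 = correct, 0 = incorrect).
answers = ["Jawaban siswa pertama ...", "Jawaban siswa kedua ..."]
labels = torch.tensor([1.0, 0.0])

# Encode each answer into a fixed-size sentence embedding (512 dims here).
embeddings = torch.tensor(encoder.encode(answers))

# Single-hidden-layer classifier; hidden size and dropout rate are assumptions.
classifier = nn.Sequential(
    nn.Linear(embeddings.shape[1], 128),
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(128, 1),
)

optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for _ in range(100):  # illustrative number of training epochs
    optimizer.zero_grad()
    loss = loss_fn(classifier(embeddings).squeeze(1), labels)
    loss.backward()
    optimizer.step()

# An answer is scored "correct" when the sigmoid output exceeds 0.5.
predictions = (torch.sigmoid(classifier(embeddings).squeeze(1)) > 0.5).long()

Compared with stacking a neural network and XGBoost on fastText embeddings, this design keeps the classifier deliberately small and shifts the burden to the richer BERT features, which is the trade-off the paper evaluates via the F1-score on the Ukara dataset.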
References
B. W. Tuckman, “The Essay Test: A Look at the Advantages and Disadvantages,” NASSP Bulletin, vol. 77, no. 555, pp. 20–26, Oct. 1993. https://doi.org/10.1177/019263659307755504
G. B. Herwanto, Y. Sari, B. N. Prastowo, M. Riasetiawan, I. A. Bustoni, and I. Hidayatulloh, “UKARA: A fast and simple automatic short answer scoring system for Bahasa Indonesia,” in Proceeding Book of the 1st International Conference on Educational Assessment and Policy, 2018, vol. 2, pp. 48–53. https://doi.org/10.26499/iceap.v2i1.95
R. A. Rajagede and R. P. Hastuti, “Stacking Neural Network Models for Automatic Short Answer Scoring,” arXiv preprint arXiv:2010.11092, 2020.
T. Chen and C. Guestrin, “XGBoost: A Scalable Tree Boosting System,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 785–794. https://doi.org/10.1145/2939672.2939785
É. Grave, P. Bojanowski, P. Gupta, A. Joulin, and T. Mikolov, “Learning Word Vectors for 157 Languages,” presented at the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, May 2018.
M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673–2681, Nov. 1997. https://doi.org/10.1109/78.650093
T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” arXiv preprint arXiv:1301.3781, 2013.
A. A. Septiandri, Y. A. Winatmoko, and I. F. Putra, “Knowing Right from Wrong: Should We Use More Complex Models for Automatic Short-Answer Scoring in Bahasa Indonesia?,” in Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing, 2020, pp. 1–7. http://dx.doi.org/10.18653/v1/2020.sustainlp-1.1
K. Taghipour and H. T. Ng, “A neural approach to automated essay scoring,” in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016, pp. 1882–1891. https://doi.org/10.18653/v1/d16-1193
B. Riordan, A. Horbach, A. Cahill, T. Zesch, and C. Lee, “Investigating neural architectures for short answer scoring,” in Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, 2017, pp. 159–168. http://dx.doi.org/10.18653/v1/W17-5017
F. Dong, Y. Zhang, and J. Yang, “Attention-based recurrent convolutional neural network for automatic essay scoring,” in Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), 2017, pp. 153–162. https://doi.org/10.18653/v1/k17-1017
G. Liang, B.-W. On, D. Jeong, H.-C. Kim, and G. S. Choi, “Automated Essay Scoring: A Siamese Bidirectional LSTM Neural Network Architecture,” Symmetry, vol. 10, no. 12, Art. no. 682, Dec. 2018. https://doi.org/10.3390/sym10120682
Y. Kim, “Convolutional Neural Networks for Sentence Classification,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1746–1751. https://doi.org/10.3115/v1/d14-1181
A. Hassan and A. Mahmood, “Deep learning for sentence classification,” in 2017 IEEE Long Island Systems, Applications and Technology Conference (LISAT), 2017, pp. 1–5. https://doi.org/10.1109/lisat.2017.8001979
A. F. Hidayatullah, S. Cahyaningtyas, and R. D. Pamungkas, “Attention-based CNN-BiLSTM for Dialect Identification on Javanese Text,” KINETIK, pp. 317–324, Nov. 2020. https://doi.org/10.22219/kinetik.v5i4.1121
Y. Kumar, S. Aggarwal, D. Mahata, R. R. Shah, P. Kumaraguru, and R. Zimmermann, “Get IT Scored Using AutoSAS—An Automated System for Scoring Short Answers,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2019, vol. 33, pp. 9662–9669. https://doi.org/10.1609/aaai.v33i01.33019662
M. A. Sultan, C. Salazar, and T. Sumner, “Fast and easy short answer grading with high accuracy,” in Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016, pp. 1070–1075. http://dx.doi.org/10.18653/v1/N16-1123
A. A. P. Ratna, B. Budiardjo, and D. Hartanto, “SIMPLE: System Automatic Essay Assessment for Indonesian Language Subject Examination,” Makara Journal of Technology, vol. 11, no. 1, p. 2, 2007. https://doi.org/10.7454/mst.v11i1.435
T. A. Roshinta and F. Rahutomo, “Analisis Aspek-Aspek Ujian Esai Daring Berbahasa Indonesia [An Analysis of Aspects of Indonesian-Language Online Essay Exams],” Prosiding Sentrinov (Seminar Nasional Terapan Riset Inovatif), vol. 2, no. 1, Art. no. 1, Oct. 2016.
F. Rahutomo and T. Roshinta, “Indonesian Query Answering Dataset for Online Essay Test System,” Mendeley Data, vol. 1, Aug. 2018. http://dx.doi.org/10.17632/6gp8m72s9p.1
H. R. Acharya, A. D. Bhat, K. Avinash, and R. Srinath, “LegoNet - classification and extractive summarization of Indian legal judgments with Capsule Networks and Sentence Embeddings,” Journal of Intelligent & Fuzzy Systems, vol. 39, no. 2, pp. 2037–2046, Aug. 2020. https://doi.org/10.3233/JIFS-179870
J. Yu and J. Jiang, “Learning sentence embeddings with auxiliary tasks for cross-domain sentiment classification,” in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP 2016), Austin, Texas, Nov. 2016, pp. 236–246. http://dx.doi.org/10.18653/v1/D16-1023
V. Indurthi, B. Syed, M. Shrivastava, N. Chakravartula, M. Gupta, and V. Varma, “FERMI at SemEval-2019 Task 5: Using Sentence embeddings to Identify Hate Speech Against Immigrants and Women in Twitter,” in Proceedings of the 13th International Workshop on Semantic Evaluation, Minneapolis, Minnesota, USA, 2019, pp. 70–74. http://dx.doi.org/10.18653/v1/S19-2009
J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019, pp. 4171–4186.
N. Reimers and I. Gurevych, “Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks,” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 3973–3983. http://dx.doi.org/10.18653/v1/D19-1410
N. Reimers and I. Gurevych, “Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic Minority Over-sampling Technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002. https://doi.org/10.1613/jair.953
D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” presented at the 3rd International Conference on Learning Representations (ICLR 2015), San Diego, CA, USA, 2015.
A. Paszke et al., “PyTorch: An Imperative Style, High-Performance Deep Learning Library,” in Advances in Neural Information Processing Systems 32, 2019, pp. 8026–8037.
N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” arXiv preprint arXiv:1711.05101, 2017.
S. L. Smith, P.-J. Kindermans, C. Ying, and Q. V. Le, “Don’t Decay the Learning Rate, Increase the Batch Size,” presented at the 6th International Conference on Learning Representations (ICLR 2018), Vancouver, BC, Canada, Apr. 30 – May 3, 2018.
T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama, “Optuna: A Next-generation Hyperparameter Optimization Framework,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, New York, NY, USA, Jul. 2019, pp. 2623–2631. https://doi.org/10.1145/3292500.3330701
J. Bergstra, D. Yamins, and D. D. Cox, “Hyperopt: A Python library for optimizing the hyperparameters of machine learning algorithms,” in Proceedings of the 12th Python in Science Conference, 2013, pp. 13–20. https://doi.org/10.25080/Majora-8b375195-003