Evaluation of Stratified K-Fold Cross Validation for Predicting Bug Severity in Game Review Classification

Mustika Kurnia Mayangsari; Iwan Syarif; Aliridho Barakbah

doi:10.22219/kinetik.v8i3.1740

Issue

Vol. 8, No. 3, August 2023

Issue Published : Aug 31, 2023

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Mustika Kurnia Mayangsari

Politeknik Elektronika Negeri Surabaya

Iwan Syarif

Politeknik Elektronika Negeri Surabaya

Aliridho Barakbah

Politeknik Elektronika Negeri Surabaya

Corresponding Author(s) : Mustika Kurnia Mayangsari

mustikakurniam@gmail.com

Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control, Vol. 8, No. 3, August 2023
Article Published : Aug 31, 2023

Abstract

Steam reviews, whether positive or negative, give game development teams a great deal of information. Both kinds matter: negative reviews are not the only source of problem reports, as 7% of positive reviews contain bug reports. These reports are written after a game's release, and many of them describe common problems that still exist. Players who find an issue can report it directly through the review feature of the online game platform, but manually analyzing and classifying the reviews takes the development team a long time. This study proposes a new approach that automatically classifies Steam reviews by bug severity level. For the experiment, we analyzed reviews of two popular game titles, FIFA 23 and Apex Legends. We trained three classifiers, KNN, Decision Tree, and Naïve Bayes, to classify the bug severity level. Because the dataset is imbalanced, we applied stratified k-fold cross-validation to reduce bias. Model performance was evaluated using accuracy, precision, recall, and F1 score. The experiment showed that reviews of different game titles yield different accuracy scores: classification of FIFA 23 reviews performed better than classification of Apex Legends reviews. The mean accuracy was 72% for FIFA 23 with Decision Tree and 64% for Apex Legends with KNN.
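
To illustrate why stratification helps with an imbalanced dataset, the following is a minimal sketch of a stratified k-fold split in plain Python. It is not the authors' implementation: the function name `stratified_k_fold`, the label values, the class counts, and the choice of k = 3 are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
import random


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs whose class ratios mirror `labels`."""
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal each class's shuffled indices round-robin across the folds,
    # so every fold receives a near-equal share of every class.
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    # Each fold serves once as the test set; the rest form the training set.
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Hypothetical, imbalanced severity labels (not the paper's data).
labels = ["low"] * 12 + ["medium"] * 6 + ["high"] * 3
splits = list(stratified_k_fold(labels, k=3))
for train_idx, test_idx in splits:
    print(Counter(labels[i] for i in test_idx))
```

With these hypothetical counts, every test fold holds 4 low, 2 medium, and 1 high label, so the rare class appears in each evaluation round. A plain random split could leave a fold without any high-severity example. In practice, a library routine such as scikit-learn's `StratifiedKFold` would typically replace this hand-written version.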

Keywords

Steam; Bug Severity Level; KNN; Decision Tree; Naïve Bayes; Text Classification; N-Gram; Game Review; SKCV

Mayangsari, M. K., Syarif, I., & Barakbah, A. (2023). Evaluation of Stratified K-Fold Cross Validation for Predicting Bug Severity in Game Review Classification. Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control, 8(3). https://doi.org/10.22219/kinetik.v8i3.1740

