This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Opinion Spam Classification on Steam Review using Support Vector Machine with Lexicon-Based Features
Corresponding Author(s): Rafif Taqiuddin
Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control, Vol. 6, No. 4, November 2021
Abstract
Steam is a digital distribution platform for video games developed by Valve Software. Steam provides a user review feature through which users can post criticism of, or comments on, games, expressing positive or negative sentiment. According to a questionnaire the author distributed to Steam users across Indonesia, respondents considered this review feature insufficient. The main reason is fake reviews, which let certain parties push biased opinions; this frequently leads to review bombing, where users post reviews only to lower or inflate a product's reputation rather than to evaluate it sincerely. These problems call for a system that can classify fake reviews on Steam. A Support Vector Machine (SVM) classifier was chosen as the model, combined with lexicon-based feature extraction and Term Frequency–Inverse Document Frequency (TF-IDF) weighting. Of the 236 reviews in the test set, the SVM classified 105 as Valid Review and 131 as Opinion Spam. With the data split into 70% training and 30% test data using a random state of 109, the SVM classification model achieved an accuracy of 81%. A web-application dashboard containing the classification model was also built so that Steam users can consult it when deciding which games to buy.
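The abstract names the pipeline (TF-IDF weighting plus lexicon-based features, an SVM classifier, a seeded 70/30 train/test split, accuracy on the held-out part) without showing how the pieces fit together. The following is a minimal, standard-library-only sketch under stated assumptions, not the authors' implementation: the toy lexicon, the sample reviews and their labels are invented for illustration; the helper names (`fit_idf`, `featurize`, `train_linear_svm`, `seeded_split`) are hypothetical; the classifier is a linear SVM trained with Pegasos-style subgradient descent rather than whatever library or kernel the study used; and `seeded_split` only mimics a seeded 70/30 split, so seed 109 here will not reproduce the study's partition or its 81% accuracy.

```python
import math
import random
from collections import Counter

# HYPOTHETICAL toy lexicon; the study's actual sentiment lexicon is not given here.
POSITIVE = {"great", "fun", "love", "best", "amazing"}
NEGATIVE = {"bad", "boring", "worst", "broken", "refund"}


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed IDF on the training documents: ln((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + c)) + 1 for tok, c in df.items()}


def featurize(text, idf, vocab):
    """L2-normalised TF-IDF over `vocab`, then two lexicon counts and a bias input."""
    tokens = tokenize(text)
    tf = Counter(tokens)
    tfidf = [tf[tok] * idf[tok] for tok in vocab]
    norm = math.sqrt(sum(v * v for v in tfidf)) or 1.0
    pos = sum(tok in POSITIVE for tok in tokens)
    neg = sum(tok in NEGATIVE for tok in tokens)
    return [v / norm for v in tfidf] + [float(pos), float(neg), 1.0]


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos-style SGD on the L2-regularised hinge loss; labels must be +1/-1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularisation shrink
            if margin < 1.0:  # hinge loss active: step toward y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """+1 = Valid Review, -1 = Opinion Spam (label convention assumed here)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


def seeded_split(items, test_frac=0.3, seed=109):
    """Seeded shuffle, then hold out ceil(test_frac * n) items for testing.
    Seed 109 here will not reproduce the study's own split."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


# Invented example reviews with illustrative labels (not the study's dataset).
DATA = [
    ("great story and fun combat worth the price", 1),
    ("hard but fair bosses and i love the art style", 1),
    ("runs well on my laptop best co-op this year", 1),
    ("some bugs at launch but patches fixed most of them", 1),
    ("amazing soundtrack though the campaign is short", 1),
    ("worst game ever refund refund refund", -1),
    ("bad bad bad greedy devs do not buy", -1),
    ("boring boring worst trash", -1),
    ("broken garbage refund now", -1),
    ("best best best game ever 10/10", -1),
]

train, test = seeded_split(DATA)
idf = fit_idf([text for text, _ in train])  # IDF is fitted on training data only
vocab = sorted(idf)
X_train = [featurize(text, idf, vocab) for text, _ in train]
y_train = [label for _, label in train]
w = train_linear_svm(X_train, y_train)
preds = [predict(w, featurize(text, idf, vocab)) for text, _ in test]
accuracy = sum(p == label for p, (_, label) in zip(preds, test)) / len(test)
print(f"predictions: {preds}, test accuracy: {accuracy:.2f}")
```

Appending the lexicon counts to the TF-IDF vector lets a single linear decision function weigh both signals; the trailing constant input folds the bias into `w`, which keeps the Pegasos update simple at the cost of also regularising the bias. Fitting the IDF on the training portion only avoids leaking test-set statistics into the features.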