This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Optimizing Android Program Malware Classification Using GridSearchCV Optimized Random Forest
Corresponding Author(s): Luqman Hakim
Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control,
Vol. 9, No. 2, May 2024
Abstract
The growing number of smartphones, particularly Android-powered ones, has increased public awareness of the security concerns posed by malware and viruses. Although machine learning models have been studied for malware prediction in this field, methods for precise identification and classification still need improvement to detect malware reliably and to close the gaps in machine-learning-based classification. Prior research has reported detection accuracy between 93% and 95%, which indicates room for improvement. To optimize the hyperparameters, this paper proposes improving the Random Forest method with the grid search algorithm, which was not used in previous studies. The main aim of the research is a significant increase in classification accuracy. We report a 99% accuracy rate in detecting malware-contaminated programs. The proposed method is a substantial improvement over existing models, which typically reach at most 95% accuracy on the same dataset. Our approach achieves this accuracy while providing a novel remedy for the limits of Android-based platforms, particularly when program processing resources are limited. This study confirms the effectiveness of our improved Random Forest algorithm and points to a shift in malware detection and to heightened cybersecurity measures for the rapidly growing smartphone market.
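As an illustration of the general technique named in the abstract, the following is a minimal sketch of tuning a Random Forest with scikit-learn's GridSearchCV. It is not the authors' code: the synthetic data, the parameter grid, the fold count, and the train/test split are all assumptions chosen only to keep the example small and runnable, and the printed accuracy says nothing about the results reported in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for a labeled benign/malware feature matrix.
# ASSUMPTION: this is NOT the dataset used in the paper.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical search space; the paper's actual grid is not given in the abstract.
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [None, 5],
    "max_features": ["sqrt", "log2"],
}

# Exhaustively evaluate every combination with 3-fold cross-validation,
# then refit the best combination on the full training split (refit=True by default).
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print("best hyperparameters:", best_params)
print("held-out accuracy:", round(test_accuracy, 3))
```

Grid search evaluates every combination in the grid, so its cost grows multiplicatively with each added hyperparameter value; the small grid here keeps the run to 8 combinations times 3 folds.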