This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Integrating Ensemble Learning and Information Gain for Malware Detection based on Static and Dynamic Features
Corresponding Author(s): Fauzi Adi Rafrastara
Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control,
Vol. 10, No. 1, February 2025
Abstract
The rapid evolution of malware poses a significant threat to devices such as personal computers and mobile phones. Malicious software, including viruses, worms, trojan horses, and ransomware, is among the most serious threats these devices face. Conventional antivirus software is becoming less effective against malware that can take polymorphic, metamorphic, and oligomorphic forms. Such malware can not only replicate and distribute itself but also give each offspring a unique fingerprint, which defeats fingerprint-based matching. Addressing this challenge calls for a new generation of antivirus software based on machine learning, which can detect malware from its behavior rather than from known fingerprints. This study explored machine learning models for malware detection that combine ensemble algorithms with feature selection. It compared three ensemble algorithms (Gradient Boosting, Random Forest, and AdaBoost) and used Information Gain for feature selection, analyzing 21 features. The study employed a public dataset, ‘Malware Static and Dynamic Features VxHeaven and VirusTotal Data Set’, which contains both static and dynamic malware features. Gradient Boosting combined with Information Gain feature selection achieved the highest performance, with an accuracy and F1-Score of 99.2%.
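The abstract does not state how Information Gain was computed, so the following is only a minimal sketch of the general technique, assuming discrete feature values: the gain of a feature X with respect to the class label Y is IG(Y; X) = H(Y) - H(Y | X), and features are ranked by that score before a classifier is trained on the top-ranked ones. The feature names, the toy rows, and the helper functions are hypothetical and are not taken from the study or its dataset.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature X."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    total = len(labels)
    # Weighted average of the label entropy within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels, names):
    """Return (name, gain) pairs sorted from most to least informative."""
    columns = list(zip(*rows))
    gains = [(name, information_gain(col, labels))
             for name, col in zip(names, columns)]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


# Toy data invented for illustration only (hypothetical binary features).
# Label 1 = malware, 0 = benign.
FEATURE_NAMES = ["writes_registry", "packed_section", "has_icon"]
ROWS = [
    (1, 1, 0),
    (1, 0, 1),
    (1, 1, 1),
    (1, 0, 0),
    (0, 1, 0),
    (0, 0, 1),
    (0, 1, 1),
    (0, 0, 0),
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

ranking = rank_features(ROWS, LABELS, FEATURE_NAMES)
top_k = [name for name, _ in ranking[:1]]  # keep only the best feature

for name, gain in ranking:
    print(f"{name}: {gain:.3f} bits")
```

In this toy data the first feature separates the two classes perfectly and the other two carry no information about the label, so a selection step would keep only the first. On real data with continuous features, the values would first need to be discretized or the gain estimated another way; which approach the study used is not stated here.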