Information and communication and chemical technologies

No. 4 (25) - 2024 / 2024-12-31

THE STUDY OF MACHINE AND DEEP LEARNING MODELS FOR MALWARE CLASSIFICATION

Authors

A. Zhumabekova, O. Ussatova, M. Kalimoldayev, V. Karyukin, Y. Begimbayeva

Keywords

Malware, information security, threat detection, machine learning, deep learning, Chi-square, class balancing

Link to DOI:

https://doi.org/10.58805/kazutb.v.4.25-559

How to cite

Zhumabekova, A., O. Ussatova, M. Kalimoldayev, V. Karyukin, and Y. Begimbayeva. “THE STUDY OF MACHINE AND DEEP LEARNING MODELS FOR MALWARE CLASSIFICATION”. Vestnik KazUTB, vol. 4, no. 25, Dec. 2024, doi:10.58805/kazutb.v.4.25-559.

Abstract

The rapid growth of cyber threats and attacks has highlighted the need for robust measures to protect the security, confidentiality, and integrity of information. Malware, a major category of cyber threat, is designed to disrupt operations, damage information environments, and gain unauthorized access to systems, networks, and data. Its many types, including viruses, worms, trojans, spyware, and rootkits, pose pervasive and evolving dangers and often spread through the internet or removable devices. Traditional signature-based detection methods are effective against known threats but struggle to identify new malware. Modern machine learning-based approaches offer a more flexible solution because they learn from large datasets without relying on predefined signatures. This research presents a machine learning-based malware detection system built on a dataset of diverse network threats. The study explores both classical machine learning algorithms and deep learning models, including dense neural networks (DNN), Long Short-Term Memory (LSTM) networks, and Gated Recurrent Units (GRU), to improve malware detection accuracy. All of these models produced good classification results. The Decision Tree, Random Forest, and XGBoost models outperformed the neural networks by around 0.02 in the reported metric scores, which shows that classical machine learning algorithms remain strong in cybersecurity classification tasks. Among the neural networks, the simple DNN model scored around 0.01 lower than LSTM and GRU, while the recurrent LSTM and GRU models achieved nearly identical scores. The Random Forest model achieved the best result of all models, with metric scores reaching 0.99.
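The keywords list Chi-square and class balancing, but the abstract does not say how either was applied. The following is a minimal sketch of what such preprocessing might look like, assuming categorical features, Chi-square scoring for feature selection, and random oversampling for balancing. The function names (`chi_square_score`, `select_top_k`, `oversample`) and the toy data are hypothetical and are not taken from the study.

```python
import random
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic between a categorical feature and class labels."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    score = 0.0
    for f_value, f_count in feature_totals.items():
        for c_value, c_count in label_totals.items():
            # Count expected if the feature and the label were independent.
            expected = f_count * c_count / n
            score += (observed[(f_value, c_value)] - expected) ** 2 / expected
    return score


def select_top_k(columns, labels, k):
    """Return the names of the k features with the highest chi-square score."""
    scores = {name: chi_square_score(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def oversample(rows, labels, seed=0):
    """Random oversampling: duplicate minority-class rows up to the largest class size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: 6 benign and 2 malware samples, two binary features.
labels = ["benign"] * 6 + ["malware"] * 2
columns = {
    "uses_packer": [0, 0, 0, 0, 0, 0, 1, 1],  # perfectly associated with the label
    "has_icon": [0, 1, 0, 1, 0, 1, 0, 1],     # independent of the label
}

top_features = select_top_k(columns, labels, k=1)

# Build rows from the selected features only, then balance the classes.
rows = [tuple(columns[name][i] for name in top_features) for i in range(len(labels))]
balanced_rows, balanced_labels = oversample(rows, labels)
class_counts = Counter(balanced_labels)
```

On this toy data the feature that perfectly separates the classes receives the higher score and is selected, and oversampling brings both classes to six samples each. In practice, balancing should be applied to the training split only, so that duplicated samples do not leak into the evaluation data.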