Malware Classification and Detection using Variations of Machine Learning Algorithm Models

Authors

  • Andi Maslan, Universitas Putera Batam
  • Abdul Hamid, Universiti Tun Hussein Onn Malaysia

DOI:

https://doi.org/10.26555/jiteki.v11i1.30477

Keywords:

Malware, Machine Learning, SVM, KNN, Neural Network, Networking

Abstract

Malware attacks are carried out by an attacker who delivers malicious code through files, packages, or servers. Reliable network operation therefore depends on detecting attacks as early as possible, before they cause more severe system damage. Attack types include Ping of Death, flooding, remote-controlled attacks, UDP flooding, and Smurf attacks. The attack data were obtained from the ClaMP dataset, which is imbalanced and very noisy, so the data packets in the network logs must be analyzed and feature extraction optimized before the data are analyzed statistically with machine learning algorithms. The purpose of the study is to detect and classify malware attacks using several machine learning (ML) models, namely SVM, KNN, and a Neural Network, and to test their detection performance. The research stages are preprocessing, feature extraction, feature selection, classification, and performance testing. Training and testing used a mixed scheme that combines a data split with cross-validation. The study concludes that the best algorithm for detecting malware packages is the Neural Network in the Feature Combination category, with an accuracy of 96.91%, a recall of 97.35%, and a precision of 96.78%. These results can help cybersecurity practitioners prevent malware attacks early. Further research is needed on specialized algorithms, beyond KNN, SVM, and Neural Network, to improve malware attack detection. Another research challenge is feature extraction on datasets with imbalanced or varied features using a Natural Language Processing (NLP) approach. This research can serve as a reference for researchers working in the same field.
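
The accuracy, recall, and precision figures reported above are standard binary-classification metrics. As a minimal sketch of how they are derived from predicted and true labels, the following Python example uses only the standard library. The label lists, the function names, and the convention that 1 marks malware and 0 marks benign samples are illustrative assumptions; they are not taken from the study or from the ClaMP dataset.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN, treating `positive` as the malware class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["tn"], counts["fn"]


def detection_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, and recall as fractions in [0, 1]."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        # Share of all samples that were classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of samples flagged as malware that really are malware.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of actual malware samples that were flagged.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels for ten samples (1 = malware, 0 = benign).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

metrics = detection_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

On an imbalanced dataset, accuracy alone can look high even when most malware is missed, which is one reason to read it together with recall and precision.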

Published

2025-03-05

How to Cite

[1]
A. Maslan and A. Hamid, “Malware Classification and Detection using Variations of Machine Learning Algorithm Models”, J. Ilm. Tek. Elektro Komput. Dan Inform, vol. 11, no. 1, pp. 27–41, Mar. 2025.

Section

Articles