Malware Detection in Portable Document Format (PDF) Files with Byte Frequency Distribution (BFD) and Support Vector Machine (SVM)

Authors

  • Heru Saputra Universitas Sriwijaya
  • Deris Stiawan Universitas Sriwijaya
  • Hadipurnawan Satria Universitas Sriwijaya

DOI:

https://doi.org/10.26555/jiteki.v%25vi%25i.27559

Keywords:

Portable document format, Malware, Byte frequency distribution, Sequential forward selection, Support vector machine

Abstract

Portable Document Format (PDF) files, as well as files in several other formats such as .docx, .hwp, and .jpg, are often used to conduct cyber attacks. According to VirusTotal, PDF ranked fourth among the document file types most frequently used to spread malware in 2020. Malware detection is challenging, partly because malware can stay hidden and modify its own code, so new and smarter detection methods are required. As a result, outdated detection and classification methods are becoming less effective. One method now used to detect malware-infected PDF files is a machine learning approach. In this research, the Support Vector Machine (SVM) algorithm was used to detect PDF malware because it can handle non-linear data and has produced the best accuracy in some previous studies. Each file was first converted into a sequence of bytes and then represented as a Byte Frequency Distribution (BFD). The Sequential Forward Selection (SFS) method was then used to reduce the dimensionality of the features, and the selected features were used to train the SVM model. The proposed method performed well, achieving an accuracy of 99.11% and an F1 score of 99.65%. The contributions of this research are a new approach to PDF malware detection that combines BFD with the SVM algorithm, and the use of SFS for feature selection to improve model performance. The proposed system can therefore serve as an alternative way to detect PDF malware.
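The first two stages of the pipeline described in the abstract (raw bytes to BFD, then SFS over the BFD features) can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes BFD is the relative frequency of each of the 256 byte values (count divided by file length), and it implements SFS as a plain greedy loop driven by a caller-supplied `score` function. In the paper's setting that score would be something like the cross-validated accuracy of an SVM trained on the candidate feature subset. The function names and the example input are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence


def byte_frequency_distribution(data: bytes) -> List[float]:
    """Map raw file bytes to a 256-element BFD vector.

    Assumption: each entry is the relative frequency of byte value b
    (count of b divided by file length); the paper may normalize differently.
    """
    if not data:
        return [0.0] * 256  # empty file: no bytes to count
    counts = Counter(data)  # iterating over bytes yields ints in 0..255
    total = len(data)
    return [counts[b] / total for b in range(256)]


def sequential_forward_selection(
    n_features: int,
    k: int,
    score: Callable[[Sequence[int]], float],
) -> List[int]:
    """Greedy Sequential Forward Selection.

    Starts from an empty set and, at each step, adds the single remaining
    feature index whose inclusion gives the highest score, until k features
    are selected. `score` is a hypothetical callback, e.g. the cross-validated
    accuracy of an SVM trained on those BFD columns.
    """
    selected: List[int] = []
    remaining = list(range(n_features))
    while len(selected) < k and remaining:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Hypothetical input: the first bytes of a PDF file.
    vec = byte_frequency_distribution(b"%PDF-1.7\n")
    print(len(vec))  # 256
```

In practice, the `score` callback could wrap scikit-learn's `SVC` with `cross_val_score`; scikit-learn also provides `SequentialFeatureSelector` with `direction="forward"`, which performs the same greedy search over the columns of a BFD matrix. Because every remaining candidate is scored at every step, SFS needs on the order of k × 256 model evaluations, so a cheap scorer matters when selecting from 256 BFD features.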

Author Biographies

Heru Saputra, Universitas Sriwijaya

Heru Saputra is currently a Master's student at Universitas Sriwijaya. He received his undergraduate degree in Informatics from the same university. His areas of interest include Cryptography, Machine Learning, and Cyber Security. He can be contacted at email: saputra31.heru@gmail.com.

Deris Stiawan, Universitas Sriwijaya

Deris Stiawan received his PhD degree in Computer Engineering from Universiti Teknologi Malaysia, Malaysia. He is currently a Professor in the Department of Computer Engineering, Faculty of Computer Science, Universitas Sriwijaya. His research interests include computer networks, intrusion detection/prevention systems, and heterogeneous networks. He can be contacted at email: deris@unsri.ac.id.

Hadipurnawan Satria, Universitas Sriwijaya

Hadipurnawan Satria received his PhD degree in Computer Science from Sun Moon University, South Korea. He is currently a Lecturer in the Department of Computer Engineering, Faculty of Computer Science, Universitas Sriwijaya. His research interests include Platform-based Development, Embedded Systems, and Software Engineering. He can be contacted at email: hadi@ilkom.unsri.ac.id.

Published

2024-02-09

How to Cite

[1]
H. Saputra, D. Stiawan, and H. Satria, “Malware Detection in Portable Document Format (PDF) Files with Byte Frequency Distribution (BFD) and Support Vector Machine (SVM)”, J. Ilm. Tek. Elektro Komput. Dan Inform, vol. 9, no. 4, pp. 1144–1153, Feb. 2024.

Section

Articles
