Malware Detection in Portable Document Format (PDF) Files with Byte Frequency Distribution (BFD) and Support Vector Machine (SVM)

Heru Saputra, Deris Stiawan, Hadipurnawan Satria


Portable Document Format (PDF) files as well as files in several other formats such as (.docx, .hwp and .jpg) are often used to conduct cyber attacks. According to VirusTotal, PDF ranks fourth among document files that are frequently used to spread malware in 2020. Malware detection is challenging partly because of its ability to stay hidden and adapt its own code and thus requiring new smarter methods to detect. Therefore, outdated detection and classification methods become less effective. Nowadays, one of such methods that can be used to detect PDF files infected with malware is a machine learning approach. In this research, the Support Vector Machine (SVM) algorithm was used to detect PDF malware because of its ability to process non-linear data, and in some studies, SVM produces the best accuracy. In the process, the file was converted into byte format and then presented in Byte Frequency Distribution (BFD). To reduce the dimensions of the features, the Sequential Forward Selection (SFS) method was used. After the features are selected, the next stage is SVM to train the model. The performance obtained using the proposed method was quite good, as evidenced by the accuracy obtained in this study, which was 99.11% with an F1 score of 99.65%. The contributions of this research are new approaches to detect PDF malware which is using BFD and SVM algorithm, and using SFS to perform feature selection with the purpose of improving model performance. To this end, this proposed system can be an alternative to detect PDF malware.


Portable document format; Malware; Byte frequency distribution; Sequential forward selection; Support vector machine

Full Text:




  • There are currently no refbacks.

Copyright (c) 2023 Heru Saputra, Deris Stiawan, Hadipurnawan Satria

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

About the JournalJournal PoliciesAuthor Information

Jurnal Ilmiah Teknik Elektro Komputer dan Informatika
ISSN 2338-3070 (print) | 2338-3062 (online)
Organized by Electrical Engineering Department - Universitas Ahmad Dahlan
Published by Universitas Ahmad Dahlan
Email 1:
Email 2:
Office Address: Kantor Program Studi Teknik Elektro, Lantai 6 Sayap Barat, Kampus 4 UAD, Jl. Ringroad Selatan, Tamanan, Kec. Banguntapan, Bantul, Daerah Istimewa Yogyakarta 55191, Indonesia