Optimized Machine Learning Performance with Feature Selection for Breast Cancer Disease Classification

Koirunnisa Koirunnisa, Amril Mutoi Siregar, Sutan Faisal


The prevalence of breast cancer is relatively high among adults worldwide. In Indonesia, according to the latest data from the World Health Organization (WHO), breast cancer accounts for 1.41% of all deaths, and this share continues to increase. Addressing this growing problem requires a proactive approach. The objective of this study is therefore to classify breast cancer diagnoses into two categories: Benign and Malignant. The resulting classification pattern can serve as a benchmark for early detection and is expected to help reduce mortality and case rates for breast cancer. The dataset used in this study was obtained from Kaggle and consists of 569 rows with 32 attributes. Several machine learning algorithms are employed for the classification: Artificial Neural Network (ANN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), K-Nearest Neighbors (KNN), and Naïve Bayes (NB). Principal Component Analysis (PCA) is applied to the dataset before modeling as a feature selection technique based on dimension reduction. The model with the highest accuracy is the SVM with an RBF kernel and C-value selection. The LR model achieves an accuracy of 97.3%. The precision and recall of the SVM model are both 100%. Moreover, the Receiver Operating Characteristic (ROC) curve of the SVM lies above that of the LR, which is consistent with the confusion matrix results, where the False Positive Rate of the SVM is 0. Consequently, the overall performance of the SVM with an RBF kernel and C-value selection is superior, primarily because the model never classifies a sample as positive when it is actually negative.
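
The evaluation metrics named in the abstract follow directly from the four confusion-matrix counts. The sketch below is illustrative only: the helper names `safe_div` and `binary_metrics` and the counts passed in are hypothetical, chosen to show a case with zero false positives, and are not the paper's results. Malignant is assumed to be the positive class.

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp: malignant cases predicted malignant (true positives)
    fp: benign cases predicted malignant (false positives)
    fn: malignant cases predicted benign (false negatives)
    tn: benign cases predicted benign (true negatives)
    """
    return {
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
        # False Positive Rate: share of actual negatives flagged as positive.
        "fpr": safe_div(fp, fp + tn),
    }


# Hypothetical counts for illustration only (not taken from the study).
metrics = binary_metrics(tp=40, fp=0, fn=2, tn=72)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With zero false positives and at least one true positive, precision is 1 and the False Positive Rate is 0 regardless of the other counts, so the classifier's operating point sits on the left edge of the ROC plot.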


Keywords: Breast Cancer; Classification; Machine Learning; Supervised Algorithm; PCA



DOI: http://dx.doi.org/10.26555/jiteki.v9i4.27527



Copyright (c) 2023 Koirun Nisa, Amril Mutoi Siregar, Sutan Faisal

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


Jurnal Ilmiah Teknik Elektro Komputer dan Informatika
ISSN 2338-3070 (print) | 2338-3062 (online)
Organized by Electrical Engineering Department - Universitas Ahmad Dahlan
Published by Universitas Ahmad Dahlan
Website: http://journal.uad.ac.id/index.php/jiteki