Optimized Machine Learning Performance with Feature Selection for Breast Cancer Disease Classification

DOI:

https://doi.org/10.26555/jiteki.v9i4.27527

Keywords:

Breast Cancer, Classification, Machine Learning, Supervised Algorithm, PCA

Abstract

The prevalence of breast cancer is relatively high among adults worldwide. In Indonesia, according to the latest data from the World Health Organization (WHO), breast cancer accounts for 1.41% of all deaths, and the figure continues to rise. Addressing this growing problem calls for a proactive approach. This study therefore aims to classify breast cancer diagnoses into two categories: Benign and Malignant. The resulting classification pattern can serve as a benchmark for early detection and is expected to help reduce breast cancer incidence and mortality. The dataset, obtained from Kaggle, consists of 569 rows and 32 attributes. Various machine learning algorithms are employed for the classification, including Artificial Neural Network (ANN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), K-Nearest Neighbors (KNN), and Naïve Bayes (NB). Principal Component Analysis (PCA) is applied to the dataset before modeling as a dimensionality-reduction step for optimized feature selection. The most accurate model is the SVM with an RBF kernel and C-value selection. The LR model achieves an accuracy of 97.3%, whereas the precision and recall of the SVM model are both 100%. Moreover, the Receiver Operating Characteristic (ROC) curve of the SVM lies above that of LR, which is consistent with the confusion matrix, where the False Positive Rate of the SVM is 0. Overall, the SVM model with an RBF kernel and C-value selection performs best, primarily because it makes no false-positive predictions: it never classifies a case as positive when it is actually negative.
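The evaluation measures named in the abstract (accuracy, precision, recall, and False Positive Rate) all follow from the four cells of a binary confusion matrix. The sketch below is only an illustration of those definitions, not the authors' code. It assumes Malignant is treated as the positive class, which the abstract does not state, and the counts are hypothetical rather than taken from the paper.

```python
from dataclasses import dataclass


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion-matrix counts.

    Assumption: Malignant is the positive class and Benign the negative class.
    """
    tp: int  # positive cases predicted positive
    fp: int  # negative cases predicted positive
    tn: int  # negative cases predicted negative
    fn: int  # positive cases predicted negative

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.tn + self.fn
        return _ratio(self.tp + self.tn, total)

    def precision(self) -> float:
        return _ratio(self.tp, self.tp + self.fp)

    def recall(self) -> float:
        return _ratio(self.tp, self.tp + self.fn)

    def false_positive_rate(self) -> float:
        return _ratio(self.fp, self.fp + self.tn)


# Hypothetical counts for illustration only; these are NOT the paper's results.
example = ConfusionMatrix(tp=40, fp=0, tn=70, fn=4)

print(f"accuracy  = {example.accuracy():.3f}")
print(f"precision = {example.precision():.3f}")
print(f"recall    = {example.recall():.3f}")
print(f"FPR       = {example.false_positive_rate():.3f}")
```

With zero false positives, the False Positive Rate is 0 and precision is 1 regardless of how many positive cases are missed, so recall has to be read separately to see the false negatives.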

Published

2023-12-23

How to Cite

[1]
K. Koirunnisa, A. M. Siregar, and S. Faisal, “Optimized Machine Learning Performance with Feature Selection for Breast Cancer Disease Classification”, J. Ilm. Tek. Elektro Komput. Dan Inform, vol. 9, no. 4, pp. 1131–1143, Dec. 2023.

Section

Articles
