Cervical cancer classification using convolutional neural network-support vector machine

Cervical cancer is the second most common cancer in women worldwide. It occurs when abnormal cells in the cervix grow uncontrollably. In its early stages, cervical cancer produces no perceptible signs; however, it can be detected with machine learning methods such as the convolutional neural network (CNN), a popular method with a wide range of applications that is known for its high accuracy. The support vector machine (SVM), which is available with several kernel functions, is likewise commonly used in the classification of diseases and known for its high accuracy. Therefore, this study uses the combination of CNN-SVM with several kernel functions, including the linear kernel, as a classifier for cervical cancer.


INTRODUCTION
Many countries rank cancer as the second most common health issue, with cervical cancer causing most of the recorded deaths [1]. Cervical cancer occurs when abnormal cells in the cervix grow uncontrollably and form benign tumors, which later develop into cervical cancer cells that spread to other parts of the body [2].
This cancer is one of the most common diseases in women throughout the world, with nearly 500,000 women developing it each year, and it is ranked the fourth most common malignant disease worldwide [3], [4]. In the initial stages, early cervical cancer and pre-cancer usually cause no symptoms; symptoms do not appear until a tumor has formed. Most cases are recorded in less developed countries, where effective screening systems are unavailable [4].
Almost all cases are caused by human papillomavirus (HPV), and the risk factors include exposure to smoking and immune-system dysfunction [4]. There are more than one hundred types of HPV; however, infection with one of about 15 carcinogenic HPV genotypes is very common among young women after their first sexual activity [5]. Apart from carcinogenic risks that are linked to evolutionary species, each genotype acts as an independent infection [5].
In women's bodies, this virus produces two types of proteins, namely E6 and E7. Both are dangerous, since they deactivate certain genes that play a crucial role in stopping tumor development. These two proteins also aggressively trigger the growth of uterine wall cells. This unnatural cell growth eventually causes gene changes, or gene mutations, which then become the cause of cervical cancer. The symptoms that characterize the disease are as follows: unusual bleeding from the vagina, irregular menstrual cycles, pain in the hip, low back pain, body weakness and tiredness, weight loss when not on a diet, loss of appetite, abnormal vaginal fluid, and leg inflammation.
Approximately 90% of cervical cancers occur in low- and middle-income countries that lack organized screening and HPV vaccination programs [6]. The treatment depends on the level of the disease at diagnosis and on the available resources. Fertility-preserving surgical procedures have been the standard of care for women with low-risk, early-stage disease. The overall prognosis remains poor for women with metastatic or recurrent disease, with a survival period of less than 12 months; however, the incorporation of an anti-vascular endothelial growth factor (VEGF) agent has been able to extend it.

RESEARCH METHOD
Convolutional neural network
A convolutional neural network (CNN) is a type of deep neural network developed from the multilayer perceptron (MLP) [7], [8]. CNNs differ from the MLP in their ability to detect and recognize objects in images. CNNs give better results than ordinary neural networks (NNs) because of one additional layer, the convolutional layer, which consists of neurons with activation functions, biases, and weights [7]. A CNN is divided into two main parts: the feature extraction layer and the fully-connected layer [8], [9]. An illustration of a CNN is shown in Figure 1 [10].

Feature extraction layer
The feature extraction layer "encodes" an image into features that represent the object it contains [7]. Technically, a CNN is an architecture of several stages, where the input and output of each stage are sets of arrays called feature maps. The feature extraction layer itself comprises two parts, the convolutional layer and pooling, and is followed by the MLP layers [11]-[13].

− Convolutional layer
The convolutional layer is the main building block of a convolutional neural network (CNN). It transforms the input into a form that is easier to process by passing it through a filter (kernel) of fixed size without losing essential features [14]. The filters slide across the entire input, and each unit receives its input from a local region of the previous layer. At each position, the sum of the element-wise products (dot product) between the filter and the input region is computed; shifting the filter across the input produces one feature map per filter.

− Pooling
Pooling reduces the dimensions of the feature maps. The two common approaches are max pooling and average pooling [15]: max pooling keeps the highest value in each window, while average pooling uses the mean value. After pooling, the flattening step reshapes the pooled structure into a one-dimensional vector, which is then passed to the fully-connected neural network (MLP) for classification [14].

− MLP layers
The MLP layers form a fully connected multilayer perceptron that performs the classification. An MLP has three kinds of layers: the input layer, the hidden layers, and the output layer. The activation function used is the rectified linear unit (ReLU), which is popular in deep learning because of its simplicity.
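
The feature extraction steps described above (convolution, ReLU activation, pooling, and flattening) can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used in this study: the 5×5 input, the 2×2 kernel, the stride of 1 without padding, and the 2×2 pooling window are all assumed values chosen only to keep the arithmetic easy to follow.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image (stride 1, no padding) and take the
    sum of element-wise products at every position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map: Matrix) -> Matrix:
    """Rectified linear unit: negative activations are set to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def pool2d(feature_map: Matrix, size: int = 2, mode: str = "max") -> Matrix:
    """Non-overlapping pooling: 'max' keeps the largest value of each
    window, 'avg' keeps the mean value."""
    reducer = max if mode == "max" else (lambda w: sum(w) / len(w))
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(reducer(window))
        pooled.append(row)
    return pooled


def flatten(feature_map: Matrix) -> List[float]:
    """Reshape a 2-D feature map into a one-dimensional vector."""
    return [v for row in feature_map for v in row]


# Illustrative input and kernel (assumed values, not taken from the study).
image = [
    [1.0, 2.0, 0.0, 1.0, 3.0],
    [0.0, 1.0, 2.0, 3.0, 1.0],
    [1.0, 0.0, 1.0, 2.0, 2.0],
    [2.0, 1.0, 0.0, 1.0, 0.0],
    [0.0, 3.0, 1.0, 0.0, 1.0],
]
kernel = [
    [1.0, 0.0],
    [0.0, -1.0],
]

feature_map = conv2d_valid(image, kernel)       # 4x4 feature map
pooled = pool2d(relu(feature_map), size=2)      # 2x2 after max pooling
features = flatten(pooled)                      # vector of length 4
print(features)
```

With these assumed inputs the 5×5 image becomes a 4×4 feature map, a 2×2 pooled map, and finally a vector of four values that a fully-connected layer or another classifier can consume.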

Fully-connected layer
The fully-connected layer operates on the output of the feature extraction layer, a multidimensional array that is flattened (reshaped) into a feature vector [16], [17]. As in ordinary neural networks, every neuron in the previous layer is connected to every neuron in the next layer. To make this connection possible, the activations of the previous layer must first be converted into one-dimensional data. These layers are usually referred to as the MLP, which processes the data to produce the classification [18]. The difference from the convolutional layer is that neurons in a convolutional layer are connected only to a specific region of the input, whereas neurons in a fully-connected layer are connected to the entire input. However, both layers still compute dot products; therefore, their functions are not significantly different.
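
As a minimal sketch of the dot-product operation in a fully-connected layer, the following example connects every input value to every output neuron. The input vector, weights, and biases are hypothetical numbers chosen for illustration; they are not parameters from this study.

```python
from typing import List


def dense(inputs: List[float],
          weights: List[List[float]],
          biases: List[float]) -> List[float]:
    """Fully-connected layer: each output neuron takes the dot product of
    its own weight row with the whole input vector and adds its bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values: List[float]) -> List[float]:
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


# Hypothetical flattened feature vector (1-D) and layer parameters.
inputs = [0.0, 1.0, 0.0, 2.0]
weights = [
    [0.5, -1.0, 0.0, 1.0],   # weights of output neuron 0
    [1.0, 1.0, 1.0, -1.0],   # weights of output neuron 1
]
biases = [0.0, 0.5]

pre_activation = dense(inputs, weights, biases)
hidden = relu(pre_activation)
print(pre_activation, hidden)
```

Unlike the convolution kernel, which only sees a small window at a time, each weight row here spans the entire input vector.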

Support vector machine
The support vector machine (SVM) has received much attention in classification [19]. The SVM algorithm was developed on the basis of statistical learning theory [20]. SVM is also known as an effective machine learning method with high classification efficiency [20]. An illustration of an SVM is shown in Figure 2 [21]. SVM is a machine learning algorithm for classification and regression that was introduced by Vapnik (1990) [22]. Nello Cristianini subsequently carried out research on SVM based on Vapnik's results [23], and Bernhard Scholkopf further developed SVM theory and kernel functions [24]. SVM was originally formulated for binary classification; however, it can also be used for multiclass categorization.

TELKOMNIKA Telecommun Comput El Control
SVM maps the data into a higher-dimensional space to support nonlinear classification and constructs the maximal-margin separating hyperplane. For instance, consider a set of firms represented by the values of their ratios $\{x_i\}$, $i = 1, \ldots, n$, and a set of associated labels $y_i \in \{-1, +1\}$ that describe each firm as failed or healthy.
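The main purpose of SVM is to find the best hyperplane, which is written as

$$f(x) = w \cdot x + b \qquad (1)$$

The best hyperplane in (1) is the one that maximizes the margin. The optimization problem of SVM is summarized as follows:

$$\min_{w,\, b} \; \frac{1}{2} \lVert w \rVert^{2} \qquad (2)$$

subject to

$$y_i \,(w \cdot x_i + b) \ge 1, \quad i = 1, \ldots, n \qquad (3)$$

Problem (2) finds $w \in \mathbb{R}^{d}$ (weights) and $b \in \mathbb{R}$ (bias) under the constraints in (3). The problem in (2) is a quadratic optimization problem. Therefore, introducing a Lagrange multiplier $\alpha_i \ge 0$ for each of the constraints in (3) gives the function

$$L(w, b, \alpha) = \frac{1}{2} \lVert w \rVert^{2} - \sum_{i=1}^{n} \alpha_i \left[ y_i \,(w \cdot x_i + b) - 1 \right] \qquad (4)$$

Setting the derivatives of $L(w, b, \alpha)$ with respect to $w$ and $b$ equal to zero, the equations obtained are

$$w = \sum_{i=1}^{n} \alpha_i y_i x_i \qquad (5)$$

$$\sum_{i=1}^{n} \alpha_i y_i = 0 \qquad (6)$$

Then, eliminating $w$ and $b$ from $L(w, b, \alpha)$ using (5) and (6) gives the dual form

$$\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \,(x_i \cdot x_j) \quad \text{subject to } \alpha_i \ge 0, \; \sum_{i=1}^{n} \alpha_i y_i = 0 \qquad (7)$$

From (1), which is $f(x) = w \cdot x + b$, the $w$ and $b$ of the regression function are finally obtained as follows:

$$w = \sum_{i=1}^{n} \alpha_i y_i x_i, \qquad b = y_k - w \cdot x_k \;\; \text{for any support vector } x_k \text{ with } \alpha_k > 0 \qquad (8)$$

In this study, linear kernels are used for the support vector machine (SVM) [25]. A kernel function handles data that are not linearly separable in the input space and allows the algorithm to be expressed in terms of the inner product between two vectors [25]. Several kernel functions and their parameters are listed in Table 1.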
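
To make the relation between the Lagrange multipliers and the hyperplane concrete, the sketch below recovers the weights and bias of a linear SVM from a known dual solution on a two-point toy problem. It assumes the standard hard-margin formulation, in which the weights are the sum of alpha_i * y_i * x_i and every support vector satisfies y_k * (w · x_k + b) = 1. The two data points and the multipliers of 0.25 are illustrative assumptions; for this toy set the dual objective reduces to 2a - 4a^2, which is maximal at a = 0.25.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two vectors (the linear kernel)."""
    return sum(a * b for a, b in zip(u, v))


def weights_from_dual(alphas: Sequence[float],
                      labels: Sequence[int],
                      points: Sequence[Vector]) -> List[float]:
    """w = sum_i alpha_i * y_i * x_i."""
    dim = len(points[0])
    return [
        sum(a * y * x[d] for a, y, x in zip(alphas, labels, points))
        for d in range(dim)
    ]


def bias_from_support_vector(w: Vector, x_k: Vector, y_k: int) -> float:
    """A support vector satisfies y_k * (w . x_k + b) = 1; since y_k is
    +1 or -1 this gives b = y_k - w . x_k."""
    return y_k - dot(w, x_k)


def decision(w: Vector, b: float, x: Vector) -> float:
    """f(x) = w . x + b; the sign of f(x) is the predicted class."""
    return dot(w, x) + b


# Toy problem (assumed data): one point per class, both support vectors.
points = [(1.0, 1.0), (-1.0, -1.0)]
labels = [+1, -1]
alphas = [0.25, 0.25]  # satisfies sum_i alpha_i * y_i = 0

w = weights_from_dual(alphas, labels, points)
b = bias_from_support_vector(w, points[0], labels[0])
margins = [y * decision(w, b, x) for x, y in zip(points, labels)]
prediction = 1 if decision(w, b, (2.0, 0.5)) >= 0 else -1
print(w, b, margins, prediction)
```

Both training points lie exactly on the margin (their value of y * f(x) equals 1), and the margin width 2 / ||w|| equals the distance between the two points, as expected for a maximal-margin hyperplane.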

Confusion matrix
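Accuracy is one of the main parameters used to assess the success of a classification. It refers to the percentage of correct predictions at the testing stage, and it is measured using the confusion matrix. The confusion matrix used is shown in Table 2 [25].
The formula for accuracy is written as:

$$\text{Accuracy} = \frac{T_p + T_n}{T_p + T_n + F_p + F_n}$$

$T_p$: Number of samples having cervical cancer and classified correctly.
$T_n$: Number of samples not having cervical cancer and classified correctly.
$F_p$: Number of samples not having cervical cancer but classified as having it.
$F_n$: Number of samples having cervical cancer but classified as not having it.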
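
A short sketch of computing accuracy from the four cells of a confusion matrix follows. It uses the standard definition, the share of correctly classified samples among all samples. The counts in the example are hypothetical and are not results from this study.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy = (true positives + true negatives) / all samples."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


# Hypothetical counts for 100 test samples (illustration only).
example = accuracy(tp=8, tn=85, fp=4, fn=3)
print(f"{example:.2%}")
```

On imbalanced data, accuracy is dominated by the larger class, so the individual cells of the confusion matrix are worth inspecting alongside the single accuracy figure.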

RESULTS AND ANALYSIS
Data
This study used a database of cervical cancer patients consisting of 652 records, of which 607 belong to the major class and 45 to the minor class. The minor class indicates the presence of cervical cancer and is labeled '1', while the major class indicates the absence of cervical cancer and is labeled '0'. There were 25 features used in this study, namely age, number of sexual partners, first sexual intercourse, number of pregnancies, smokes (years, packs/year), hormonal contraceptives (years), intrauterine device (years), sexually transmitted diseases (STD) (number, condylomatosis, vulvo-perineal condylomatosis, syphilis, human immunodeficiency virus (HIV), number of diagnoses), diagnosis (cancer, human papillomavirus (HPV)), Hinselmann, Schiller, and cytology.

Results
For the classification, this research used 20% of the data for training and 80% for testing. The convolutional neural network was trained for 1,000 epochs and combined with several kernel functions for the support vector machine. The results are shown in Figures 3, 4, and 5.
Figure 3 (a) shows that the accuracy of the model rises as the number of epochs increases. The blue line, which represents the training data, reached the higher accuracy of 100%, while the orange line, which represents the testing data, reached an accuracy of 93.67%. Figure 3 (b) shows that the loss (error) decreases as the number of epochs increases. The error on the training data was 0, while the error on the test data was 0.06.
Figure 4 (a) shows that the accuracy of the model increases as the number of epochs increases, with the blue line (training data) again giving higher accuracy than the orange line (testing data). The accuracy on the training data was 100%, while on the testing data it was 92.72%. Figure 4 (b) shows that the loss (error) decreases as the number of epochs increases. The error on the training data was 0, while on the testing data it was 0.07.
Cervical cancer classification using convolutional neural network-support vector … (Jane Eva Aurelia)
Figure 5 (a) shows that the accuracy of the model increases as the number of epochs increases, and the training data (blue line) gave higher accuracy than the testing data (orange line). The accuracy on the training data was 100%, while on the testing data it was 92.91%. In addition, Figure 5 (b) shows that the loss (error) decreases as the number of epochs increases; it was 0 on the training data and 0.07 on the test data. Table 3 shows the accuracy of each method.
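
The 20%/80% division of the 652 records described at the start of this subsection can be sketched as follows. The text does not state how the split was drawn, so the stratified sampling (keeping the share of each class the same in both parts), the rounding rule, and the random seed used here are assumptions for illustration only.

```python
import random
from typing import Dict, List, Tuple


def stratified_split(labels: List[int],
                     train_fraction: float,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices) so that each class contributes
    roughly `train_fraction` of its records to the training part."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train: List[int] = []
    test: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Class sizes from the data description: 607 records labeled '0'
# and 45 records labeled '1'.
labels = [0] * 607 + [1] * 45
train_idx, test_idx = stratified_split(labels, train_fraction=0.20)

train_positive = sum(labels[i] for i in train_idx)
test_positive = sum(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), train_positive, test_positive)
```

Under these assumptions the training part holds 130 records (9 labeled '1') and the testing part holds 522 records (36 labeled '1'). With only 45 records in the minor class, an unstratified random draw could leave very few positive cases in the small training part, which is the reason for stratifying in this sketch.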
The comparison of the convolutional neural network-support vector machine with several kernels showed that the methods predicted the cervical cancer data properly and correctly. The convolutional neural network-support vector machine with the linear kernel had the best accuracy on the test data, 93.67%. On the training data, all methods gave the best accuracy. Therefore, the best method for the classification of cervical cancer in this study is the convolutional neural network-support vector machine with the linear kernel.

CONCLUSION
Predicting the presence of a disease with machine learning methods helps medical staff to classify ailments. Early detection of a disease is important, since it allows the patient to receive prompt and appropriate treatment, which helps to increase the chance of survival and reduce the health risk. This research therefore focused on cervical cancer, a common health problem, using 652 records and 25 features. The method used was the combination of convolutional neural network-support vector machine with several kernel functions as classifier. The experimental results showed that the methods properly and correctly predicted the data. Based on the findings, the convolutional neural network-support vector machine with the linear kernel is the best model for the classification of the cervical cancer data, as shown in Table 3. In future research, this method could be developed further to give higher accuracy and applied to a larger database, in order to give better results for predicting and classifying different diseases.