Classification of breast cancer grades using physical parameters and K-nearest neighbor method

Breast cancer is a health problem worldwide, and overcoming it requires early detection. The purpose of this study is to classify breast cancer grades at an early stage. We propose combining physical parameters with the K-nearest neighbor method to detect breast cancer grades early. The experiments were performed on 87 mammograms: 12 of grade 1, 41 of grade 2, and 34 of grade 3. The proposed method classified breast cancer grades with an accuracy of 64.36%, a sensitivity of 50%, and a specificity of 73.5%. Physical parameters can therefore be used to classify breast cancer grades, and the results of this study can complement diagnosis by breast mammography examination.


Introduction
Breast cancer is a health problem worldwide, and overcoming it requires early detection. The discovery of microcalcification is a sign of breast cancer, and many methods have successfully detected its presence [1][2][3][4][5][6]. However, detecting microcalcification is not enough to classify breast cancer grades. Nezha H [7] classified breast cancer using the Quantum Clustering and Wavelet method. Shofwatul U [8] classified malignant and benign lesions using a Feature Selection method. Seyyid A M [9] classified breast cancer using the K-Nearest Neighbor method with different distances. Mandeep R [10] classified malignant and benign breast cancer lesions using Machine Learning Techniques. Anggrek C N [11] classified normal and abnormal breast cancer using the K-Nearest Neighbor method. None of the researchers mentioned above classified breast cancer grades. Breast cancer grades are typically determined using the Tumor Node Metastasis [12] and Scarff-Bloom-Richardson [13] methods. In this study, we propose a new method for classifying breast cancer grades that combines physical parameters with the K-nearest neighbor method. The novelty of our study is the use of the physical parameters contained in the mammogram as input to the K-nearest neighbor method.
This research is needed to improve the prognosis of breast cancer patients. Its distinguishing feature is the conversion of a mammogram image into numeric values to determine the breast cancer grade without a fine-needle biopsy. The results of this study are intended to complement mammography examination.

Materials and Methods
The steps to classify breast cancer grades are as follows. The breast is imaged using a digital mammography device; the suspicious mass is then cropped and stored in 256 heat bmp format. The image quality is then improved to make the image brighter. Next, the physical parameters are calculated using (1) to (13), and an ANOVA test is used to determine which physical parameters are significant for distinguishing breast cancer grades. The significant parameters are then used as input variables for the K-nearest neighbor method using (14); the closest distance gives the grade classification of the breast cancer, as shown in Figure 1. To classify breast cancer grades, 10 physical parameters are needed, as follows:
$$\sum_{y_r = y_1}^{y_t} \; \sum_{\substack{y_q = y_1 \\ |y_q - y_r| = i}}^{y_t} H(y_q, y_r, d) \qquad (10)$$
where $H(y_q, y_r, d)$, $d$, and $y$ are the probability of a pair of gray levels, the distance between pixels, and the gray-level value, respectively [14]. K-nearest neighbor is a method that classifies using the distance to the nearest neighbors [15][16][17][18][19][20], expressed in (14). Many researchers have used the KNN method to classify breast cancer [21][22][23][24][25]. In (14), D, T, and U are the nearest-neighbor distance, the training data, and the data to be tested, respectively.
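The nearest-neighbor step can be illustrated with a minimal sketch. The form of (14) is not reproduced in the text above, so the Euclidean distance used here is an assumption, as are the function names, the value of k, and the contrast values, which are hypothetical and not taken from the study.

```python
import math
from collections import Counter


def euclidean(t, u):
    """Distance D between a training vector T and a test vector U (assumed Euclidean)."""
    return math.sqrt(sum((ti - ui) ** 2 for ti, ui in zip(t, u)))


def knn_classify(train, test_vector, k=3):
    """Return the majority grade among the k training samples closest to test_vector.

    train is a list of (feature_vector, grade) pairs.
    """
    neighbors = sorted(train, key=lambda item: euclidean(item[0], test_vector))[:k]
    votes = Counter(grade for _, grade in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical contrast values (one feature per mammogram), NOT data from the study.
train = [
    ((0.8,), 1), ((1.0,), 1), ((1.1,), 1),
    ((2.0,), 2), ((2.2,), 2), ((2.4,), 2),
    ((3.5,), 3), ((3.7,), 3), ((4.0,), 3),
]

print(knn_classify(train, (2.1,), k=3))  # 2
print(knn_classify(train, (3.9,), k=1))  # 3
```

With a single input feature such as contrast, the Euclidean distance reduces to the absolute difference between two values; the vector form is kept so that further significant parameters could be added without changing the code.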
The study was conducted at Sanglah Central Public Hospital, Bali; Prima Medika Hospital, Bali; and Doctor Soetomo Hospital, Surabaya. The research was approved by the research ethics committee of the Medical Faculty of Udayana University and Sanglah Central Public Hospital, Denpasar, with approval number 1204/UN.14.2/KEP/2017. The mammography images were taken with a Kodak DryView 6800 laser imager with the settings kV = 30, mAs = 25, brightness = 7, latitude = 11, contrast = -4, and film size = 18 x 24 cm. The trial data comprised 87 mammograms: 12 of grade 1, 41 of grade 2, and 34 of grade 3. The experimental design was cross-sectional. ANOVA was used to find the physical parameters that differ significantly among grades 1, 2, and 3. The significant variables were then passed to the KNN method to classify the breast cancer grade. The physical parameters are quantities derived from the mammographic image: entropy, contrast, angular second moment, inverse differential moment, mean, deviation, entropy of the difference second-order histogram, angular second moment of the difference second-order histogram, and mean of the difference second-order histogram, expressed in (1) through (13).
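To make the physical parameters concrete, the sketch below computes a few of them from a gray-level pair histogram H(yq, yr, d). Equations (1) through (13) are not reproduced in the text, so the formulas used here are the common textbook definitions of these texture features and are an assumption; the horizontal pixel offset, the natural logarithm in the entropy, the function names, and the tiny 4-level test image are likewise illustrative choices.

```python
import math


def cooccurrence(image, d=1, levels=4):
    """Normalized histogram H(yq, yr, d) of gray-level pairs d pixels apart.

    Assumption: pairs are taken horizontally, from each pixel to the one d columns right.
    """
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for row in image:
        for x in range(len(row) - d):
            counts[row[x]][row[x + d]] += 1
            total += 1
    return [[c / total for c in r] for r in counts]


def texture_features(H):
    """Contrast, angular second moment, entropy and inverse difference moment of H."""
    levels = len(H)
    cells = [(q, r, H[q][r]) for q in range(levels) for r in range(levels)]
    return {
        "contrast": sum((q - r) ** 2 * p for q, r, p in cells),
        "asm": sum(p * p for _, _, p in cells),
        "entropy": -sum(p * math.log(p) for _, _, p in cells if p > 0),
        "idm": sum(p / (1 + (q - r) ** 2) for q, r, p in cells),
    }


def difference_histogram(H):
    """Sum H(yq, yr, d) over all pairs with |yq - yr| = i, for each difference i."""
    levels = len(H)
    diff = [0.0] * levels
    for q in range(levels):
        for r in range(levels):
            diff[abs(q - r)] += H[q][r]
    return diff


# Tiny illustrative image with gray levels 0..3 (not a mammogram).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

H = cooccurrence(image, d=1, levels=4)
features = texture_features(H)
diff = difference_histogram(H)
print(round(features["contrast"], 3))  # 0.583
```

Under these definitions the contrast can also be obtained from the difference histogram as the sum of i squared times its i-th entry, which gives a quick consistency check between the two representations.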

Results
Suspicious masses are indicated by arrows in the figures, and there are significant differences among grades 1, 2, and 3. The grade 1 images were taken from the radiology installation room database, and the grade 1 status was obtained from the medical records of Doctor Soetomo Hospital Surabaya (Figure 2). The grade 2 image was taken from the radiology installation room database, and the grade 2 status was obtained from the medical records of Doctor Soetomo Hospital Surabaya; in Figure 3(a) there is shrinking of the skin around the nipple. The grade 3 image was taken from the radiology installation room database, and the grade 3 status was obtained from the medical records of Doctor Soetomo Hospital Surabaya; in Figure 4(a) there is a very large density.
Ten physical parameters were considered for classifying breast cancer grades, but not all of them are significant. An ANOVA statistical test was performed to find the significant variables, defined as those with a significance value smaller than 0.05. In this study, only the contrast variable had a significance value smaller than 0.05, as shown in Table 1 (see Appendix). In Table 1, d is the distance between pixels; grade 1 (n = 12) comprises 12 patients with level-one malignancy; grade 2 (n = 41) comprises 41 patients with level-two malignancy; and grade 3 (n = 34) comprises 34 patients with level-three malignancy.
Determining accuracy, sensitivity, and specificity in this study requires the following counts. For actual grade 1 data, TP is the number classified correctly as grade 1, FNa is the number classified incorrectly as grade 2, and FNb is the number classified incorrectly as grade 3. For actual grade 2 data, FP1 is the number classified incorrectly as grade 1, TN1 is the number classified correctly as grade 2, and FN1 is the number classified incorrectly as grade 3. For actual grade 3 data, FP2 is the number classified incorrectly as grade 1, FN2 is the number classified incorrectly as grade 2, and TN2 is the number classified correctly as grade 3. The accuracy, sensitivity, and specificity obtained from these counts are: accuracy = 64.36%, sensitivity = 50%, specificity = 73.5%.
Figure 5 shows the relationship between grades 1, 2, and 3 and the contrast value.
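The way such counts turn into the three reported measures can be sketched as follows. The formulas are not reproduced in the text, so the definitions below are an assumption: accuracy as the fraction of all correctly classified cases, sensitivity as the fraction of actual grade 1 cases classified as grade 1, and specificity as the fraction of actual grade 3 cases classified as grade 3. The counts are hypothetical; only the row totals (12, 41, 34) come from the study, and the diagonal was chosen so that the ratios land close to the reported percentages.

```python
def grade_metrics(cm):
    """Accuracy, sensitivity and specificity from a 3x3 table of counts.

    cm[i][j] is the number of actual grade (i + 1) cases classified as grade (j + 1).
    The sensitivity and specificity definitions are assumptions, see the text above.
    """
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    accuracy = correct / total                # (TP + TN1 + TN2) / all cases
    sensitivity = cm[0][0] / sum(cm[0])       # TP / (TP + FNa + FNb)
    specificity = cm[2][2] / sum(cm[2])       # TN2 / (FP2 + FN2 + TN2)
    return accuracy, sensitivity, specificity


# Hypothetical counts, NOT the study's confusion matrix.
cm = [
    [6, 4, 2],    # actual grade 1: TP,  FNa, FNb
    [7, 25, 9],   # actual grade 2: FP1, TN1, FN1
    [3, 6, 25],   # actual grade 3: FP2, FN2, TN2
]

accuracy, sensitivity, specificity = grade_metrics(cm)
print(f"accuracy={accuracy:.1%} sensitivity={sensitivity:.1%} specificity={specificity:.1%}")
```

With these illustrative counts the function gives roughly 64.4%, 50%, and 73.5%. This shows that the assumed definitions are compatible with the reported values, but it does not establish that they are the ones used in the study.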
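The ANOVA test on a single parameter such as contrast can be sketched as below. It computes the one-way F statistic, the ratio of between-grade to within-grade variance. The function name and the sample values are hypothetical and are not measurements from the study.

```python
def one_way_anova_f(groups):
    """Return (F statistic, between-group df, within-group df) for a one-way ANOVA."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the samples around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical contrast values per grade, NOT data from the study.
grade1 = [1.0, 2.0, 3.0]
grade2 = [2.0, 3.0, 4.0]
grade3 = [5.0, 6.0, 7.0]

f_stat, df_between, df_within = one_way_anova_f([grade1, grade2, grade3])
print(f_stat, df_between, df_within)
```

Turning the F statistic into the significance value that is compared against 0.05 requires the tail probability of the F distribution with these degrees of freedom. In practice a statistics library can compute both at once, for example `scipy.stats.f_oneway`.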

Discussion
In this paper we presented a new method for classifying breast cancer grades based on a combination of physical parameters and the K-nearest neighbor method. The main motivation of this research is to develop early detection of breast cancer grades, with emphasis on physical parameters used with K-nearest neighbor. The method we propose gives good results. The evaluation used new data consisting of 87 images from Doctor Soetomo Hospital Surabaya and yielded an accuracy, sensitivity, and specificity of 64.36%, 50%, and 73.5%, respectively. Our method is stable and reliable: during classification testing we achieved good results regardless of the value of K in the K-nearest neighbor algorithm. The tests successfully determined the accuracy, sensitivity, and specificity of the proposed method and showed that it is sensitive to the breast cancer grade. Analysis of the nine physical parameters shows that not all of them have a significant impact on classifying breast cancer grades. Because of this, the significant parameters must be identified to improve preprocessing and achieve better results. The combination of physical parameters and the K-nearest neighbor method has been shown to be a good choice for classifying breast cancer grades, and the proposed method offers a way to improve that classification.

Conclusion
The combination of physical parameters with the K-nearest neighbor method is expected to detect breast cancer grades early. The experimental results show that the contrast parameter, used as input to the K-nearest neighbor method, is able to classify breast cancer grades well. Future research may combine physical parameters with an adaptive neuro-fuzzy method, a genetic algorithm, fuzzy logic, c-means clustering, neural networks, and support vector machines. The best of these methods could be applied to digital mammography tools, so that such tools are able to detect breast cancer early and predict its type before biopsy.