Distance Functions Study in Fuzzy C-Means Core and Reduct Clustering
DOI:
https://doi.org/10.26555/jiteki.v7i1.20516
Keywords:
Fuzzy C-Means, Objective Function, FCM Distance Function
Abstract
Fuzzy C-Means (FCM) is a distance-based clustering method built on fuzzy logic. It iteratively minimizes an objective function: the sum, over data points and clusters, of each point's distance to a cluster centroid weighted by its membership degree in that cluster. As the iterations proceed, the value of the objective function should decrease steadily. This research observes whether commonly used distance functions satisfy this hypothesis, in order to determine the most suitable distance for FCM clustering. Seven distance functions were applied to the same datasets: five standard datasets and two random datasets. Accuracy, purity, and the Rand index were also used to measure the quality of the resulting clusters. The results show that the distance functions producing the best cluster quality are the Euclidean, Average, Manhattan, Minkowski, Minkowski-Chebyshev, and Canberra distances. These six distances satisfied the basic hypothesis about the behavior of the objective function in FCM clustering. The only distance that did not satisfy it was the Chebyshev distance.
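The hypothesis described in the abstract can be illustrated with a minimal sketch that runs FCM with an interchangeable distance function and records the objective value at each iteration. This is not the authors' implementation. It assumes a fuzzifier m = 2, an objective of the form J = sum of u^m * d^2, random initial memberships, and the standard membership-weighted mean as the centroid update for every distance (that update is derived for the Euclidean case). All function names and the synthetic data are hypothetical, and the Average and Minkowski-Chebyshev distances are omitted because the abstract does not define them.

```python
import math
import random

# Candidate distance functions between two equal-length coordinate lists.
def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def chebyshev(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

def minkowski(x, y, p=3):
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def canberra(x, y):
    # Terms whose denominator is zero are skipped (treated as 0).
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(x, y) if abs(a) + abs(b) > 0)

def fcm(data, c, dist, m=2.0, iters=20, seed=0):
    """Run FCM and return (memberships, centroids, objective per iteration)."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Assumed initialisation: random memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([r / total for r in row])
    expo = 2.0 / (m - 1.0)
    history = []
    for _ in range(iters):
        # Centroid update: membership-weighted mean of the data points.
        centroids = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            total = sum(w)
            centroids.append([sum(w[k] * data[k][j] for k in range(n)) / total
                              for j in range(dim)])
        # Point-to-centroid distances, clamped to avoid division by zero.
        d = [[max(dist(data[k], centroids[i]), 1e-12) for i in range(c)]
             for k in range(n)]
        # Membership update from the distance ratios.
        u = [[1.0 / sum((d[k][i] / d[k][j]) ** expo for j in range(c))
              for i in range(c)] for k in range(n)]
        # Objective value for the current memberships and centroids.
        history.append(sum(u[k][i] ** m * d[k][i] ** 2
                           for k in range(n) for i in range(c)))
    return u, centroids, history

def is_non_increasing(values, tol=1e-9):
    """True if each value is no larger than the previous one (within tol)."""
    return all(b <= a + tol * max(1.0, abs(a))
               for a, b in zip(values, values[1:]))

if __name__ == "__main__":
    # Synthetic two-cluster data, for illustration only.
    rng = random.Random(1)
    data = ([[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(30)] +
            [[rng.gauss(4, 0.5), rng.gauss(4, 0.5)] for _ in range(30)])
    for name, dist in [("euclidean", euclidean), ("manhattan", manhattan),
                       ("chebyshev", chebyshev), ("minkowski", minkowski),
                       ("canberra", canberra)]:
        _, _, hist = fcm(data, 2, dist)
        print(name, "objective non-increasing:", is_non_increasing(hist))
```

With the Euclidean distance, each of the two update steps minimizes the objective with the other held fixed, so the recorded sequence cannot increase. For the other distances the weighted-mean centroid is no longer guaranteed to be optimal, which is why the objective has to be checked empirically, as the study does.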
License
Authors who publish with JITEKI agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License (CC BY-SA 4.0) that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
This work is licensed under a Creative Commons Attribution 4.0 International License