Clustering based feature selection using Partitioning Around Medoids (PAM)
Keywords:
Dimensionality reduction, Feature selection, Clustering, Partitioning Around Medoids (PAM), High dimensional data
Abstract
High-dimensional data contains a large number of features, and processing it demands substantial computational resources in both space and time. Several studies indicate that not all features of high-dimensional data are relevant to the classification result, so dimensionality reduction is needed to improve classifier performance. Existing dimensionality reduction techniques fall into two groups: feature selection and feature extraction. Sequential forward selection and sequential backward selection are feature selection methods that follow a greedy approach. Heuristic approaches have also been applied to feature selection, including the Genetic Algorithm, Particle Swarm Optimization (PSO), and the Forest Optimization Algorithm. Principal component analysis (PCA) is the best-known feature extraction method; others include multidimensional scaling and linear discriminant analysis. In this work, a different approach to feature selection is taken: cluster-analysis-based feature selection using Partitioning Around Medoids (PAM) clustering. Our experimental results show that classification accuracy remains high, above 80%, when the medoids of the feature vectors are used to represent the original dataset.
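The idea of representing a dataset by the medoids of its feature vectors can be illustrated with a minimal sketch. It is not the authors' implementation. The Euclidean distance between feature columns, the greedy initialization, the toy data, and the function names (feature_distances, total_cost, pam, select_features) are all assumptions made for illustration.

```python
import math


def feature_distances(rows):
    """Pairwise Euclidean distances between feature columns (assumed metric)."""
    cols = list(zip(*rows))  # one tuple per feature
    n = len(cols)
    return [[math.dist(cols[i], cols[j]) for j in range(n)] for i in range(n)]


def total_cost(dist, medoids):
    """Sum of each feature's distance to its nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def pam(dist, k):
    """PAM sketch: greedy BUILD step, then SWAP until no swap lowers the cost."""
    n = len(dist)
    medoids = []
    for _ in range(k):  # BUILD: add the candidate that lowers the cost the most
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(
            min(candidates, key=lambda c: total_cost(dist, medoids + [c]))
        )
    best = total_cost(dist, medoids)
    improved = True
    while improved:  # SWAP: try replacing each medoid with each non-medoid
        improved = False
        for i in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids[:i] + [c] + medoids[i + 1:]
                cost = total_cost(dist, trial)
                if cost < best - 1e-12:  # accept strict improvements only
                    medoids, best, improved = trial, cost, True
    return sorted(medoids)


def select_features(rows, k):
    """Return the indices of the k medoid features and the reduced dataset."""
    keep = pam(feature_distances(rows), k)
    return keep, [[row[j] for j in keep] for row in rows]


# Hypothetical toy data: features 0 and 1 are near-duplicates, as are 2 and 3.
rows = [
    [1.0, 1.1, 9.0, 9.2],
    [2.0, 2.1, 7.0, 7.1],
    [3.0, 2.9, 5.0, 5.2],
    [4.0, 4.1, 3.0, 3.1],
]
keep, reduced = select_features(rows, 2)
```

In this sketch each cluster of similar features is represented by one medoid, so the reduced dataset keeps one column from each near-duplicate pair. A classifier would then be trained on the reduced columns; the choice of k and of the distance measure would need to be tuned for a real dataset.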
License
Authors who publish with Jurnal Informatika (JIFO) agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike License (CC BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.