Clustering based feature selection using Partitioning Around Medoids (PAM)

Authors

  • Dewi Pramudi Ismi, Department of Informatics, Universitas Ahmad Dahlan
  • Murinto Murinto, Department of Informatics, Universitas Ahmad Dahlan

Keywords:

Dimensionality reduction, Feature selection, Clustering, Partitioning Around Medoids (PAM), High dimensional data

Abstract

High-dimensional data contains a large number of features, and processing it demands substantial computational resources in both space and time. Several studies indicate that not all features of high-dimensional data are relevant to the classification result. Dimensionality reduction is therefore necessary to improve classifier performance. Existing dimensionality reduction techniques fall into two groups: feature selection and feature extraction. Sequential forward selection and backward selection are feature selection methods that use a greedy approach. Heuristic approaches have also been applied to feature selection, including the Genetic Algorithm, Particle Swarm Optimization (PSO), and the Forest Optimization Algorithm. Principal Component Analysis (PCA) is the best-known feature extraction method; others include multidimensional scaling and linear discriminant analysis. This work takes a different approach to feature selection: cluster-analysis-based feature selection using Partitioning Around Medoids (PAM) clustering. Our experimental results show that classification accuracy is high, above 80%, when the medoids of the feature vectors are used to represent the original dataset.
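The idea of representing a dataset by the medoids of its feature vectors can be illustrated with a minimal sketch. This is not the authors' implementation: the dissimilarity measure (1 minus the absolute Pearson correlation between features), the function names `feature_distance`, `pam`, and `select_features_pam`, and the synthetic data are all assumptions made for illustration. The sketch clusters the columns (features) of a data matrix with a small BUILD/SWAP version of PAM and keeps one medoid feature per cluster.

```python
import numpy as np


def feature_distance(X):
    """Pairwise feature dissimilarity: 1 - |Pearson correlation|.

    The choice of measure is an assumption; the abstract does not name one.
    """
    corr = np.nan_to_num(np.corrcoef(X, rowvar=False))  # constant columns -> 0
    dist = 1.0 - np.abs(corr)
    np.fill_diagonal(dist, 0.0)
    return dist


def pam(dist, k, max_iter=100):
    """Small PAM (BUILD + SWAP) on a precomputed distance matrix."""
    n = dist.shape[0]

    # BUILD: start with the most central item, then add medoids greedily.
    medoids = [int(np.argmin(dist.sum(axis=1)))]
    while len(medoids) < k:
        nearest = dist[:, medoids].min(axis=1)
        candidates = [c for c in range(n) if c not in medoids]
        costs = [np.minimum(nearest, dist[:, c]).sum() for c in candidates]
        medoids.append(candidates[int(np.argmin(costs))])

    # SWAP: apply the best medoid/non-medoid swap until none lowers the cost.
    cost = dist[:, medoids].min(axis=1).sum()
    for _ in range(max_iter):
        best_swap, best_cost = None, cost
        for i in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = c
                trial_cost = dist[:, trial].min(axis=1).sum()
                if trial_cost < best_cost - 1e-12:
                    best_swap, best_cost = (i, c), trial_cost
        if best_swap is None:
            break
        medoids[best_swap[0]] = best_swap[1]
        cost = best_cost

    medoids = sorted(medoids)
    labels = dist[:, medoids].argmin(axis=1)  # cluster index of each item
    return medoids, labels


def select_features_pam(X, k):
    """Cluster the columns of X and return the medoid column indices."""
    return pam(feature_distance(X), k)


# Synthetic example: 12 features built as noisy copies of 3 latent signals
# (columns 0-3, 4-7, and 8-11 each follow one signal).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
X = np.hstack(
    [latent[:, [g]] + 0.1 * rng.normal(size=(200, 4)) for g in range(3)]
)

medoids, labels = select_features_pam(X, k=3)
X_reduced = X[:, medoids]  # dataset represented by its medoid features only
print("selected feature indices:", medoids)
```

In this sketch the reduced matrix `X_reduced` would then be passed to a classifier in place of `X`. The number of clusters `k` directly sets how many features survive, so it would normally be tuned against classification accuracy.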

Published

2020-05-19
