Neural network technique with deep structure for improving author homonym and synonym classification in digital libraries
Firdaus Firdaus, Siti Nurmaini, Varindo Ockta Keneddi Putra, Annisa Darmawahyuni, Reza Firsandaya Malik, Muhammad Naufal Rachmatullah, Andre Herviant Juliano, Tio Artha Nugraha
Author name disambiguation (AND), also recognized as name-identification, has long been seen as a challenging issue in bibliographic data. In other words, the same author may appear under separate names, synonyms, or distinct authors may have similar to those referred to as homonyms. Some previous research has proposed AND problem. To the best of our knowledge, no study discussed specifically synonym and homonym, whereas such cases are the core in AND topic. This paper presents the classification of non-homonym-synonym, homonym-synonym, synonym, and homonym cases by using the DBLP computer science bibliography dataset. Based on the DBLP raw data, the classification process is proposed by using deep neural networks (DNNs). In the classification process, the DBLP raw data divided into five features, including name, author, title, venue, and year. Twelve scenarios are designed with a different structure to validate and select the best model of DNNs. Furthermore, this paper is also compared DNNs with other classifiers, such as support vector machine (SVM) and decision tree. The results show DNNs outperform SVM and decision tree methods in all performance metrics. The DNNs performances with three hidden layers as the best model, achieve accuracy, sensitivity, specificity, precision, and F1-score are 98.85%, 95.95%, 99.26%, 94.80%, and 95.36%, respectively. In the future, DNNs are more performing with the automated feature representation in AND processing.
author name disambiguation; bibliographic data; deep neural networks; homonym; symonym;