Performance of Lexical Resource and Manual Labeling on Long Short-Term Memory Model for Text Classification

Mardhiya Hayaty, Aqsal Harris Pratama

Abstract


Data labeling is a critical aspect of sentiment analysis that requires assigning labels to text data to reflect the sentiment expressed. Traditional methods of data labeling involve manual annotation by human annotators, which can be both time-consuming and costly when handling large volumes of text data. Automation of the data labeling process can be achieved through the utilization of lexicon resources, which consist of pre-labeled dictionaries or databases of words and phrases in sentiment information. The contribution of this study is an evaluation of the performance of lexicon resources in document labeling. The evaluation aims to provide insight into the accuracy of using lexicon resources and inform future research. In this study, a publicly available dataset was utilized and labeled as negative, neutral, and positive. To generate new labels, a lexicon resource such as VADER, AFINN, SentiWordNet, and Liu & Hu was employed. An LSTM model was then trained using the newly generated labels. The performance of the trained model was evaluated by testing it on data that had been manually labeled. The study found manual labeling led to highest accuracy of 0.79, 0.80, and 0.80 for training, validation, and testing respectively. This is likely due to manual creation of test data labels, enabling the model to learn and capture balanced patterns. Models using lexicon resources (VADER and AFINN) had lower accuracy of 0.54 and 0.56. SentiWordNet had lowest accuracy among all models with 0.49, and Liu&Hu model had the lowest testing score of 0.26. Our research indicates that lexicon resources alone are not sufficient for sentiment data labeling as they are dependent on pre-defined dictionaries and may not fully capture the context of words within a sentence, thus, manual labeling is necessary to complement lexicon-based methods to achieve better result.

Keywords


Lexicon resource; Data Labeling; Long Short- Term Memory

Full Text:

PDF


DOI: http://dx.doi.org/10.26555/jiteki.v9i1.25375

Refbacks

  • There are currently no refbacks.


Copyright (c) 2023 Mardhiya Hayaty, Aqsal Harris Pratama

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


 
About the JournalJournal PoliciesAuthor Information
 


Jurnal Ilmiah Teknik Elektro Komputer dan Informatika
ISSN 2338-3070 (print) | 2338-3062 (online)
Organized by Electrical Engineering Department - Universitas Ahmad Dahlan
Published by Universitas Ahmad Dahlan
Website: http://journal.uad.ac.id/index.php/jiteki
Email 1: jiteki@ee.uad.ac.id
Email 2: alfianmaarif@ee.uad.ac.id
Office Address: Kantor Program Studi Teknik Elektro, Lantai 6 Sayap Barat, Kampus 4 UAD, Jl. Ringroad Selatan, Tamanan, Kec. Banguntapan, Bantul, Daerah Istimewa Yogyakarta 55191, Indonesia