Performance of Lexical Resource and Manual Labeling on Long Short-Term Memory Model for Text Classification

DOI:

https://doi.org/10.26555/jiteki.v9i1.25375

Keywords:

Lexicon resource, Data Labeling, Long Short-Term Memory

Abstract

Data labeling is a critical step in sentiment analysis: each text is assigned a label that reflects the sentiment it expresses. Traditional data labeling relies on manual annotation by human annotators, which is time-consuming and costly for large volumes of text. The labeling process can be automated with lexicon resources, which are pre-labeled dictionaries or databases of words and phrases annotated with sentiment information. The contribution of this study is an evaluation of the performance of lexicon resources in document labeling. The evaluation aims to provide insight into the accuracy of using lexicon resources and to inform future research. In this study, a publicly available dataset was used and labeled as negative, neutral, or positive. To generate new labels, lexicon resources such as VADER, AFINN, SentiWordNet, and Liu & Hu were employed. An LSTM model was then trained on the newly generated labels, and the trained model was evaluated by testing it on manually labeled data. The study found that manual labeling led to the highest accuracy: 0.79, 0.80, and 0.80 for training, validation, and testing, respectively. This is likely because the test data labels were created manually, enabling the model to learn and capture balanced patterns. Models trained with lexicon resources (VADER and AFINN) had lower accuracy of 0.54 and 0.56. SentiWordNet had the lowest accuracy among all models at 0.49, and the Liu & Hu model had the lowest testing score at 0.26. Our research indicates that lexicon resources alone are not sufficient for sentiment data labeling because they depend on predefined dictionaries and may not fully capture the context of words within a sentence; manual labeling is therefore needed to complement lexicon-based methods and achieve better results.
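
The lexicon-based labeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the word scores in `TOY_LEXICON`, the zero threshold, the helper names `lexicon_label` and `accuracy`, and the example sentences are all hypothetical and are not taken from VADER, AFINN, SentiWordNet, or Liu & Hu. The sketch only shows how a dictionary lookup yields negative, neutral, or positive labels and how those labels can be compared against manual ones. The LSTM training step is not shown.

```python
# Toy lexicon with hypothetical scores (assumption: NOT taken from VADER,
# AFINN, SentiWordNet, or Liu & Hu).
TOY_LEXICON = {
    "good": 2, "great": 3, "love": 3,
    "bad": -2, "terrible": -3, "hate": -3,
}


def lexicon_label(text, lexicon=TOY_LEXICON):
    """Sum the scores of known words and map the total to a sentiment label."""
    tokens = text.lower().split()
    # Words missing from the lexicon contribute 0 to the score.
    score = sum(lexicon.get(tok.strip(".,!?"), 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def accuracy(predicted, reference):
    """Fraction of positions where the two label lists agree."""
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Hypothetical texts with hypothetical manual labels.
texts = [
    "I love this phone",
    "The battery is terrible",
    "It arrived on Monday",
    "Not good at all",
]
manual = ["positive", "negative", "neutral", "negative"]

auto = [lexicon_label(t) for t in texts]
agreement = accuracy(auto, manual)

for text, a, m in zip(texts, auto, manual):
    print(f"{text!r}: lexicon={a}, manual={m}")
print(f"agreement with manual labels: {agreement:.2f}")
```

In this toy run the last sentence is labeled positive because the lookup sees "good" but ignores the negation. This is the kind of context a word-level dictionary can miss, which matches the limitation the abstract attributes to lexicon resources.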

Published

2023-02-03

How to Cite

[1]
M. Hayaty and A. H. Pratama, “Performance of Lexical Resource and Manual Labeling on Long Short-Term Memory Model for Text Classification”, J. Ilm. Tek. Elektro Komput. Dan Inform, vol. 9, no. 1, pp. 74–84, Feb. 2023.

Section

Articles
