Performance of Lexical Resource and Manual Labeling on Long Short-Term Memory Model for Text Classification

Mardhiya Hayaty; Aqsal Harris Pratama

doi:10.26555/jiteki.v9i1.25375

Authors

Mardhiya Hayaty Universitas Amikom Yogyakarta http://orcid.org/0000-0002-6251-9989
Aqsal Harris Pratama Universitas Amikom Yogyakarta

DOI:

https://doi.org/10.26555/jiteki.v9i1.25375

Keywords:

Lexicon resource, Data Labeling, Long Short-Term Memory

Abstract

Data labeling is a critical aspect of sentiment analysis: each text must be assigned a label that reflects the sentiment it expresses. Traditionally, labels are assigned manually by human annotators, which is both time-consuming and costly for large volumes of text. The labeling process can be automated with lexicon resources, which are pre-labeled dictionaries or databases of words and phrases annotated with sentiment information. The contribution of this study is an evaluation of the performance of lexicon resources in document labeling. The evaluation aims to provide insight into the accuracy of using lexicon resources and to inform future research. In this study, a publicly available dataset labeled as negative, neutral, and positive was used. To generate new labels, lexicon resources such as VADER, AFINN, SentiWordNet, and Liu & Hu were employed. An LSTM model was then trained using the newly generated labels. The performance of the trained model was evaluated by testing it on data that had been manually labeled. The study found that manual labeling led to the highest accuracy: 0.79, 0.80, and 0.80 for training, validation, and testing, respectively. This is likely because the test data labels were created manually, which enables the model to learn and capture balanced patterns. Models using the lexicon resources VADER and AFINN had lower accuracies of 0.54 and 0.56. SentiWordNet had the lowest accuracy among all models at 0.49, and the Liu & Hu model had the lowest testing score at 0.26. Our research indicates that lexicon resources alone are not sufficient for sentiment data labeling, because they depend on pre-defined dictionaries and may not fully capture the context of words within a sentence. Manual labeling is therefore necessary to complement lexicon-based methods and achieve better results.
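The idea of lexicon-based labeling, and the context problem the abstract describes, can be illustrated with a minimal sketch. It does not reproduce the authors' pipeline: the word scores in `TOY_LEXICON`, the helper names `lexicon_label` and `agreement`, and the four example sentences are all hypothetical and are not taken from VADER, AFINN, SentiWordNet, Liu & Hu, or the study's dataset.

```python
# Toy sentiment lexicon with hypothetical word scores (an assumption for
# illustration; not taken from any of the lexicons named in the abstract).
TOY_LEXICON = {
    "good": 2, "great": 3, "love": 3,
    "bad": -2, "terrible": -3, "hate": -3,
}


def lexicon_label(text, lexicon=TOY_LEXICON):
    """Sum the scores of known words and map the total to a sentiment label."""
    tokens = text.lower().split()
    score = sum(lexicon.get(tok.strip(".,!?"), 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def agreement(texts, manual_labels):
    """Fraction of texts whose lexicon label matches the manual label."""
    auto = [lexicon_label(t) for t in texts]
    matches = sum(a == m for a, m in zip(auto, manual_labels))
    return matches / len(texts)


# Hypothetical sentences with hypothetical manual labels.
texts = [
    "I love this phone",
    "The battery is terrible",
    "It arrived on Monday",
    "Not good at all",
]
manual = ["positive", "negative", "neutral", "negative"]

print([lexicon_label(t) for t in texts])
print(agreement(texts, manual))
```

In this sketch the last sentence is labeled positive because the word-level score for "good" ignores the preceding "Not", so the automatic labels agree with the manual ones on only three of the four sentences. Real lexicon tools handle some of these cases with additional rules, but the example shows why a dictionary lookup alone can miss sentence-level context.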

Published

2023-02-03

How to Cite

[1]

M. Hayaty and A. H. Pratama, “Performance of Lexical Resource and Manual Labeling on Long Short-Term Memory Model for Text Classification”, J. Ilm. Tek. Elektro Komput. Dan Inform, vol. 9, no. 1, pp. 74–84, Feb. 2023.

Issue

Vol. 9 No. 1 (2023): March

Section

Articles

License

Authors who publish with JITEKI agree to the following terms:

Authors retain copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License (CC BY-SA 4.0) that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.

This work is licensed under a Creative Commons Attribution 4.0 International License
