Hate Speech Detection Using Convolutional Neural Network and Gated Recurrent Unit with FastText Feature Expansion on Twitter

Kevin Usmayadhy Wijaya; Erwin Budi Setiawan

doi:10.26555/jiteki.v9i3.26532

Hate Speech Detection Using Convolutional Neural Network and Gated Recurrent Unit with FastText Feature Expansion on Twitter

Authors

Kevin Usmayadhy Wijaya Telkom University http://orcid.org/0009-0008-5376-6524
Erwin Budi Setiawan Telkom University http://orcid.org/0000-0002-2121-8776

DOI:

https://doi.org/10.26555/jiteki.v9i3.26532

Keywords:

Hate speech, FastText, Feature Expansion, Hybrid deep learning, Convolutional neural network, Gated recurrent unit

Abstract

Twitter is a popular social media for sending text messages, but the tweets that can send are limited to 280 characters. Therefore, sending tweets is done in various ways, such as slang, abbreviations, or even reducing letters in words which can cause vocabulary mismatch so that the system considers words with the same meaning differently. Thus, using feature expansion to build a corpus of similarity can mitigate this problem. Two datasets constructed the similarity corpus: the Twitter dataset of 63,984 and the IndoNews dataset of 119,488. The research contribution is to combine deep learning and feature expansion with good performance. This study uses FastText as a feature expansion that focuses on word structure. Also, this study uses four deep learning methods: Convolutional Neural Network (CNN), Gated Recurrent Unit (GRU), and a combination of the two CNN-GRU, GRU-CNN classification with boolean representation as feature extraction. This study uses five scenarios to find the best result: best data split, n-grams, max feature, feature expansion, and dropout percentage. In the final model, CNN has the best performance with an accuracy of 88.79% and an increase of 0.97% from the baseline model, followed by GRU with an accuracy of 88.17% with an increase of 0.93%, CNN-GRU with an accuracy of 87.47% with an increase of 1.86%, and GRU-CNN with an accuracy of 87.55% with an increase of 1.32%. Based on the result of several scenarios, the use of feature expansion using FastText succeeded in avoiding vocabulary mismatch, proven by the highest increase in accuracy of the model than other scenarios. However, this study has a limitation is that the dataset is used in Indonesian.

Author Biographies

Kevin Usmayadhy Wijaya, Telkom University

Student, School of Computing

Erwin Budi Setiawan, Telkom University

Senior Lecturer, School of Computing

Downloads

Published

2023-07-17

How to Cite

[1]

K. U. Wijaya and E. B. Setiawan, “Hate Speech Detection Using Convolutional Neural Network and Gated Recurrent Unit with FastText Feature Expansion on Twitter”, J. Ilm. Tek. Elektro Komput. Dan Inform, vol. 9, no. 3, pp. 619–631, Jul. 2023.

Download Citation

Issue

Vol. 9 No. 3 (2023): September

Section

Articles

License

Authors who publish with JITEKI agree to the following terms:

Authors retain copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License (CC BY-SA 4.0) that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.

This work is licensed under a Creative Commons Attribution 4.0 International License

About the Journal	Journal Policies	Author	Information
Focus and Scope Editorial Board Reviewer Open Access Policy Sponsorships Contact Us Google Scholar Most Cited Paper	Publication Ethics Peer Review Process Review Guideline Archiving Advertising	Author Guidelines Online Submission Publication Charge / Fee Plagiarism Policy Article Withdrawal	For Readers For Authors Journal History For Editor For Reviewer

Hate Speech Detection Using Convolutional Neural Network and Gated Recurrent Unit with FastText Feature Expansion on Twitter

Authors

DOI:

Keywords:

Abstract

Author Biographies

Kevin Usmayadhy Wijaya, Telkom University

Erwin Budi Setiawan, Telkom University

Downloads

Published

How to Cite

Issue

Section

License

Similar Articles

Most read articles by the same author(s)

special_links

journal_metrics

current_indexing

journal_template_2

Make a Submission

sinta_certificate

visitor_country

visitors

Information