Hate Speech Detection Using Convolutional Neural Network and Gated Recurrent Unit with FastText Feature Expansion on Twitter

DOI:

https://doi.org/10.26555/jiteki.v9i3.26532

Keywords:

Hate speech, FastText, Feature Expansion, Hybrid deep learning, Convolutional neural network, Gated recurrent unit

Abstract

Twitter is a popular social media platform for sending text messages, but tweets are limited to 280 characters. Users therefore compress their tweets in various ways, such as slang, abbreviations, or dropped letters, which can cause vocabulary mismatch: the system treats words with the same meaning as different words. Using feature expansion with a similarity corpus can mitigate this problem. The similarity corpus was constructed from two datasets: the Twitter dataset of 63,984 and the IndoNews dataset of 119,488. The contribution of this research is to combine deep learning and feature expansion with good performance. This study uses FastText, which focuses on word structure, for feature expansion. It also uses four deep learning classifiers: Convolutional Neural Network (CNN), Gated Recurrent Unit (GRU), and two hybrid combinations, CNN-GRU and GRU-CNN, with boolean representation as feature extraction. Five scenarios are tested to find the best result: data split, n-grams, max features, feature expansion, and dropout percentage. In the final model, CNN has the best performance with an accuracy of 88.79%, an increase of 0.97% over the baseline model, followed by GRU with an accuracy of 88.17% (an increase of 0.93%), CNN-GRU with an accuracy of 87.47% (an increase of 1.86%), and GRU-CNN with an accuracy of 87.55% (an increase of 1.32%). Across the scenarios, feature expansion using FastText succeeded in avoiding vocabulary mismatch, as shown by its producing the largest accuracy gain of any scenario. However, a limitation of this study is that the dataset is in Indonesian.
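The idea of feature expansion over a boolean representation can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the expansion rule, so the code assumes one common form, in which a vocabulary term absent from a tweet is switched on when one of its similar words appears in the tweet. The function names, the three-word vocabulary, and the hand-written `similarity_corpus` are hypothetical; in the study the similarity corpus is built with FastText from the Twitter and IndoNews datasets.

```python
from typing import Dict, List


def boolean_vector(tokens: List[str], vocabulary: List[str]) -> List[int]:
    """Boolean representation: 1 if the vocabulary term occurs in the tweet, else 0."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocabulary]


def expand_features(
    tokens: List[str],
    vocabulary: List[str],
    similarity_corpus: Dict[str, List[str]],
) -> List[int]:
    """Assumed expansion rule: a feature that is 0 becomes 1 when any word
    listed as similar to that vocabulary term occurs in the tweet."""
    present = set(tokens)
    vector = boolean_vector(tokens, vocabulary)
    for i, term in enumerate(vocabulary):
        similar_words = similarity_corpus.get(term, [])
        if vector[i] == 0 and any(word in present for word in similar_words):
            vector[i] = 1
    return vector


# Hypothetical toy data; a real similarity corpus would come from FastText.
vocabulary = ["benci", "tidak", "bagus"]
similarity_corpus = {"tidak": ["gak", "tdk"], "bagus": ["bgs"]}

tweet = "gak bagus".split()
baseline = boolean_vector(tweet, vocabulary)
expanded = expand_features(tweet, vocabulary, similarity_corpus)
print(baseline, expanded)
```

In this toy case the slang form "gak" does not match the vocabulary term "tidak" in the baseline vector, but the expanded vector marks that feature as present, which is the kind of vocabulary mismatch the abstract describes.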

Author Biographies

Kevin Usmayadhy Wijaya, Telkom University

Student, School of Computing

Erwin Budi Setiawan, Telkom University

Senior Lecturer, School of Computing

Published

2023-07-17

How to Cite

[1]
K. U. Wijaya and E. B. Setiawan, “Hate Speech Detection Using Convolutional Neural Network and Gated Recurrent Unit with FastText Feature Expansion on Twitter”, J. Ilm. Tek. Elektro Komput. Dan Inform, vol. 9, no. 3, pp. 619–631, Jul. 2023.

Section

Articles
