Hate Speech Detection Using Convolutional Neural Network and Gated Recurrent Unit with FastText Feature Expansion on Twitter

Kevin Usmayadhy Wijaya, Erwin Budi Setiawan

Abstract


Twitter is a popular social media platform for sending text messages, but each tweet is limited to 280 characters. Users therefore shorten their tweets with slang, abbreviations, or dropped letters, which causes vocabulary mismatch: the system treats words with the same meaning as different words. Feature expansion based on a similarity corpus can mitigate this problem. The similarity corpus was built from two datasets: the Twitter dataset (63,984) and the IndoNews dataset (119,488). The contribution of this research is a combination of deep learning and feature expansion that performs well. This study uses FastText, which focuses on word structure, for feature expansion. It also uses four deep learning classifiers: Convolutional Neural Network (CNN), Gated Recurrent Unit (GRU), and two hybrids of them, CNN-GRU and GRU-CNN, with boolean representation as feature extraction. Five scenarios are tested to find the best result: data split, n-grams, maximum features, feature expansion, and dropout percentage. In the final model, CNN performs best, with an accuracy of 88.79% and an increase of 0.97% over the baseline model. GRU reaches an accuracy of 88.17% with an increase of 0.93%, CNN-GRU reaches 87.47% with an increase of 1.86%, and GRU-CNN reaches 87.55% with an increase of 1.32%. Across the scenarios, feature expansion with FastText succeeded in mitigating vocabulary mismatch, as shown by the largest increase in model accuracy among all scenarios. However, a limitation of this study is that the dataset is in Indonesian.
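The interaction between boolean representation and feature expansion can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not spell out the expansion rule, so the sketch assumes a common one, in which a vocabulary feature that is absent from a tweet is switched on when one of its similar words appears in the tweet. The similarity corpus below is a hand-written, hypothetical stand-in for one ranked by FastText, and the function names `boolean_vector` and `expand_features` are invented for illustration.

```python
from typing import Dict, List

# Hypothetical similarity corpus: each vocabulary word maps to words assumed
# to be similar to it. In the study this corpus is built with FastText from
# the Twitter and IndoNews datasets; the entries here are invented examples.
SIMILARITY: Dict[str, List[str]] = {
    "tidak": ["tdk", "gak", "nggak"],
    "benci": ["bnci", "membenci"],
}


def boolean_vector(tokens: List[str], vocab: List[str]) -> List[int]:
    """Boolean representation: 1 if the vocabulary word occurs in the tweet."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocab]


def expand_features(
    vector: List[int],
    tokens: List[str],
    vocab: List[str],
    similarity: Dict[str, List[str]],
) -> List[int]:
    """Assumed expansion rule: a feature with value 0 becomes 1 when any
    word listed as similar to it appears in the tweet."""
    present = set(tokens)
    expanded = list(vector)
    for i, word in enumerate(vocab):
        if expanded[i] == 0 and any(s in present for s in similarity.get(word, [])):
            expanded[i] = 1
    return expanded


vocab = ["benci", "tidak", "suka"]
tweet = ["aku", "tdk", "suka"]  # "tdk" is an abbreviation of "tidak"

base = boolean_vector(tweet, vocab)
expanded = expand_features(base, tweet, vocab, SIMILARITY)
print(base)      # [0, 0, 1]
print(expanded)  # [0, 1, 1]
```

In this toy case the abbreviated word "tdk" does not match the vocabulary entry "tidak", so the plain boolean vector misses it; after expansion the feature is set, which is the kind of vocabulary mismatch the abstract describes.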

Keywords


Hate speech; FastText; Feature expansion; Hybrid deep learning; Convolutional neural network; Gated recurrent unit



DOI: http://dx.doi.org/10.26555/jiteki.v9i3.26532



Copyright (c) 2023 Kevin Usmayadhy Wijaya, Erwin Budi Setiawan

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


 
 


Jurnal Ilmiah Teknik Elektro Komputer dan Informatika
ISSN 2338-3070 (print) | 2338-3062 (online)
Organized by Electrical Engineering Department - Universitas Ahmad Dahlan
Published by Universitas Ahmad Dahlan
Website: http://journal.uad.ac.id/index.php/jiteki
Email 1: jiteki@ee.uad.ac.id
Email 2: alfianmaarif@ee.uad.ac.id
Office Address: Kantor Program Studi Teknik Elektro, Lantai 6 Sayap Barat, Kampus 4 UAD, Jl. Ringroad Selatan, Tamanan, Kec. Banguntapan, Bantul, Daerah Istimewa Yogyakarta 55191, Indonesia