Comparison of machine learning for sentiment analysis in detecting anxiety based on social media data
Keywords:
Anxiety Detection, Machine Learning, Sentiment Analysis, Social Media Data, Text Feature Extraction
Abstract
The COVID-19 pandemic has affected people in every segment of society, and the situation triggers anxiety that harms everyone. The government plays a decisive role in addressing these problems through its work programs, but those programs also draw pros and cons that can themselves cause public anxiety. Detecting this anxiety is therefore necessary so that government programs can be improved to better meet public expectations. This study applies machine learning to detect anxiety in social media comments about the government's programs for handling the pandemic. The approach adopts sentiment analysis, detecting anxiety from netizens' positive and negative comments. The machine learning methods implemented are K-NN, Bernoulli Naïve Bayes, Decision Tree Classifier, Support Vector Classifier, Random Forest, and XGBoost. The data sample was obtained by crawling YouTube comments and consists of 4862 comments: 3211 negative and 1651 positive. Negative comments are taken to indicate anxiety, while positive comments indicate hope (not anxious). The classifiers are trained on features extracted with count vectorization and with TF-IDF. The sentiment data were divided into 3889 training and 973 testing samples. The highest accuracy was achieved by Random Forest, at 84.99% with count-vectorization features and 82.63% with TF-IDF features. The best precision was obtained by K-NN, while the best recall was obtained by XGBoost. Thus, among the methods tested, Random Forest is the most accurate for detecting anxiety from social media data.
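The pipeline summarized in the abstract (count-vectorization or TF-IDF features, a classifier, then accuracy, precision, and recall) can be illustrated with a small plain-Python sketch. It is not the authors' implementation, which the abstract does not show. Everything here is an assumption for illustration: the six toy comments and their labels are invented, whitespace tokenization stands in for the paper's unstated preprocessing, k-NN with cosine similarity and k=3 is used only as one example of the six listed classifiers, and "anxious" (negative comments) is treated as the positive class when computing precision and recall. All helper names (`tokenize`, `build_vocab`, `count_vectorize`, `fit_idf`, `tfidf_transform`, `knn_predict`, `evaluate`) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy comments: "anxious" stands for negative comments and
# "hopeful" for positive ones, following the abstract's labeling scheme.
TRAIN_DOCS = [
    "worried about losing my job",
    "so scared of the new rules",
    "afraid the aid will not arrive",
    "hopeful the program will help",
    "grateful for the aid program",
    "hope things get better soon",
]
TRAIN_LABELS = ["anxious", "anxious", "anxious", "hopeful", "hopeful", "hopeful"]
TEST_DOCS = ["scared and worried about rules", "grateful the program will help"]
TEST_LABELS = ["anxious", "hopeful"]

def tokenize(text):
    # Assumption: lowercase + whitespace split (no stemming or stopword removal).
    return text.lower().split()

def build_vocab(docs):
    # Map each distinct training token to a column index, sorted for determinism.
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}

def count_vectorize(docs, vocab):
    # Count vectorization: one column per vocabulary term, value = raw term count.
    # Tokens never seen in training are ignored.
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for tok, c in Counter(tokenize(doc)).items():
            if tok in vocab:
                vec[vocab[tok]] = c
        vectors.append(vec)
    return vectors

def fit_idf(train_docs, vocab):
    # Smoothed IDF, idf(t) = ln((1 + n) / (1 + df(t))) + 1, the default formula
    # documented for scikit-learn's TfidfTransformer (smooth_idf=True).
    n = len(train_docs)
    df = [0] * len(vocab)
    for doc in train_docs:
        for tok in set(tokenize(doc)):
            if tok in vocab:
                df[vocab[tok]] += 1
    return [math.log((1 + n) / (1 + f)) + 1 for f in df]

def tfidf_transform(count_vectors, idf):
    # TF-IDF weight = raw count * IDF, then each row is L2-normalized.
    rows = []
    for vec in count_vectors:
        weighted = [c * w for c, w in zip(vec, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0
        rows.append([x / norm for x in weighted])
    return rows

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def knn_predict(train_x, train_y, x, k=3):
    # k-NN: majority vote among the k training rows most cosine-similar to x.
    ranked = sorted(range(len(train_x)), key=lambda i: cosine(train_x[i], x), reverse=True)
    return Counter(train_y[i] for i in ranked[:k]).most_common(1)[0][0]

def evaluate(y_true, y_pred, target="anxious"):
    # Accuracy, precision, recall with `target` as the positive class (an assumption).
    tp = sum(t == target and p == target for t, p in zip(y_true, y_pred))
    fp = sum(t != target and p == target for t, p in zip(y_true, y_pred))
    fn = sum(t == target and p != target for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

vocab = build_vocab(TRAIN_DOCS)
train_counts = count_vectorize(TRAIN_DOCS, vocab)
test_counts = count_vectorize(TEST_DOCS, vocab)
idf = fit_idf(TRAIN_DOCS, vocab)
features = {
    "count": (train_counts, test_counts),
    "tfidf": (tfidf_transform(train_counts, idf), tfidf_transform(test_counts, idf)),
}
for name, (train_x, test_x) in features.items():
    preds = [knn_predict(train_x, TRAIN_LABELS, x) for x in test_x]
    print(name, preds, evaluate(TEST_LABELS, preds))
```

On this toy data both feature types classify the two test comments correctly, so each line ends with `(1.0, 1.0, 1.0)`; such a tiny example says nothing about the accuracies reported in the abstract. Comparing the six methods as the study does would mean swapping `knn_predict` for each classifier in turn while keeping the same features and the same train/test split.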
License
Authors who publish with Jurnal Informatika (JIFO) agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike License (CC BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.