Impact of Cosine Similarity Function on SVM Algorithm for Public Opinion Mining About National Sports Week 2024 on X

Authors

  • Abil Mansyur, Universitas Negeri Medan
  • Ichwanul Muslim Karo Karo, Universitas Negeri Medan
  • Muliawan Firdaus, Universitas Negeri Medan
  • Elmanani Simamora, Universitas Negeri Medan
  • Muhammad Badzlan Darari, Universitas Negeri Medan
  • Rizki Habibi, Universitas Negeri Medan
  • Suvriadi Panggabean, Universitas Negeri Medan

DOI:

https://doi.org/10.26555/jiteki.v11i2.30605

Keywords:

PON, Opinion mining, SVM, Cosine similarity, Kernel function, Performance

Abstract

Public opinion on PON 2024 (National Sports Week in Indonesia) became a trending topic on X (formerly Twitter), reflecting both positive and negative sentiments. Understanding these sentiments is important for evaluating the event and preparing for the next edition. However, baseline SVM algorithms that use standard kernel functions are not optimized for text similarity, which limits their performance in sentiment analysis. This research proposes cosine similarity as a replacement for the kernel function in SVM, drawing on cosine similarity's strength in handling text-based data to improve sentiment analysis of public opinion about PON 2024. The key contribution is the integration of cosine similarity into the SVM algorithm in place of the kernel function. The study also offers a comprehensive comparison with the baseline SVM and provides actionable insights for the upcoming PON. The study collected 1,011 tweets related to PON 2024 using web scraping and the Twitter API, then labeled each tweet as positive, neutral, or negative. Several preprocessing techniques were also applied to prepare the data. Two models were developed, a baseline SVM and an SVM integrated with cosine similarity, and both were evaluated on accuracy, precision, recall, and F1-score. The baseline SVM achieved 85.1% accuracy, 85% precision, 83% recall, and 83.3% F1-score, struggling particularly with negative sentiment. In contrast, integrating cosine similarity into SVM improved performance to 88.73% accuracy, 88.3% precision, 89.3% recall, and 88.3% F1-score, a gain of roughly 3.3 to 6.3 percentage points depending on the metric. The analysis of public opinion also showed that positive sentiments mostly focused on athlete achievements and medal awards, while negative sentiments highlighted issues such as referee performance and specific sports (e.g., football). This approach can serve as a valuable tool for event organizers to identify public concerns and maintain positive aspects for the upcoming PON 2028.
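
As an illustration of the central idea in the abstract, the following minimal sketch computes a cosine-similarity matrix between short texts, which is the quantity that would stand in for a kernel function in an SVM. It is not the authors' implementation: the example tweets, the whitespace tokenizer, and the helper names (`bag_of_words`, `cosine_similarity`, `cosine_gram_matrix`) are assumptions made for demonstration only.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Term counts from a lowercase whitespace split (a stand-in for real preprocessing)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def cosine_gram_matrix(docs_a, docs_b):
    """Pairwise cosine similarities: rows index docs_a, columns index docs_b."""
    vecs_a = [bag_of_words(d) for d in docs_a]
    vecs_b = [bag_of_words(d) for d in docs_b]
    return [[cosine_similarity(x, y) for y in vecs_b] for x in vecs_a]


# Hypothetical example tweets, not taken from the study's dataset.
tweets = [
    "athlete wins gold medal",
    "athlete wins silver medal",
    "referee decision was unfair",
]
gram = cosine_gram_matrix(tweets, tweets)
for row in gram:
    print([round(value, 2) for value in row])
```

A matrix of this form can be supplied to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`, where training uses the train-by-train matrix and prediction uses the test-by-train matrix. Whether the study used this route is not stated in the abstract.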

Published

2025-05-10

How to Cite

[1]
A. Mansyur, “Impact of Cosine Similarity Function on SVM Algorithm for Public Opinion Mining About National Sports Week 2024 on X”, J. Ilm. Tek. Elektro Komput. Dan Inform, vol. 11, no. 2, pp. 263–275, May 2025.
