Comparing Machine Learning and Human Judge in SATU Indonesia Awarding Processes
DOI:
https://doi.org/10.26555/jiteki.v7i3.22201
Keywords:
Machine Learning, Random Forest, Orange Data Mining
Abstract
For more than ten years, the SATU Indonesia Awards, supported by PT Astra International Tbk, have been given to inspiring young Indonesians. Every year, more than 10,000 nominations must be short-listed to 90 within one week using five assessment parameters. The contributions of this research are (1) creating a machine learning mechanism for the awarding process from ten years of the SATU Indonesia Awards nomination archive, (2) creating two models of training data for the five assessed parameters, namely motivation, obstacle, outcome, outreach, and sustainability, and (3) comparing the machine learning predictions with the Judges' 2021 assessment. The TEMPO Data and Analysis Center (PDAT) extracted the training corpus from ten years of SATU Indonesia Awards data over six months. The corpus contains nomination texts with the Judges' scores on motivation, obstacle, outcome, outreach, and sustainability. Two training corpora of 1220 instances each, and two corresponding models, were generated: (1) one using the average of the Judges' scores for each parameter per instance and (2) one using the smallest of the Judges' scores. The classification model was generated with Random Forest, which had the smallest error among the classification algorithms tested. The first model aims to predict the assessment parameters of a nomination. The second model aims to detect outliers among the incoming nominations, which may indicate extraordinary nominees. The Random Forest predictions were compared with the Judges' average scores for the 2021 pre-final nominees in the SATU Indonesia Awards awarding process and found to be reasonably similar, with an RMSE of about 1.1 to 1.6 across all assessment parameters. The smallest RMSE was obtained for the Sustainability parameter, and the largest for the Obstacle parameter.
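To illustrate the two ideas the abstract relies on, the following minimal Python sketch shows how several Judges' scores for one nomination could be collapsed into the two kinds of training labels (average and smallest value), and how an RMSE between predicted and Judge-assigned scores is computed. It is not the authors' pipeline, which was built in Orange Data Mining. The function names, the dictionary layout, and every score value are hypothetical and chosen only for illustration; the scoring scale used by the Judges is not stated in the abstract.

```python
from math import sqrt
from statistics import mean

# The five assessment parameters named in the abstract.
PARAMETERS = ["motivation", "obstacle", "outcome", "outreach", "sustainability"]


def aggregate_scores(judge_scores, how):
    """Collapse several judges' scores for one nomination into one label per parameter.

    judge_scores: list of dicts, one per judge, mapping parameter name -> score.
    how: "mean" for the first corpus (average value),
         "min" for the second corpus (smallest value).
    """
    reducers = {"mean": mean, "min": min}
    if how not in reducers:
        raise ValueError("how must be 'mean' or 'min'")
    reduce_fn = reducers[how]
    return {p: reduce_fn([scores[p] for scores in judge_scores]) for p in PARAMETERS}


def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of scores."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(mean(squared_errors))


# Hypothetical scores from three judges for a single nomination (made-up values).
judges = [
    {"motivation": 8, "obstacle": 5, "outcome": 7, "outreach": 6, "sustainability": 9},
    {"motivation": 7, "obstacle": 4, "outcome": 7, "outreach": 8, "sustainability": 8},
    {"motivation": 6, "obstacle": 6, "outcome": 7, "outreach": 7, "sustainability": 7},
]

labels_mean = aggregate_scores(judges, "mean")  # label for the first corpus
labels_min = aggregate_scores(judges, "min")    # label for the second corpus

# RMSE for one parameter over two hypothetical nominations:
# errors are 2 and 2, so the RMSE is 2.0.
example_rmse = rmse([3, 4], [1, 2])
```

In this sketch the RMSE would be computed once per parameter, over all nominations, which matches the abstract's report of a separate error for each of the five parameters.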
License
Authors who publish with JITEKI agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License (CC BY-SA 4.0) that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
This work is licensed under a Creative Commons Attribution 4.0 International License