Comparing Machine Learning and Human Judge in SATU Indonesia Awarding Processes

Received November 09, 2021; Revised November 25, 2021; Accepted December 23, 2021

For more than ten years, the SATU Indonesia Awards, supported by PT. Astra International Tbk., have been given to inspiring young Indonesians. Every year, more than 10,000 nominations must be short-listed to 90 nominations within one week using five (5) assessment parameters. The research contributions are (1) creating a machine learning mechanism for the awarding process from ten years of the SATU Indonesia Awards nomination archive, (2) creating two (2) models of training data for the five (5) assessed parameters, namely motivation, obstacle, outcome, outreach, and sustainability, and (3) comparing the machine learning predictions with the 2021 Judges' assessment. The TEMPO Data and Analysis Center (PDAT) extracted the corpus training data from ten years of SATU Indonesia Awards data over six months. The corpus training data contains nomination texts with the Judges' scores on motivation, obstacle, outcome, outreach, and sustainability. Two (2) sets of corpus training data, and two corresponding models, were generated from (1) the average of the Judges' parameter values per instance and (2) the Judges' smallest value per instance; each set is stored as a corpus of 1220 instances. The classification model was generated by Random Forest, which has the smallest error among the classification algorithms tested. The first model aims to predict the nomination assessment parameters. The second model detects outliers among the incoming nominations in order to find extraordinary nominees. The Judges' average scores for the 2021 pre-finalist nominees were compared with the Random Forest predictions and found to be reasonably similar, with a small RMSE of around 1.1 to 1.6 for all assessment parameters. The smallest RMSE was obtained for the Sustainability parameter, and the largest for the Obstacle parameter.


INTRODUCTION
PT. Astra International Tbk. has fully supported the SATU Indonesia Awards for more than ten years to inspire young Indonesians [1] [2]. The judges for the SATU Indonesia Awards are some of the finest in Indonesia and include former ministers; among them are Prof. Emil Salim, Prof. Nila Moeloek, Prof. Fasli Jalal, Ms. Tri Mumpuni, Ms. Dian Sastrowardoyo, Mr. Riza Deliansyah, Mr. Boy Kelana Soebroto, Ms. Diah Suran Febrianti, Mr. Arief Zulkifli, Mr. Billy Boen, and the author. The number of nominations grows every year, and in 2021 more than 10,000 nominations were received.
The awarding process begins with the public submission of nominations in written form via the Internet. The judges assess the nomination write-up. The assessment focuses on five parameters, namely motive, obstacle, outcome, outreach, and sustainability, each given a discrete score of 1, 3, 7, or 9. The most significant problem in the SATU Indonesia Awards is selecting 90 pre-finalist nominations out of more than 10,000 submitted nominations in less than one week. Marking the nominations consistently in such a short time requires the help of machine learning for the classification process.
Thus, machine learning mechanisms need to be developed. Machine learning in award prediction is rare and difficult to perform [3] [4]. In this work, the machine learning predictions were compared with the 2021 Judges' evaluation in the awarding process at the SATU Indonesia Awards and found to be similar. The corpus training data is extracted from ten years of historical data, consisting of nomination write-ups and the corresponding Judges' scores. Over six months, the TEMPO Data & Analysis Center (PDAT), specifically Ms. Ai Mulyani, Ms. Rum Hayati, Ms. Ratih Virgorini (decd), and Mr. Arief Priandono, extracted more than six thousand instances from tens of thousands of archived records. The corpus was then reprocessed to obtain two (2) types of corpus training data, one based on the Judges' average score and one based on the Judges' minimum score, each consisting of 1220 instances.

Text Marking and Text Rating
Automatic evaluation of written text is an exciting research area. Many machine-learning marking methods have been developed, especially for precision education [5]-[10]. Some researchers use deep learning to regress numerical ratings from review texts or recommendation systems in supervised learning [11]-[15]. For lengthy text, more rigorous machine learning for text classification was developed [13][15]-[18]. Some researchers have gone further and automated the process by using English Natural Language Processing (NLP) to assess short answers [11][13][19]. This last technique may be hard to apply because Indonesian NLP is not as mature as English NLP [20] [21].
The SATU Indonesia Awards machine learning process avoids NLP and relies on regression over thousands of historical texts. It has several distinguishing features: (a) Indonesian NLP is not as mature as English NLP, so regression on the Indonesian corpus text is used instead; (b) the corpus of SATU Indonesia Awards nominations is a collection of long texts from thousands of nominees; and (c) two types of assessment classification models were developed and used, namely (1) a continuous model based on the average Judges' assessment value per instance, and (2) a discrete model based on the minimum Judges' assessment value per instance, with possible values of 1, 3, 7, and 9.

METHOD

ORANGE to Enable Non-Data Scientists
Implementing machine learning in the SATU Indonesia Awards calls for simple, intuitive machine learning tools with a graphical interface. ORANGE data mining is easy to use, intuitive, and free to download from the Internet. Furthermore, ORANGE is attractive because many plugins for complex data analysis, such as text analysis and image analysis, are freely available on the Internet [22]-[24]. Thus, the ORANGE data mining application, with its graphical interface, enabled SATU Indonesia Awards personnel to conduct the necessary analysis.
The machine learning used for the SATU Indonesia Awards is simply a regression of the historical text data against the corresponding Judges' assessment value of each parameter, which yields a model that predicts the required assessment classification.

Corpus Training Data Preparation
Machine learning requires us to generate corpus training data. The corpus contains data extracted from the nomination archive of the last ten years. It includes the text describing each nominee's activities, together with the corresponding Judges' assessments of the motive, obstacle, outcome, outreach, and sustainability parameters.
TEMPO PDAT worked for more than six (6) months to extract tens of thousands of historical records from the SATU Indonesia Awards and produce more than 6 thousand instances of corpus data. The resulting corpus training data is an Excel file tagged for text analysis in the ORANGE data mining application. The data must be sorted and processed so that the motive, obstacle, outcome, outreach, and sustainability parameter values become separate corpus training data.
Each nominee is usually assessed by several Judges at the same time, so there are several Judges' scores to choose from per instance. Although several modeling techniques are possible, two (2) approaches are used in this work. In the first, the corpus training data use the Judges' average assessment value of each instance to obtain a model that predicts the precise value of the nomination parameter. The predicted value lies between 1 and 9; hence, linear regression can be employed [25]-[28]. In the second, the corpus training data use the Judges' minimum assessment value of each instance to predict the outliers. There are only four (4) possible classification values, namely 1, 3, 7, and 9, so the task is reasonably similar to logistic regression. This model does not predict the nominee's assessment value precisely; instead, it detects unique outliers [29]-[33] that stand apart from the average nominees. These exceptional outliers are usually nominees who perform differently from the average.
These two (2) approaches yield two (2) sets of corpus training data stored in two (2) different files, each with 1220 instances. Each set is an Excel file tagged for ORANGE data mining.
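The two kinds of training target described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nominee keys, the score values, and the helper name `build_targets` are hypothetical and do not come from the SATU Indonesia Awards archive, where the data are held in Excel files prepared for ORANGE.

```python
from statistics import mean

# Hypothetical records: for one assessment parameter, each nomination holds
# the scores given by several Judges (illustrative values only).
raw_scores = {
    "nominee_a": [7, 9, 7],
    "nominee_b": [3, 7, 1],
    "nominee_c": [9, 9, 9],
}

def build_targets(scores_by_nominee):
    """Return (average_targets, minimum_targets), each keyed by nominee."""
    # Continuous target: the average of the Judges' scores per instance.
    average_targets = {k: mean(v) for k, v in scores_by_nominee.items()}
    # Discrete target: the smallest Judges' score per instance (1, 3, 7, or 9).
    minimum_targets = {k: min(v) for k, v in scores_by_nominee.items()}
    return average_targets, minimum_targets

average_targets, minimum_targets = build_targets(raw_scores)
```

The average target can take any value between 1 and 9, while the minimum target stays on the discrete 1, 3, 7, 9 scale, which is why the first suits value prediction and the second suits classification.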

Machine Learning to Evaluate SATU Indonesia Awards Nomination
It is crucial to identify the best algorithm for the data before any machine learning process [34] [35]. The ORANGE program used to test the algorithms is shown in Fig. 1. The Test & Score widget compares several algorithms on the same test data to find the best algorithm for model generation. Text from the uploaded corpus is processed by the Bag of Words widget to obtain a corpus with word counts for each data instance before it is analyzed in the Test & Score widget. In this program, seven (7) algorithms are tested and compared. Both sets of corpus training data are used to find the best algorithm.
The ORANGE program that generates and saves the resulting model is relatively simple, as shown in Fig. 2. The loaded corpus training data passes through the Preprocess Text widget to remove stop words, then through the Bag of Words widget to obtain a corpus with word counts for each data instance. The corpus with word counts is then passed to Random Forest [36]-[39] to generate the required model, which is saved for subsequent prediction or classification. Random Forest is used because it produces smaller errors than the other algorithms tested.
The incoming nominee corpus text data is loaded into the ORANGE program in Fig. 3 to predict the nominee's parameter value. The loaded corpus is processed through the Preprocess Text and Bag of Words widgets, and the data is then fed into the Prediction widget, which obtains the model from the Load Model widget. The output scores can be viewed or saved in a spreadsheet file that can be combined with other new nomination data.
The next ORANGE program, shown in Fig. 4, is made specifically to compare the Random Forest predicted values with the 2021 average Judges' assessment values. In this particular example program, the evaluated parameter is the nominee's motive. The corpus training data on motive is processed by the Preprocess Text and Bag of Words widgets before it is passed to the Test & Score widget as the data used to generate a model and predictions with Random Forest. The 2021 Judges' assessment values are loaded and processed in the same way before being fed into the Test & Score widget as test data. The Test & Score widget then evaluates the predicted values against the test data and reports the error, or deviation.
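The stop-word removal and word-count steps that precede the Random Forest model can be illustrated outside ORANGE. The sketch below is a minimal pure-Python approximation, not the ORANGE implementation: the stop-word list, the sample sentences, and the function names `bag_of_words` and `vectorize` are assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real Indonesian list is far longer.
STOP_WORDS = {"dan", "yang", "di", "untuk"}

def bag_of_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into words, drop stop words, count the rest."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)

def vectorize(texts):
    """Turn texts into (vocabulary, matrix) with one row of word counts per text."""
    bags = [bag_of_words(t) for t in texts]
    vocabulary = sorted(set().union(*bags))
    matrix = [[bag[word] for word in vocabulary] for bag in bags]
    return vocabulary, matrix

# Hypothetical nomination snippets, not taken from the archive.
texts = [
    "Pelatihan untuk petani dan nelayan",
    "Petani muda melatih petani di desa",
]
vocabulary, matrix = vectorize(texts)
```

Each row of `matrix` is the numeric representation of one nomination text; rows of this kind, paired with the Judges' scores, are what a regression or classification algorithm is trained on.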

RESULTS AND DISCUSSION
In this section, we will discuss the results of the implementation of machine learning at the SATU Indonesia Awards, especially the resulting accuracy.

Error measurement of the machine learning algorithm
The algorithm test on corpus training data with average Judge's assessment values is shown in Table 1. It turns out that only three (3)

Comparing machine learning vs. 2021 Judges' assessment
To evaluate the model's accuracy, the 2021 parameter values predicted from the corpus training data are compared with the 2021 Judges' assessments of the 90 pre-finalist nominations, using the program shown in Fig. 4. The model was generated using Random Forest [42] from the average Judges' assessment corpus training data. The Test & Score widget compares the 2021 Judges' assessment values with the predicted values and reports the error between them. The errors for the motive, obstacle, outcome, outreach, and sustainability parameters are shown in Table 3. The 2021 Random Forest predictions differ little from the 2021 average Judges' assessment values: the RMSE is relatively small, around 1.1 to 1.6. The smallest error is obtained for the Sustainability assessment and the largest for the Obstacle assessment. Thus, Random Forest [42] can be used to predict the SATU Indonesia Awards nomination values for all assessment parameters.
We found that Linear Regression produces nearly twice the error of Random Forest. For the Sustainability value, the RMSE of Linear Regression is 2.106, while the RMSE of Random Forest is much smaller at 1.140.
The small difference between the predicted and the Judges' Sustainability values may arise because, in reality, it is very challenging for most nominees to achieve long-term sustainability in their activities. All Judges detect this easily, and the same holds in the historical corpus data. Thus, the deviation between the 2021 Judges' Sustainability values and the machine learning predictions is small.
On the other hand, the obstacles experienced by the nominees are more subjective. Because the score depends on each Judge's understanding of the obstacle faced by the nominee, agreement among all Judges is harder to reach. This shows in the more volatile Obstacle values between Judges and in the higher RMSE of the machine learning predictions.
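For reference, the RMSE used in this comparison is the square root of the mean squared difference between predicted and judged values. The following is a minimal sketch of that calculation; the function name `rmse` and the sample numbers are illustrative assumptions and are not the 2021 results.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equally long lists of scores."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical values: average Judges' scores and model predictions for
# four nominations on one parameter (not real award data).
judge_average = [7.0, 3.0, 9.0, 5.0]
model_prediction = [6.0, 4.0, 8.0, 6.0]
error = rmse(model_prediction, judge_average)
```

On the 1 to 9 scoring scale, an RMSE of about 1 means the predictions are typically off by roughly one point.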
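One way to quantify how volatile the scores are between Judges is the average per-nomination standard deviation of their scores. The sketch below is only an illustration of that idea; the function name `mean_judge_spread` and both score tables are invented for the example and do not reproduce the paper's data or analysis.

```python
from statistics import mean, pstdev

def mean_judge_spread(scores_per_instance):
    """Average, over nominations, of the standard deviation of Judges' scores."""
    return mean([pstdev(scores) for scores in scores_per_instance])

# Hypothetical scores: each inner list holds three Judges' scores for one
# nomination (values chosen for illustration only).
sustainability_scores = [[1, 1, 3], [3, 3, 3], [1, 3, 1]]
obstacle_scores = [[1, 7, 9], [3, 9, 1], [7, 1, 3]]

sustainability_spread = mean_judge_spread(sustainability_scores)
obstacle_spread = mean_judge_spread(obstacle_scores)
```

A larger spread for a parameter indicates weaker agreement among Judges, which gives a model trained on averaged scores a noisier target.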

CONCLUSION
Machine learning has successfully predicted the parameter values of the SATU Indonesia Awards nominees with a small RMSE of around 1.1 to 1.6 for all assessment parameters. Random Forest regression generates a model from the nominees' texts and the corresponding Judges' parameter values. The TEMPO Data and Analysis Center spent more than six months extracting the corpus training data from the ten-year assessment archive. Two models, generated from two types of training data, were developed and used. The model based on the average Judges' assessment values predicts the nominees' parameter values, while the model based on the Judges' minimum assessment value classifies the outliers among the nominees. The 2021 Random Forest predictions differ little from the 2021 average Judges' values. The smallest error is obtained for the Sustainability assessment and the largest for the Obstacle assessment.