The Combination of Naive Bayes and Particle Swarm Optimization Methods of Student’s Graduation Prediction

ABSTRACT


INTRODUCTION
Many universities face problems with graduation rates, and IGM University is one of them. Because graduation rates cannot yet be predicted there, no management measures are in place to prevent a drop in timely graduation, which in turn affects university accreditation; graduation prediction is therefore needed. The quality of a university is influenced by many factors, one of which is the success of students who graduate within the stipulated time. A higher number of students who graduate on time affects the accreditation of the department in a university [1].
Several factors that influence the number of graduates at a university need to be analyzed in order to formulate anticipatory policies so that all students can graduate on time. Many factors influence the graduation rate, such as low academic skills, the lecture program, the achievement index, and other factors [2].
Previous research on predicting timely graduation has been conducted in several settings, as summarized below. One study applied the C4.5 algorithm to student graduation prediction and, based on the performance evaluation, reached an accuracy of 82% [3]. Another study applied the Naive Bayes method to the college graduation rate at a university and obtained a testing accuracy of 80.72%, using 1,162 records as training data and 587 records as testing data with 6 attributes of student data [4].
Research on student performance using the Naive Bayes method reported an accuracy of 85% in predicting whether students will graduate on time, and noted that Naive Bayes would perform better with more training data [5]. A study that predicted the study duration of college students using the Naive Bayes algorithm reported an average testing accuracy of 85.17% [6].
Research on student performance prediction with the Naive Bayes method reached an accuracy of 70% using several attributes, including Indonesian and English national exam scores, the average national exam score, language-related achievements, and the number of books read each month [7].
Research on student graduation prediction using the Naive Bayes Classifier (NBC) aimed to produce an NBC-based system that can predict whether a student will complete the study period on time based on student performance, namely the student's GPA from semester 1 to semester 4. The researchers conducted the work in two stages: the first was a literature review, and the second determined the predicted achievement of Computer Science students using the NBC method [8].
Research on a detection case using classification with the Naive Bayes algorithm and improved Particle Swarm Optimization (PSO) aimed to support the prediction of release in online learning. The study made a simple modification to standard PSO and tested it with Naive Bayes, performing attribute selection up front and using accuracy estimates in an online learning system. The trial results show that including variables that do not appear to be related to the class attribute tends to reduce the accuracy and reliability of the classification model. The hybrid PSO algorithm with Naive Bayes, however, clearly tends to find a small subset of the attributes. Naive Bayes is combined with PSO because the hybrid has been shown to perform best at selecting features without reducing the accuracy of the classification method [9].
Research on prediction with the Naive Bayes algorithm and Particle Swarm Optimization reported that the proposed model is effective and efficient at improving the accuracy of Naive Bayes: particle swarm optimization is used for feature selection and achieves the same or possibly better classification performance. The researchers developed a new algorithm that maximizes classification performance while minimizing the number of features. They concluded that the algorithm can automatically evolve a feature subset with fewer features and better classification performance than using all features in the dataset [10].
Research on a Particle Swarm Optimization (PSO) based algorithm with Naive Bayes proposed a new model for diagnosing a disease that combines PSO with Naive Bayes classification, where PSO is used to choose the best training data and to avoid selecting data that decrease classification accuracy. The test obtained an accuracy of 97.95%. According to these results, improving classification accuracy does not always require a new classification method; selecting the best training data and eliminating inappropriate training data can also increase the accuracy [11].
The Naive Bayes classification method is often used to analyze college graduation datasets, but the studies report different results, some with high accuracy and some with low accuracy. Several researchers have combined Naive Bayes with Particle Swarm Optimization to obtain an accuracy rate. Researchers who improved Particle Swarm Optimization (PSO) produced Improved Particle Swarm Optimization (IPSO), which aims to strengthen the capability of PSO to develop results efficiently [12]. This study proposes testing a hybrid of Naive Bayes and Improved Particle Swarm Optimization to obtain an optimal accuracy rate in classifying a case study of college graduation data.

RESEARCH METHOD
This stage begins with a review of several papers related to this research. Where issues left unresolved by previous research are found, the literature review is used to identify the issues to be analyzed. This study uses the Naive Bayes method, improved with PSO, to classify a case study of UIGM student graduation data so as to produce better accuracy.

Collecting Data
In this stage, the researcher applies data collection techniques to gather the datasets. Data sources can be primary or secondary. This research used datasets from Indo Global Mandiri University, which are secondary data obtained from SIMAK through the Bureau of Academic Administration. They cover undergraduate students from 11 departments with graduation years from 2011 to 2014, for a total of 893 records. The training dataset is shown in Table 1.

Naive Bayes Algorithm Model
Naive Bayes is a statistical classifier that can be used to predict the probability of membership in a class. It is based on Bayes' theorem and has a classification ability similar to that of a decision tree [13].
The Naive Bayes algorithm is one of the algorithms used for classification. It is a probabilistic and statistical classification method, proposed by the British scientist Thomas Bayes, that predicts future probabilities based on past experience, which is why it is known as Bayes' theorem. The theorem is combined with the "naive" assumption that the attributes are conditionally independent of one another. In other words, Naive Bayes classification assumes that the presence or absence of a particular characteristic of a class is unrelated to the presence or absence of any other characteristic [6].
The Naive Bayes formula used is:

$$P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)}$$

where
X = data with an unknown class
H = the hypothesis that data X belongs to a specific class
P(H | X) = probability of hypothesis H given condition X (posterior probability)
P(H) = probability of hypothesis H (prior probability)
P(X | H) = probability of X given the conditions in hypothesis H
P(X) = probability of X
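As an illustration of how the formula is applied, the following minimal sketch computes P(H | X) for categorical attributes from simple frequency counts. The function names, the two-attribute toy records, and the class labels are hypothetical and are not taken from the UIGM dataset; the sketch also omits smoothing, so an attribute value never seen with a class gives that class a probability of zero.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(attribute index, class)][value] -> number of occurrences
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for k, value in enumerate(row):
            value_counts[(k, label)][value] += 1
    return class_counts, value_counts


def posterior(row, class_counts, value_counts):
    """Return P(H | X) for every class H, normalised by the evidence P(X)."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(H)
        for k, value in enumerate(row):
            # likelihood P(x_k | H); attributes are assumed independent
            score *= value_counts[(k, label)][value] / count
        scores[label] = score  # P(X | H) * P(H)
    evidence = sum(scores.values())  # P(X)
    if evidence == 0:
        return {label: 0.0 for label in scores}
    return {label: s / evidence for label, s in scores.items()}


# Hypothetical toy records: (grade level, gender) with a graduation label.
rows = [("high", "M"), ("high", "F"), ("low", "M"), ("low", "F"), ("high", "F")]
labels = ["on_time", "on_time", "late", "late", "late"]

class_counts, value_counts = train_naive_bayes(rows, labels)
result = posterior(("high", "F"), class_counts, value_counts)
predicted = max(result, key=result.get)
```

The predicted class is the one with the largest posterior. Because P(X) is the same for every class, it only rescales the scores and does not change which class wins.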

PSO
Particle Swarm Optimization (PSO) is a bio-inspired algorithm that can be used to solve problems by searching a given solution space with a population of particles and calculating candidate positions for the particles in that space. The calculation is repeated until the best solution is found [14]. Particle swarm optimization is a metaheuristic introduced by Kennedy and Eberhart in 1995, whose algorithmic process was motivated by social behavior in animals. PSO is a population-based stochastic approach to solving continuous and discrete optimization problems. At each iteration, the velocity of each particle is adjusted stochastically according to its own best solution and the best solution of its neighbors, and a new point to be evaluated is then calculated [15].
The Combination of Naive Bayes and PSO Methods of Students' Graduation Prediction (Evi Purnamasari et al.)

PSO Based Naive Bayes Model
The steps performed in implementing the combined PSO algorithm to find an optimal solution are as follows:
1. Initialization. In the initial stage, PSO usually uses a binary code to simplify the problem.
- Initialization of initial velocity: the initial velocity of all particles is 0.
- Initialization of initial particle position: at iteration zero, the initial position of each particle is generated using the following equation:

$$x = x_{\min} + \mathrm{rand}[0,1] \times (x_{\max} - x_{\min})$$

- Initialization of PBest and GBest.
2. Update Velocity. Calculate the new velocity of each particle using the velocity update equation.
3. Position Update. Calculate the new position of each particle from its current position and its updated velocity.
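The steps above can be sketched in a few lines of code. The sketch below uses the widely known inertia-weight form of PSO, which is an assumption: the improved PSO variant used in this study may differ. The function name `pso_minimize`, the parameters `w`, `c1`, and `c2`, and the toy sphere objective are all hypothetical. The sketch also minimizes a continuous toy function, whereas the study maximizes classification accuracy.

```python
import random


def pso_minimize(objective, x_min, x_max, dim, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` over [x_min, x_max]^dim with a basic PSO loop."""
    rng = random.Random(seed)
    # Step 1: positions x = x_min + rand[0,1] * (x_max - x_min), velocities 0
    pos = [[x_min + rng.random() * (x_max - x_min) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Step 1 (continued): personal bests (PBest) and global best (GBest)
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Step 2: inertia term plus pulls toward PBest and GBest
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Step 3: move the particle, clamped to the search bounds
                pos[i][d] = min(x_max, max(x_min, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(p):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in p)


best_position, best_value = pso_minimize(sphere, -5.0, 5.0, dim=2)
```

To maximize a fitness such as classification accuracy instead, one option is to pass the negated fitness as the objective.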

RESULTS AND DISCUSSION
This research applied the classification method using the Naive Bayes algorithm, in the form of a hybrid NB-Improved PSO, to produce a better accuracy rate than the previous research. NB-Improved PSO is a combination of Naive Bayes and improved Particle Swarm Optimization. The main component of the proposed algorithm is the application of the NB algorithm within the PSO optimization method.
The NB step determines the accuracy level of UIGM student graduation prediction based on SIMAK UIGM data, using student graduation data with Naive Bayes. The model calculates the number of alumni who graduated on time and not on time using the following attributes: IP (achievement index) for Semester 1, Semester 2, Semester 3, and Semester 4, gender, course program, origin, senior high school (SMA) of origin, and employment status.
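For illustration, one record with the nine attributes listed above could be represented as follows. This is only a sketch: the field names, the types, and the boolean class label are assumptions and do not reflect the actual SIMAK schema.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class StudentRecord:
    """One graduate record; all field names here are hypothetical."""
    ip_semester_1: float      # achievement index (IP), semester 1
    ip_semester_2: float      # achievement index (IP), semester 2
    ip_semester_3: float      # achievement index (IP), semester 3
    ip_semester_4: float      # achievement index (IP), semester 4
    gender: str
    course_program: str
    origin: str
    high_school_origin: str   # senior high school (SMA) of origin
    employment_status: str
    graduated_on_time: bool   # class label, not an input attribute


# The nine input attributes, in declaration order, excluding the class label.
ATTRIBUTE_NAMES = [f.name for f in fields(StudentRecord)
                   if f.name != "graduated_on_time"]
```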
The training dataset in Table 1 has 9 attributes and two class values, on time and not on time. Determining the class of a record requires calculating the probability values that can be derived from the training data. After the probability values are calculated from the data counts, a particle value between 0 and 1 is determined at random, giving the values in Table 2.
After the random particle values between 0 and 1 are obtained, the on-time and not-on-time particles are calculated using these random values; the results are shown in Table 3 and Table 4.
The Naive Bayes fitness is obtained from the on-time and not-on-time particle calculations by checking the actual graduation status in the data against the graduation status produced by the Naive Bayes system, which gives the fitness values in Table 5. The Naive Bayes fitness values yield an accuracy of 90%. This accuracy is then optimized using Particle Swarm Optimization, and the results of combining the two methods are shown in Table 6.
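The fitness check described here, comparing the recorded graduation status with the status produced by the classifier, amounts to an accuracy calculation. A minimal sketch follows; the function name and the four toy labels are hypothetical and are not the study's data.

```python
def accuracy_fitness(actual, predicted):
    """Fraction of records whose predicted status matches the actual status."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one record is required")
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return matches / len(actual)


# Hypothetical labels for four records; three of the four predictions match.
actual_status = ["on_time", "on_time", "late", "late"]
predicted_status = ["on_time", "late", "late", "late"]
fitness = accuracy_fitness(actual_status, predicted_status)
```

In a hybrid scheme, a value of this kind can serve as the fitness that PSO tries to increase from one iteration to the next.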