Clustering analysis of learning style of Anggana high school students

The inability of students to absorb the knowledge conveyed by a teacher is not necessarily caused by a lack of understanding on the part of the students, or by the teacher's inability to teach. It can instead result from a mismatch of learning styles between students and teachers, which makes students uncomfortable learning from a particular teacher. This also happens at Senior High School (SHS/SMAN) 1 Anggana, so this research analyzes clusters (groups) of student learning styles by applying two data mining methods, K-Means and Fuzzy C-Means. The purpose is to determine the effectiveness of these learning style clusters for developing students' absorptive capacity and improving their achievement. The data mining process used to cluster the learning styles proceeds through data cleaning, data selection, data transformation, data mining, pattern evaluation, and knowledge presentation.

groups of learning style outcomes of K-Means and Fuzzy C-Means methods, and formulate appropriate learning style decisions for each class of students.

Related Works
Research using the same techniques has been widely conducted, including:
-Comparative Analysis of K-Means and Fuzzy C-Means Algorithms [1].
-K-Means Cluster Analysis for Students Graduation: Case Study: STMIK Widya Cipta Dharma [6].
-Application of learning analytics using clustering data Mining for Students' disposition analysis [7].
-Impact of Distance Metrics on The Performance of K-Means and Fuzzy C-Means Clustering an Approach to Assess Student's Performance In E-Learning Environment [8].
-Cluster Analysis for Learning Style of Vocational High School Student Using K-Means and Fuzzy C-Means (FCM) [9].
-Comparative Study of K-Means and Fuzzy C-Means Algorithms on the Breast Cancer Data [10].
-Performance Assessment of K-Means, FCM, ARKFCM and PSO Segmentation Algorithms for MR Brain Tumour Images [11].
In the study by Ghosh and Dubey [1], the K-Means and FCM algorithms were compared by observing the movement of the centroid points across iterations. The study examined the accuracy and weaknesses of both methods in solving clustering problems in several experimental cases. Wijayanti et al. [6] applied K-Means in a case study that grouped graduates of the STMIK Widya Cipta Dharma academy based on GPA, study period, study program, and graduation predicate. The number of groups was determined through a validity index.
The main objective of the research by Bharara's team [7] was to find meaningful indicators or metrics in a learning context and to study the inter-relationships between these metrics using the concepts of learning analytics and educational data mining, thereby analyzing the effects of different features on student performance through disposition analysis. In their project, the K-Means clustering technique was used to obtain clusters, which were then mapped to find the important features of a learning context. Relationships between these features were identified to assess student performance.
Several researchers have also compared these two methods. The study by Mahatme et al. [8] helps researchers decide quickly which metric to choose for clustering. In a clustering algorithm, the distance metric is a key component in finding regularities in the data objects. Their paper presents the impact of three different metrics (Euclidean, Manhattan, and the Pearson correlation coefficient) on the performance of K-Means and Fuzzy C-Means clustering. In clustering, the way similarity is detected through the distance metric affects the accuracy of the algorithm [8]. In another case study, Dubey et al. [10] also compared these methods. Their work had two main objectives: first, to compare the performance of the K-Means and Fuzzy C-Means (FCM) clustering algorithms; and second, to examine, from multiple points of view, combinations of different computational measures for K-Means and FCM that could achieve better clustering accuracy. The computational results indicate that the FCM algorithm was more prominent and consistent than the K-Means algorithm when executed with different iterations, fuzziness values, and termination criteria. It is potentially more capable of classifying the Wisconsin breast cancer dataset, where classification accuracy is more important than computation time.
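To make the three metrics mentioned above concrete, the following is a minimal Python sketch. The function names are illustrative, and turning the Pearson correlation coefficient into a distance as 1 - r is an assumption made here, not necessarily the formulation used in [8].

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def pearson_distance(a, b):
    """1 - Pearson correlation (assumed conversion from similarity to distance)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return 1 - cov / (norm_a * norm_b)
```

Euclidean and Manhattan distances depend on the magnitude of the differences, whereas the correlation-based distance depends only on the shape of the two profiles, which is one reason the choice of metric can change the resulting clusters.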
Still on the topic of health, Karegowda's team [11] compared the performance of K-Means, Fuzzy C-Means (FCM), Particle Swarm Optimisation (PSO), and Adaptive Regularised Kernel Fuzzy C-Means (ARKFCM) based segmentation techniques for accurate delineation of tumours using clinical brain tumour Magnetic Resonance images. Their experimental evaluation revealed that the K-Means and FCM segmentation algorithms outperformed the PSO and ARKFCM segmentation algorithms. The research of Andrea et al. [9] is similar to this research: they analyzed clusters (groups) of student learning types by applying K-Means and Fuzzy C-Means (FCM), but their case study was high school students in Penajam Paser Utara. The differences in this research are emphasized on the application of

Research Stages
The research method used was an experiment with the following stages:
-Data collection: questionnaire data were collected from 100 students.
-Preliminary data processing (data cleaning): the collected data were processed by a soft-computing algorithm to reduce irrelevant data, while relevant data and analysis tasks were returned to the database (selection process).
-Formation of the proposed model (data transformation): the data mining method is described schematically and accompanied by calculation formulas. The model is formed from the processed data, and the result of model processing is measured against the current model.
-Experiments and model testing: describes how the experiments were carried out up to the formation of the model and explains how the resulting model was tested.
-Evaluation and validation of results (pattern evaluation): the evaluation was performed by observing the cluster results of both soft-computing algorithms. Validation was performed by measuring the cluster results and comparing them with the original data. Performance was measured by comparing the error value of the cluster result of each algorithm, so that the more accurate algorithm could be identified.
-Knowledge presentation: visualization and knowledge presentation techniques were used to provide knowledge to users. At this stage the resulting knowledge was used by the school to set policy on the appropriate teaching model.
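The text does not state which error value is compared between the two algorithms. One common choice, assumed here purely for illustration, is the within-cluster sum of squared errors (SSE). The function name and arguments below are hypothetical.

```python
def within_cluster_sse(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid.

    Assumption: SSE is used as the "error value"; the paper does not name
    the measure it actually used.
    """
    return sum(
        sum((x - c) ** 2 for x, c in zip(p, centroids[lab]))
        for p, lab in zip(points, labels)
    )
```

Under this assumption, the algorithm whose clustering yields the lower SSE for the same number of clusters would be considered the more compact fit.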

Data Collection
The collected data consist of primary and secondary data. Primary data were obtained directly from questionnaires and interviews at SHS 1 Anggana, while secondary data were obtained from literature in the form of written rules or documents related to the research title. In addition, data were obtained through direct observation of field conditions in the SHS 1 Anggana environment.

Research Methods
After the data were collected, the next stage was to prepare them so that the raw data could be used in the data mining process. The raw data used in this application were obtained from a questionnaire completed by 100 students. Preliminary data processing is part of data preparation. The steps taken include eliminating duplicate data, cleaning corrupted data, combining the data, determining the attributes to be processed, and transforming the data. Data preparation was performed manually in Excel using the *.csv format. The result of the data preparation process is presented in tabular form in Table 1.
The student data from SMAN 1 Anggana are based on the learning style questionnaire, which was filled in by a random sample of 100 students from classes 1, 2, and 3 across various departments. Here X1 is the percentage of visual learning, X2 is the percentage of auditory learning, and X3 is the percentage of kinesthetic learning. The data in Table 1 can be grouped into several groups according to the chosen attributes X1 (Visual), X2 (Auditory), and X3 (Kinesthetic).
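Although the preparation in this study was done manually, the same steps (removing duplicates, discarding unusable rows, and selecting the attributes X1, X2, X3) can be sketched in Python. The column name student_id and the CSV layout are assumptions for illustration and do not come from Table 1.

```python
import csv
import io

def prepare_rows(csv_text, columns=("X1", "X2", "X3")):
    """Return cleaned (X1, X2, X3) tuples from CSV text.

    Rows with missing or non-numeric values are dropped, and exact
    repeats of the same student record are kept only once.
    The 'student_id' column is a hypothetical identifier.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    cleaned = []
    for row in reader:
        try:
            values = tuple(float(row[c]) for c in columns)
        except (KeyError, TypeError, ValueError):
            continue  # missing or corrupted entry: skip the row
        key = (row.get("student_id"), values)
        if key in seen:
            continue  # duplicate record
        seen.add(key)
        cleaned.append(values)
    return cleaned
```

Each returned tuple is one student's point in the three-dimensional (visual, auditory, kinesthetic) space used by the clustering methods.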

K-Means Algorithm
The standard K-Means algorithm was published by Stuart Lloyd in 1982 and is a widely used clustering algorithm. K-Means works by partitioning the existing objects into clusters, or segments, so that objects within each group are more similar to each other than to objects in different groups. The clustering algorithm puts similar values in one segment and different values in different clusters [19]. K-Means separates the data with a loop that refines the partition until no data point changes segment. K-Means works with a top-down approach because it starts with a pre-defined segmentation [20]. As a result, the data of one segment cannot be mixed with the data of another segment [21]. This approach also speeds up computation for large amounts of data.
The K-Means algorithm applies to objects represented as points in a d-dimensional vector space. K-Means clusters all the data across every dimension, and points in the same segment are given the same cluster ID. The value of k is the basic input of the algorithm and determines the number of segments to be formed. A set of n objects is partitioned into k clusters so that the objects within each cluster are similar. K-Means is widely used for determining clusters [22] because it is easy to use and its calculations are exact and can be modified to meet the needs of the application.
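The loop described above can be sketched as follows. This is a minimal pure-Python illustration of the standard assign-and-update iteration, not the implementation used in this study; the random initialisation from the data points and the seed parameter are assumptions.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Cluster d-dimensional points (tuples) into k groups.

    Returns (labels, centroids). Initial centroids are k distinct
    data points chosen at random (an assumed initialisation).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point gets the ID of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster: the partition is stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids
```

For the data in this study, points would be the 100 (X1, X2, X3) tuples and k would be the number of clusters chosen in the results analysis.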

Fuzzy C-Means Algorithm
The best-known fuzzy clustering algorithm is FCM, introduced by Jim Bezdek. He introduced the fuzzification parameter (m), within the range [1, n], which determines the degree of fuzziness of the clusters. When m = 1, the result is a crisp clustering of the points, but when m > 1 the degree of fuzziness between points in the decision space increases [20,23]. FCM clustering involves two processes: calculating the cluster centers and assigning the points to the centers using a form of Euclidean distance. This process is repeated until the cluster centers have stabilized. FCM imposes a direct constraint on the fuzzy membership function associated with each point: the memberships of a point across all clusters sum to 1. The purpose of the FCM algorithm is to assign data points to clusters with varying degrees of membership. This membership reflects the degree to which a point is representative of each cluster [24].
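The two alternating processes can be sketched in pure Python as below. This is an illustrative version of the standard FCM updates for m > 1, with random initial memberships, a default of m = 2, and a membership-change stopping tolerance all chosen as assumptions; it is not the implementation used in this study.

```python
import math
import random

def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Return (memberships, centers) for c fuzzy clusters.

    memberships[i][j] is the degree to which point i belongs to
    cluster j; each row sums to 1. Requires m > 1.
    """
    if m <= 1:
        raise ValueError("this sketch requires m > 1")
    rng = random.Random(seed)
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    dim = len(points[0])
    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by membership**m.
        centers = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(len(points))]
            total = sum(weights)
            centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ))
        # Membership update from relative Euclidean distances to all centers.
        change = 0.0
        for i, p in enumerate(points):
            dists = [max(math.dist(p, ctr), 1e-12) for ctr in centers]
            for j in range(c):
                new = 1.0 / sum((dists[j] / dk) ** (2.0 / (m - 1.0)) for dk in dists)
                change = max(change, abs(new - u[i][j]))
                u[i][j] = new
        if change < tol:
            break  # memberships (and hence centers) have stabilised
    return u, centers
```

Unlike K-Means, every point keeps a membership value for every cluster, so a student with a mixed learning style can belong partly to several groups.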

Results Analysis
The right number of groups can be found from the optimal cluster number recommendation shown on the evalclusters chart. The evalclusters graph gives the best recommendation for group assignment, which is then used for grouping the data. The first highest peak of the evalclusters chart is used to determine the best number of clusters among the candidates [25]. For the data above, the best result according to evalclusters is 4 clusters, with a value of 96.863, as shown in Figure 1. Based on the clusters formed in Table 2, the types of student learning at SHS 1 Anggana can be grouped into four groups according to the values of each variable in each cluster, as can be seen in the silhouette plot for 4 clusters in Figure 2. Figure 2 shows that very few cluster elements fall in negative territory. Thus the clustering result is quite good and represents similar groups.
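The "negative territory" in a silhouette plot refers to points whose silhouette value is below zero, meaning they lie closer on average to another cluster than to their own. A minimal pure-Python sketch of the standard silhouette value is given below; it illustrates the measure and is not the tool used to produce Figure 2. It assumes Euclidean distance, at least two clusters, and points that are not all identical.

```python
import math

def silhouette_values(points, labels):
    """Return one silhouette value in [-1, 1] per point."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    values = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            values.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        values.append((b - a) / max(a, b))
    return values
```

Values near 1 indicate well-placed points, values near 0 indicate points on a boundary, and negative values indicate points that may have been assigned to the wrong cluster.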

K-Means Cluster Analysis
The distribution of the centroids into 4 clusters is shown in Figure 3 as a 3D graph that compares the attributes used. Figure 3 gives the following percentages of the 100 student samples: Cluster 1: 37%; Cluster 2: 20%; Cluster 3: 13%; Cluster 4: 30%. From these, the following K-Means analysis can be drawn:
a. 37% of students have a dominant auditory learning style only.
b. 20% of students have a visual learning style with a little auditory help (mixed visual-auditory).
c. 13% of students have a balanced blend of the three styles.
d. 30% of students are like cluster 3: they have a visual learning style with a little auditory help (mixed visual-auditory), but this cluster has a higher kinesthetic score and a lower visual-auditory score than cluster 3.
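The cluster percentages are simply the share of students assigned to each cluster ID. A small sketch follows; the function name is illustrative, and any label list passed to it would come from a clustering run rather than from the paper's tables.

```python
from collections import Counter

def cluster_percentages(labels):
    """Map each cluster ID to the percentage of points assigned to it."""
    counts = Counter(labels)
    total = len(labels)
    return {cluster: 100.0 * n / total for cluster, n in sorted(counts.items())}
```

With 100 students, the percentage of each cluster equals its member count, which is why the shares above are whole numbers.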

FCM Analysis
The FCM grouping used the same number of groups as the optimal K-Means result (4 clusters) so that the resulting cluster patterns could be compared. Figure 4 shows that the FCM algorithm with 4 clusters stopped at the 100th iteration with an objective function value of 0.8977 × 10^4. With 4 clusters, fewer iterations were needed than with 5 or 6 clusters, making this setting more effective. Figure 5 gives the following percentages of the 100 student samples: Cluster 1: 21%; Cluster 2: 33%; Cluster 3: 24%; Cluster 4: 22%. From these, the following FCM analysis can be drawn:
-21% of students have a dominant auditory learning style only.
-33% of students have an auditory learning style with a little visual help (mixed auditory-visual).
-24% of students have a balanced blend of the three styles.
-22% of students have a visual learning style with a little auditory help (mixed visual-auditory).
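Because FCM gives every student a degree of membership in every cluster, reporting one percentage per cluster requires a rule for turning memberships into a single cluster per student. The sketch below assumes the common maximum-membership rule; the text does not state which rule was used, and the function name is illustrative.

```python
def harden(memberships):
    """Assign each point to the cluster with its highest membership.

    memberships is a list of rows, one per point, each holding one
    membership value per cluster. Ties go to the lowest cluster index.
    """
    return [row.index(max(row)) for row in memberships]
```

The resulting labels can then be counted per cluster in the same way as K-Means labels.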

Comparison of K-Means and FCM Analysis
Both algorithms produced nearly identical groupings into 4 clusters, with only small differences in the percentages. The percentages for the two methods can be seen in Figure 6. Figure 6 shows that the highest percentage is for auditory learning, the second highest is for mixed auditory-visual or visual-auditory learning, and the lowest is for the balanced blend of the three styles. However, there is a small difference between the K-Means and FCM results in the 2nd cluster. K-Means identified the 2nd cluster as a group of students in whom visual learning is more dominant than auditory (mixed visual-auditory), while FCM identified the 2nd cluster as a group of students in whom auditory learning is more dominant than visual (mixed auditory-visual).

Conclusion
From the analysis results of the two clustering algorithms, the following conclusions can be drawn. The learning styles of students at SHS 1 Anggana can be grouped into 4 clusters using K-Means and FCM. Many students at SHS 1 Anggana prefer auditory learning assisted by visualization over learning only by reading or self-practice. This conclusion is drawn by combining the percentage of students who favor mixed auditory-visual learning with the percentage who favor auditory learning only. This research can help the teachers of SHS 1 Anggana find the right teaching method for their students in class.