Gaussian Based-SMOTE Method for Handling Imbalanced Small Datasets

Authors

  • Muhammad Misdram, Universitas Dian Nuswantoro
  • Edi Noersasongko, Universitas Dian Nuswantoro
  • Purwanto Purwanto, Universitas Dian Nuswantoro
  • Muljono Muljono, Universitas Dian Nuswantoro
  • Fandi Yulian Pamuji, Merdeka University

DOI:

https://doi.org/10.26555/jiteki.v9i4.26881

Keywords:

SMOTE, Gaussian, Imbalance, Oversampling, Undersampling

Abstract

Dataset imbalance requires special handling because it often degrades classification performance. Many studies on overcoming dataset imbalance have been published, but the results remain unsatisfactory: the reported average increase in accuracy is still not significant. Common methods for dealing with dataset imbalance include oversampling, undersampling, the Synthetic Minority Oversampling Technique (SMOTE), Borderline-SMOTE, ADASYN, and Cluster-SMOTE. In testing, the average classification accuracy obtained with these methods is still relatively low. The dataset selected for this research is a medical dataset classified as small, with fewer than 200 records. The proposed method, Gaussian Based-SMOTE, is expected to work with a normal distribution and can determine the excess samples for the minority classes. Gaussian Based-SMOTE is the contribution of this research and can produce better accuracy than previous research. The method starts by determining the random location of the synthesis candidates and then determining the Gaussian distribution. The results of these two steps are substituted to produce the synthetic values. The generated synthetic values are combined with the SMOTE sampling of the majority data from the training data to produce balanced data. In classification trials on the balanced data, Gaussian Based-SMOTE yielded a significant increase in accuracy of 3% on average.
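The two steps named in the abstract (a random candidate location, then a Gaussian distribution) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the formulas, so the function name `gaussian_smote_sketch`, the parameters `k` and `sigma`, and the choice to centre the Gaussian on the candidate and scale it by the distance to the neighbour are all assumptions made for illustration only.

```python
import numpy as np

def gaussian_smote_sketch(X_min, n_new, k=5, sigma=0.05, seed=0):
    """Illustrative only: generate n_new synthetic minority samples.

    Assumed procedure (not taken from the paper):
      1. pick a random minority sample and one of its k nearest
         minority neighbours, then a uniform random location on the
         segment between them (the synthesis candidate);
      2. perturb the candidate with Gaussian noise whose spread is
         sigma times the per-feature distance to the neighbour.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Pairwise Euclidean distances; a sample is never its own neighbour.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    nearest = np.argsort(dist, axis=1)[:, :k]

    # Step 1: random location of each synthesis candidate.
    base = rng.integers(0, n, size=n_new)
    neigh = nearest[base, rng.integers(0, k, size=n_new)]
    gap = rng.uniform(0.0, 1.0, size=(n_new, 1))
    diff = X_min[neigh] - X_min[base]
    candidates = X_min[base] + gap * diff

    # Step 2: Gaussian perturbation around each candidate.
    noise = rng.normal(0.0, sigma, size=candidates.shape) * np.abs(diff)
    return candidates + noise

# Hypothetical toy minority class with two features.
X_min = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2], [0.9, 1.9]])
synthetic = gaussian_smote_sketch(X_min, n_new=6, k=3)
print(synthetic.shape)  # (6, 2)
```

In this sketch, setting `sigma=0` reduces the procedure to plain SMOTE interpolation, which makes the effect of the Gaussian step easy to isolate when comparing the two.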

Published

2023-10-16

How to Cite

[1]
M. Misdram, E. Noersasongko, P. Purwanto, M. Muljono, and F. Y. Pamuji, “Gaussian Based-SMOTE Method for Handling Imbalanced Small Datasets”, J. Ilm. Tek. Elektro Komput. Dan Inform, vol. 9, no. 4, pp. 973–982, Oct. 2023.

Section

Articles