Application of SMOTE to Handle Imbalance Class in Deposit Classification Using the Extreme Gradient Boosting Algorithm

Authors

  • Dina Arifah, Lambung Mangkurat University
  • Triando Hamonangan Saragih, Lambung Mangkurat University
  • Dwi Kartini, Lambung Mangkurat University
  • Muliadi Muliadi, Lambung Mangkurat University
  • Muhammad Itqan Mazdadi, Lambung Mangkurat University

DOI:

https://doi.org/10.26555/jiteki.v9i2.26155

Keywords:

Classification, SMOTE, Imbalance Class, Deposit, Extreme Gradient Boosting, Bank Marketing, Customer Identification

Abstract

Deposits are one of the main products and funding sources for banks, so increasing deposit marketing is very important. However, telemarketing, a common form of deposit marketing, is less effective and efficient because it requires calling every customer with a deposit offer. Identifying potential deposit customers is therefore necessary so that telemarketing can target the right customers, improving bank marketing performance with the ultimate goal of increasing the banks' sources of funding. To identify these customers, data mining is applied to the UCI Bank Marketing Dataset from a Portuguese banking institution. This dataset consists of 45,211 records with 17 attributes. The classification algorithm used is Extreme Gradient Boosting (XGBoost), which is suitable for large datasets. The data are highly imbalanced, with the "yes" and "no" classes making up 11.7% and 88.3% of the records, respectively. The proposed solution, which focuses on addressing this class imbalance in the Bank Marketing dataset, therefore combines the Synthetic Minority Over-sampling Technique (SMOTE) with XGBoost. Without SMOTE, XGBoost achieved an accuracy of 0.91016, precision of 0.79476, recall of 0.72928, F1-score of 0.56198, ROC area of 0.93831, and AUCPR of 0.63886. After SMOTE was applied, the accuracy was 0.91072, precision 0.78883, recall 0.75588, F1-score 0.59153, ROC area 0.93723, and AUCPR 0.63733. The results showed that XGBoost with SMOTE outperformed other algorithms, namely K-Nearest Neighbor, Random Forest, Logistic Regression, Artificial Neural Network, Naïve Bayes, and Support Vector Machine, in terms of accuracy. This study contributes to the development of effective machine learning models that can serve as a support system for information technology experts in the finance and banking industries to identify potential customers interested in subscribing to deposits, thereby increasing bank funding sources.
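To make the oversampling step concrete, here is a minimal NumPy sketch of the standard SMOTE idea. It is not the authors' code. For each synthetic row, it picks a random minority ("yes") sample, picks one of that sample's k nearest minority neighbours, and places a new point at a random position on the line segment between them. The sketch rests on several assumptions. It handles purely numeric features only, so the categorical Bank Marketing attributes would first need encoding, or a mixed-type variant such as SMOTE-NC. It uses brute-force pairwise distances, which are only practical for small arrays. The helper names `smote` and `oversample_to_balance` are hypothetical, and the toy data only mimic the roughly 88/12 class ratio reported above.

```python
import numpy as np


def smote(X_min, n_synthetic, k=5, rng=None):
    """Create n_synthetic rows by interpolating between minority neighbours.

    Hypothetical helper for illustration: numeric features only, and the
    brute-force distance matrix is O(n^2 * d) memory.
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Pairwise squared Euclidean distances between minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    # Indices of the k nearest minority neighbours of every sample.
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_synthetic)  # random seed sample per new row
    pick = rng.integers(0, k, size=n_synthetic)  # which of its k neighbours to use
    nbr = neighbours[base, pick]
    gap = rng.random((n_synthetic, 1))  # position along the segment, in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


def oversample_to_balance(X, y, minority_label=1, k=5, rng=None):
    """Append synthetic minority rows until both classes have equal counts."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == minority_label]
    n_needed = int((y != minority_label).sum() - (y == minority_label).sum())
    if n_needed <= 0:
        return X, y
    X_new = smote(X_min, n_needed, k=k, rng=rng)
    y_new = np.full(n_needed, minority_label, dtype=y.dtype)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])


# Toy data with roughly the 88.3% / 11.7% split described in the abstract.
data_rng = np.random.default_rng(0)
X = np.vstack([data_rng.normal(0.0, 1.0, size=(883, 3)),
               data_rng.normal(2.0, 1.0, size=(117, 3))])
y = np.array([0] * 883 + [1] * 117)

X_bal, y_bal = oversample_to_balance(X, y, minority_label=1, k=5, rng=42)
print(X_bal.shape, np.bincount(y_bal))  # (1766, 3) [883 883]
```

In practice, the imbalanced-learn package provides `SMOTE` (and `SMOTENC` for mixed numeric and categorical features) with a `fit_resample(X, y)` method, and the resampled data can then be passed to `xgboost.XGBClassifier`. Resampling is usually applied only to the training split. That way, precision, recall, F1-score, and AUCPR are measured on the original, imbalanced class distribution rather than on synthetic points.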

Published

2023-06-01

Section

Articles