Evaluating Sampling Techniques for Healthcare Insurance Fraud Detection in Imbalanced Dataset

Joanito Agili Lopo; Kristoko Dwi Hartomo

doi:10.26555/jiteki.v9i2.25929

Authors

Joanito Agili Lopo Universitas Kristen Satya Wacana http://orcid.org/0009-0001-3183-7132
Kristoko Dwi Hartomo Universitas Kristen Satya Wacana http://orcid.org/0000-0003-0237-851X

DOI:

https://doi.org/10.26555/jiteki.v9i2.25929

Keywords:

Healthcare Insurance, Imbalanced Dataset, Oversampling, XGBoost, Fraud Detection, Undersampling

Abstract

Detecting fraud in healthcare insurance data is challenging because of severe class imbalance: fraud cases are rare compared to non-fraud cases. Oversampling and undersampling techniques have been applied to address this problem, but these sampling methods have rarely been compared and evaluated against one another. This study therefore contributes a comprehensive evaluation of different sampling methods under different class distributions, using multiple evaluation metrics, including Precision and Recall. In addition, a model evaluation approach is proposed to address the issue of inconsistent scores across metrics. The study uses a real-world dataset and the XGBoost algorithm together with widely used data sampling techniques: Random Oversampling, Random Undersampling, SMOTE, and Instance Hardness Threshold. Results indicate that Random Oversampling and Undersampling perform well in the 50% distribution, while SMOTE and Instance Hardness Threshold are more effective in the 70% distribution. Instance Hardness Threshold performs best in the 90% distribution. The 70% distribution is more robust with SMOTE and Instance Hardness Threshold, particularly in giving consistent scores across metrics, although both have longer computation times. These models performed consistently well on all evaluation metrics, indicating their ability to generalize to new, unseen data in both the minority and majority classes. The study also identifies key features, such as costs, diagnosis codes, type of healthcare service, gender, and severity level of diseases, that are important for accurate healthcare insurance fraud detection. These findings could help healthcare providers make informed decisions with lower risks. A well-performing fraud detection model ensures the accurate classification of fraud and non-fraud cases. The findings can also be used by healthcare insurance providers to develop more effective fraud detection and prevention strategies.
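
As an illustration of the two simplest techniques named in the abstract, the following is a minimal sketch of random oversampling and random undersampling toward a chosen class ratio. It is not the authors' implementation. The function name `random_resample`, the toy labels, and the reading of a "distribution" percentage as the minority-to-majority ratio after resampling are all assumptions made for this example; the abstract does not define the percentages.

```python
import random
from collections import Counter


def random_resample(labels, minority_label, target_ratio, mode, seed=0):
    """Return row indices of a randomly resampled dataset.

    target_ratio is minority_count / majority_count after resampling.
    This is an assumed interpretation of the "distribution" percentages.
    mode="over" duplicates minority rows; mode="under" drops majority rows.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]

    if mode == "over":
        # Draw extra minority rows with replacement until the ratio is met.
        wanted = round(target_ratio * len(majority))
        extra = max(0, wanted - len(minority))
        minority = minority + rng.choices(minority, k=extra)
    elif mode == "under":
        # Keep a random subset of majority rows, without replacement.
        wanted = round(len(minority) / target_ratio)
        majority = rng.sample(majority, min(len(majority), wanted))
    else:
        raise ValueError("mode must be 'over' or 'under'")

    indices = minority + majority
    rng.shuffle(indices)
    return indices


# Hypothetical toy labels: 10 fraud cases (1) and 90 non-fraud cases (0).
labels = [1] * 10 + [0] * 90

over = random_resample(labels, minority_label=1, target_ratio=0.5, mode="over")
under = random_resample(labels, minority_label=1, target_ratio=0.5, mode="under")

over_counts = Counter(labels[i] for i in over)    # 45 fraud, 90 non-fraud
under_counts = Counter(labels[i] for i in under)  # 10 fraud, 20 non-fraud
```

In practice, resampling of this kind is normally applied to the training split only, so that the test data keeps the original class distribution.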

Published

2023-04-18

How to Cite

[1]

J. A. Lopo and K. D. Hartomo, “Evaluating Sampling Techniques for Healthcare Insurance Fraud Detection in Imbalanced Dataset”, J. Ilm. Tek. Elektro Komput. Dan Inform, vol. 9, no. 2, pp. 223–238, Apr. 2023.

Issue

Vol. 9 No. 2 (2023): June

Section

Articles

License

Authors who publish with JITEKI agree to the following terms:

Authors retain copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License (CC BY-SA 4.0) that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.

This work is licensed under a Creative Commons Attribution 4.0 International License
