Tailoring Data Storage Configuration for Efficient Fraud Detection Model Training
DOI:
https://doi.org/10.26555/jiteki.v10i4.30013
Keywords:
Machine Learning, Fraud Detection, Model Prediction, Fine Tuning, Model Update, Real-Time Detection
Abstract
The rapid growth of e-commerce in Indonesia, with a record growth rate of 88.1%, has been accompanied by a surge in online fraud, leading to an estimated loss of 4.62 trillion rupiah. Current fraud prevention methods, such as the widely used 3D-Secure system, are effective but cause a high rate of transaction abandonment (approximately 16%), which is undesirable for merchants. To address this, we propose an AI-based fraud detection system that uses machine learning models to identify potentially fraudulent transactions. By combining classification algorithms, including logistic regression and neural networks, the system activates security protocols only for high-risk transactions, which streamlines transaction processing while improving detection accuracy. Our study focuses on fine-tuning key parameters of the AI-Fraud Detector model, specifically Δt_train, Δt_lag, and frac_hr_pass, to enhance detection performance over time. Simulations evaluated with ROC AUC, false positive rate (fpr), and true positive rate (tpr) show that a configuration with a training period (Δt_train) of 180 days, a lag period (Δt_lag) of 90 days, and a high-risk pass fraction (frac_hr_pass) of 10% balances detection efficiency (∼50%) against a reduced false positive rate. That is, the model identifies approximately 50% of the actual high-risk events while limiting how often it incorrectly labels a low-risk event as high-risk. However, further research is required to refine these results, explore parameter optimization strategies, and enhance the model's adaptability to evolving fraud patterns. Future work will focus on optimizing thresholds, improving model robustness over time, and ensuring effective detection of new fraud schemes. This research improves model performance by optimizing key parameters and enhancing detection accuracy while minimizing false positives.
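The three parameters named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy scores and labels, and two interpretations are assumptions made here. First, the lag period is assumed to separate the end of the training data from the date on which the model scores transactions. Second, frac_hr_pass is assumed to be the fraction of highest-scoring transactions that get flagged as high-risk; the abstract does not define it precisely.

```python
from datetime import date, timedelta


def training_window(score_date, dt_train_days=180, dt_lag_days=90):
    """Return (start, end) of the training window for a model applied on score_date.

    Assumption: training data ends dt_lag_days before the scoring date
    and spans dt_train_days.
    """
    train_end = score_date - timedelta(days=dt_lag_days)
    train_start = train_end - timedelta(days=dt_train_days)
    return train_start, train_end


def high_risk_threshold(scores, frac_hr_pass=0.10):
    """Score cutoff such that roughly the top frac_hr_pass of scores are flagged."""
    ranked = sorted(scores, reverse=True)
    n_flagged = max(1, round(frac_hr_pass * len(ranked)))
    return ranked[n_flagged - 1]


def tpr_fpr(scores, labels, threshold):
    """True positive rate and false positive rate when flagging scores >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives


# Hypothetical model scores and fraud labels (1 = fraud), invented for illustration.
scores = [0.95, 0.80, 0.60, 0.40, 0.35, 0.30, 0.20, 0.15, 0.10, 0.05]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

start, end = training_window(date(2024, 7, 1))
threshold = high_risk_threshold(scores, frac_hr_pass=0.10)
tpr, fpr = tpr_fpr(scores, labels, threshold)
print(start, end, threshold, tpr, fpr)
```

On this toy data, flagging the top 10% of transactions catches one of the two fraud cases and raises no false alarms. It shows the kind of trade-off the abstract describes, but it says nothing about the paper's actual results.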
Copyright (c) 2024 Abdusy Syarif, Muhammad Haikal Satria, Hanene Gabteni

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with JITEKI agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License (CC BY-SA 4.0) that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.