Comparative Analysis of Logistic and Probit Regression Models with K-Fold Cross Validation in Identifying Significant Factors in Diabetes Mellitus
DOI: https://doi.org/10.26555/jim.v10i2.30879
Keywords: Logistic Regression Analysis, Probit Regression Analysis, BHHH Algorithm, k-fold cross validation
Abstract
Statistics is a branch of mathematics concerned with the collection, analysis, interpretation, presentation, and organization of data, and it is used across many disciplines to make decisions based on data. It has two main branches: descriptive statistics and inferential statistics. Two commonly used inferential methods are logistic regression and probit regression, which model the relationship between predictor variables and a response variable that is dichotomous (two categories) or polytomous (more than two categories). Logistic regression uses the cumulative distribution function of the logistic distribution, whereas probit regression uses the cumulative distribution function of the normal distribution. The purpose of this study is to determine which of the two models, logit or probit, is better, based on model validation using k-fold cross validation. The data are secondary data on diabetes prediction available on Kaggle. Applied to these data, the analysis finds that the factors with a significant effect on diabetes are gender, age, history of hypertension, history of heart disease, smoking history, BMI, HbA1c level, and blood sugar level. Under k-fold cross validation, the logit and probit models achieve the same average accuracy, 93.7%. The comparison is supported by further criteria for selecting the best model: AIC, pseudo-R², AUC, log loss, classification accuracy, and a goodness-of-fit test. Overall, the probit model is concluded to be better than the logit model for these data.
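The difference between the two link functions and the k-fold comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the data are synthetic (not the Kaggle diabetes dataset), there is a single hypothetical predictor, and the coefficients are estimated by plain gradient ascent on the log-likelihood rather than by the BHHH algorithm named in the keywords. The helper names `fit`, `k_fold_indices`, and `cross_validate` are invented for this illustration.

```python
import math
import random


def logistic_cdf(z):
    """CDF of the standard logistic distribution (used by the logit model)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for very negative z
    return e / (1.0 + e)


def logistic_pdf(z):
    p = logistic_cdf(z)
    return p * (1.0 - p)


def normal_cdf(z):
    """CDF of the standard normal distribution (used by the probit model)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


LINKS = {"logit": (logistic_cdf, logistic_pdf),
         "probit": (normal_cdf, normal_pdf)}


def fit(xs, ys, link, steps=200, lr=0.1):
    """Fit P(y=1|x) = F(b0 + b1*x) by gradient ascent on the mean log-likelihood.

    Assumption: plain gradient ascent stands in for BHHH to keep the sketch short.
    """
    cdf, pdf = LINKS[link]
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            z = b0 + b1 * x
            p = min(max(cdf(z), 1e-12), 1.0 - 1e-12)  # keep the ratio finite
            score = (y - p) * pdf(z) / (p * (1.0 - p))  # d log-lik / dz
            g0 += score
            g1 += score * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(xs, ys, link, k=5):
    """Mean held-out classification accuracy over k folds (0.5 cut-off)."""
    cdf, _ = LINKS[link]
    accuracies = []
    for test in k_fold_indices(len(xs), k):
        held_out = set(test)
        train = [i for i in range(len(xs)) if i not in held_out]
        b0, b1 = fit([xs[i] for i in train], [ys[i] for i in train], link)
        hits = sum((cdf(b0 + b1 * xs[i]) >= 0.5) == (ys[i] == 1) for i in test)
        accuracies.append(hits / len(test))
    return sum(accuracies) / k


# Synthetic data only: one predictor and a noisy threshold rule.
rng = random.Random(42)
xs = [rng.uniform(-3.0, 3.0) for _ in range(200)]
ys = [1 if x + rng.gauss(0.0, 0.5) > 0 else 0 for x in xs]

acc_logit = cross_validate(xs, ys, "logit")
acc_probit = cross_validate(xs, ys, "probit")
print(f"logit  mean accuracy: {acc_logit:.3f}")
print(f"probit mean accuracy: {acc_probit:.3f}")
```

Because both links are symmetric around zero and classification uses a 0.5 cut-off, the two models often classify almost identically, which is consistent with the equal average accuracy reported in the abstract and is why likelihood-based criteria such as AIC or log loss are needed to separate them.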
License
Copyright (c) 2025 Aqilla Khairunnisa

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.