Towards transparent machine learning models using feature sensitivity algorithm

Authors

  • Ali A. Abaker, Department of Accounting Information Systems, Faculty of Computer Science, AL Neelain University, Khartoum
  • Fakhreldeen A. Saeed, Department of Software Engineering, Faculty of Computer Science, AL Neelain University, Khartoum

Abstract

Despite advances in health care, diabetic ketoacidosis (DKA) remains a potentially serious risk for people with diabetes. Directing diabetes patients to the appropriate unit of care is critical both for saving lives and for using healthcare resources well. Missing data affect almost all machine learning models, especially in production, where they can reduce predictive power and bias model estimates. When the predicted probability is close to 50 percent, the estimate used for a missing value may lead to a completely different decision. The objective of this paper was to introduce a feature sensitivity score computed with the proposed feature sensitivity algorithm. The data were electronic health records comprising 644 records and 28 attributes. We designed a model using a random forest classifier that predicts the likelihood of a patient developing DKA at the time of admission. The model achieved an accuracy of 80 percent using five attributes, fewer features than any model mentioned in the literature review. The feature sensitivity score (FSS) identifies within-feature sensitivity, and the proposed algorithm enables physicians to make transparent and accurate decisions at the time of admission. This method can be applied to other diseases and datasets.
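
The point about missing values near the 50 percent boundary can be illustrated with a small sketch. This is not the authors' feature sensitivity algorithm, which the abstract does not specify; it is a generic probe that sweeps a missing feature over candidate values and checks whether the decision changes. The stand-in model `toy_model`, its coefficients, the feature names, and the assumption that features are scaled to [0, 1] are all hypothetical and chosen only for illustration.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

Features = Dict[str, float]


def toy_model(features: Features) -> float:
    """Hypothetical stand-in for a trained classifier.

    Returns a probability from a logistic score. The coefficients are
    invented for illustration and carry no clinical meaning.
    """
    score = 0.8 * features["glucose"] - 1.2 * features["bicarbonate"] + 0.1
    return 1.0 / (1.0 + math.exp(-score))


def sweep_missing_feature(
    predict: Callable[[Features], float],
    known: Features,
    missing: str,
    candidates: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[float, float, bool]:
    """Try each candidate value for the missing feature.

    Returns the lowest and highest predicted probability, and whether
    the thresholded decision differs across the candidates.
    """
    probs = [predict({**known, missing: value}) for value in candidates]
    decisions = {p >= threshold for p in probs}
    return min(probs), max(probs), len(decisions) > 1


# Assumed: features are scaled to [0, 1], so the sweep covers that range.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]

# A borderline patient: the missing value decides the outcome.
low, high, flips = sweep_missing_feature(
    toy_model, {"glucose": 0.5}, "bicarbonate", grid
)
print(f"borderline: p in [{low:.3f}, {high:.3f}], decision flips: {flips}")

# A clear-cut patient: the decision is stable whatever the missing value is.
low2, high2, flips2 = sweep_missing_feature(
    toy_model, {"glucose": 3.0}, "bicarbonate", grid
)
print(f"clear-cut: p in [{low2:.3f}, {high2:.3f}], decision flips: {flips2}")
```

In the borderline case the probability range straddles the threshold, so the choice of imputed value alone determines the decision; in the clear-cut case it does not. Any model that exposes a probability, such as a random forest, could be passed in place of `toy_model`.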

Published

2020-01-01