Classification of IGF1R ligand compounds for Identification of herbal extracts using extreme gradient boosting

Mohammad Hamim Zajuli Al Faroby; Siti Amiroch; Bernadus Anggo Seno Aji; Avriono Aritonang

Authors

Mohammad Hamim Zajuli Al Faroby, Department of Data Science, Faculty of Information Technology and Business, Institut Teknologi Telkom Surabaya
Siti Amiroch, Department of Mathematics, Faculty of Mathematics and Natural Science, Universitas Islam Darul ‘Ulum, Lamongan, Indonesia
Bernadus Anggo Seno Aji, Department of Information Technology, Faculty of Information Technology and Business, Institut Teknologi Telkom Surabaya, Surabaya, Indonesia
Avriono Aritonang, Department of Data Science, Faculty of Information Technology and Business, Institut Teknologi Telkom Surabaya, Surabaya, Indonesia

Keywords:

Molecular Fingerprint, Extreme Gradient Boosting, Herbal Compound, Machine Learning, IGF1R

Abstract

Diabetes mellitus is a serious disease that requires serious treatment. It is caused by malfunctions in insulin and in the insulin-producing organs. One of the proteins that acts as an insulin signaling receptor is IGF1R, which plays an important role in activating and maximizing insulin performance. In this study, we aimed to identify herbal compounds that can activate the IGF1R protein by taking compound data from an open database and modeling it with an ensemble method, extreme gradient boosting. We found that this method produced a better classification model than the other algorithms we compared. We generated predictions for 844 herbal compounds, but only 15 met the threshold of 0.6. From these fifteen compounds we identified one plant, Zostera marina, which was confirmed to contain compounds that bind to IGF1R. These compounds have the highest probability values in our classification model.
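
The screening step described in the abstract, keeping only compounds whose predicted probability meets the 0.6 threshold, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the compound names and probability values are invented placeholders, and the function name `select_candidates` is hypothetical. In practice the probabilities would come from a trained classifier.

```python
THRESHOLD = 0.6  # cutoff stated in the abstract


def select_candidates(probabilities, threshold=THRESHOLD):
    """Return (compound, probability) pairs at or above the threshold, highest first."""
    kept = [(name, p) for name, p in probabilities.items() if p >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical predicted probabilities (placeholders, not data from the study).
predicted = {
    "compound_A": 0.91,
    "compound_B": 0.42,
    "compound_C": 0.63,
    "compound_D": 0.12,
}

candidates = select_candidates(predicted)
for name, p in candidates:
    print(f"{name}: {p:.2f}")
```

Sorting by probability mirrors the abstract's emphasis on the compounds with the highest predicted values.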

Author Biographies

Mohammad Hamim Zajuli Al Faroby, Department of Data Science, Faculty of Information Technology and Business, Institut Teknologi Telkom Surabaya

I completed my master’s degree at the Department of Mathematics, Institut Teknologi Sepuluh Nopember, in 2020. My thesis was in the field of bioinformatics. I am currently a lecturer in the data science study program at the Telkom Institute of Technology Surabaya; my research is in bioinformatics, especially protein data analysis and drug design.

Siti Amiroch, Department of Mathematics, Faculty of Mathematics and Natural Science, Universitas Islam Darul ‘Ulum, Lamongan, Indonesia

She is a doctoral student in the mathematics department of ITS. Her interest is in computer science, and her dissertation topic is in the area of bioinformatics.

Bernadus Anggo Seno Aji, Department of Information Technology, Faculty of Information Technology and Business, Institut Teknologi Telkom Surabaya, Surabaya, Indonesia

His research interests are in artificial intelligence and data mining.

Avriono Aritonang, Department of Data Science, Faculty of Information Technology and Business, Institut Teknologi Telkom Surabaya, Surabaya, Indonesia

Student in the department of Data Science, ITTS.

