Impact of Feature Selection on XGBoost Model with VGG16 Feature Extraction for Carbon Stock Estimation Using GEE and Drone Imagery
DOI: https://doi.org/10.26555/jiteki.v10i4.30484
Keywords: XGBoost, VGG16, Feature Importance, Information Gain, RFE
Abstract
Carbon stocks are critical to climate change mitigation because they capture atmospheric carbon and store it in biomass. However, carbon stock estimation faces challenges due to data complexity and the need for efficient analytical methods. This study introduces a carbon stock estimation method that integrates the XGBoost algorithm with VGG16 feature extraction and feature selection techniques to analyze Google Earth Engine (GEE) and drone image datasets. The model is evaluated in four scenarios: without feature selection, with Information Gain, with Feature Importance, and with Recursive Feature Elimination (RFE). These scenarios compare feature selection methods to identify the best one for processing complex environmental data. The experimental results show that RFE significantly outperforms the other methods, achieving an average RMSE of 6651.62, MAE of 2297.57, and R² of 0.7673. These findings underscore the importance of feature selection in optimizing model performance, particularly for high-dimensional environmental datasets. RFE achieves superior accuracy by retaining the most relevant features, but it requires more computational resources. For applications that prioritize time and resource efficiency, Information Gain or Feature Importance can serve as a practical alternative with slightly reduced accuracy. This research highlights the value of integrating feature selection techniques into machine learning models for environmental data analysis. Future research could explore alternative feature extraction methods, combine RFE with other approaches, or apply advanced techniques such as Boruta or genetic algorithms. These efforts would further refine carbon stock estimation models and support broader applications in environmental data analysis.
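The abstract compares the four scenarios by RMSE, MAE, and R². As a minimal sketch of how these three metrics are conventionally computed from observed and predicted values, the block below uses only the Python standard library. The function names (`rmse`, `mae`, `r2`) and the toy numbers are illustrative assumptions and are not taken from the paper or its code.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error: mean of the absolute residuals."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        # R² is undefined when every observed value is identical.
        raise ValueError("R² is undefined for a constant target")
    return 1.0 - ss_res / ss_tot


# Hypothetical toy values, chosen only so the arithmetic is easy to follow.
observed = [0.0, 2.0]
predicted = [1.0, 1.0]

scores = {
    "RMSE": rmse(observed, predicted),
    "MAE": mae(observed, predicted),
    "R2": r2(observed, predicted),
}
print(scores)
```

Lower RMSE and MAE and higher R² indicate a better fit. Because RMSE squares the residuals before averaging, it is more sensitive to large errors than MAE, which is consistent with the reported RMSE being well above the reported MAE.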
Copyright (c) 2024 I Made Darma Cahya Adyatma, Erwin Budi Setiawan
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with JITEKI agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License (CC BY-SA 4.0) that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.