Missing Data Imputation using K-Nearest Neighbour for Software Project Effort Prediction

Authors

  • Sri Handayaningsih, Universitas Ahmad Dahlan
  • Ardiansyah Ardiansyah, Universitas Ahmad Dahlan

Keywords:

missing data, KNN, analogy, multiple linear regression, software effort prediction

Abstract

Accurate software development effort prediction plays an important role in estimating how much effort should be allocated to a software project so that it can be completed on time and within budget. Achieving good prediction accuracy relies on the quality of the data set. Unfortunately, missing data is one of the major problems in software effort data sets, alongside imbalanced, noisy, and irrelevant data. A low-quality data set degrades the performance of the prediction model. This study investigates the accuracy of software effort prediction on a data set with missing values, using two missing-data techniques: KNN imputation and listwise deletion (LWD). Stepwise regression with backward elimination is then applied for feature selection, and two effort prediction methods are implemented: Multiple Linear Regression (MLR) and Analogy. The results show that, with either KNN imputation or listwise deletion, the multiple linear regression approach significantly outperforms the Analogy approach (p>0.05).
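
The two missing-data techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy project data, the choice of k = 2, and the use of unscaled Euclidean distance over the observed features are all assumptions made for illustration. In practice, features would usually be normalized before distances are computed.

```python
import numpy as np

def listwise_delete(X):
    """Listwise deletion: drop every row (project) with at least one missing value."""
    X = np.asarray(X, dtype=float)
    return X[~np.isnan(X).any(axis=1)]

def knn_impute(X, k=2):
    """Fill each missing value with the mean of that feature over the k nearest
    complete rows, measured by Euclidean distance on the row's observed features."""
    X = np.asarray(X, dtype=float)
    donors = listwise_delete(X)          # only complete rows act as donors
    filled = X.copy()
    for i, row in enumerate(X):
        missing = np.isnan(row)
        if not missing.any():
            continue
        # Distance uses only the features this row actually has.
        diff = donors[:, ~missing] - row[~missing]
        dist = np.sqrt((diff ** 2).sum(axis=1))
        nearest = np.argsort(dist, kind="stable")[:k]
        filled[i, missing] = donors[nearest][:, missing].mean(axis=0)
    return filled

# Hypothetical toy data: columns are size, team size, effort.
projects = np.array([
    [10.0, 2.0, 100.0],
    [12.0, 2.0, 120.0],
    [50.0, 8.0, 900.0],
    [11.0, np.nan, 110.0],   # team size is missing
])

complete_only = listwise_delete(projects)   # 3 projects remain
imputed = knn_impute(projects, k=2)         # 4 projects, no gaps
```

Listwise deletion shrinks the data set, while KNN imputation keeps every project at the cost of estimating the missing values. Either result can then be passed to a feature-selection step and an effort prediction model.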

Published

2022-01-05

Section

Computational Intelligence