Missing Data Imputation using K-Nearest Neighbour for Software Project Effort Prediction

Sri Handayaningsih; Ardiansyah Ardiansyah

Authors

Sri Handayaningsih Universitas Ahmad Dahlan
Ardiansyah Ardiansyah Universitas Ahmad Dahlan

Keywords:

missing data, KNN, analogy, multiple linear regression, software effort prediction

Abstract

Accurate prediction of software development effort plays an important role in estimating how much effort a software project will require, so that the project can be completed on time and within budget. Good prediction accuracy relies on the quality of the data set. Unfortunately, missing data is one of the major problems affecting software effort data sets, alongside imbalanced, noisy, and irrelevant data. A low-quality data set degrades the performance of the prediction model. This study investigates the accuracy of software effort prediction on a data set with missing values, using two missing-data techniques: KNN imputation and listwise deletion (LWD). Stepwise regression with backward elimination was then applied for feature selection, followed by two effort prediction methods: Multiple Linear Regression (MLR) and Analogy. The results show that, under both KNN imputation and listwise deletion, the multiple linear regression approach significantly outperforms the Analogy approach (p>0.05).
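To make the two missing-data techniques named in the abstract concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not state the distance measure, the value of k, or any feature scaling, so the averaged Euclidean distance over shared features, `k=2`, the `None` marker for missing values, and the toy `projects` table are all assumptions made for illustration.

```python
import math


def listwise_deletion(rows):
    """Listwise deletion: drop every row that has at least one missing value."""
    return [row for row in rows if None not in row]


def _distance(a, b):
    """Root mean squared difference over the features observed in both rows.

    Assumption: the abstract does not specify a distance measure.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows, k=2):
    """Fill each missing value with the mean of that feature over the
    k nearest rows in which the feature is observed."""
    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows that have feature j observed.
            donors = [
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Hypothetical toy data: [size, team_size, effort]; None marks a missing value.
projects = [
    [10.0, 2.0, 100.0],
    [12.0, 2.0, 120.0],
    [50.0, 8.0, 600.0],
    [11.0, None, 110.0],
]

complete_cases = listwise_deletion(projects)
filled = knn_impute(projects, k=2)
print(len(complete_cases), filled[3])
```

Listwise deletion shrinks the data set to its complete cases, whereas KNN imputation keeps every project and borrows the missing value from the most similar ones. In practice the features would usually be normalised before computing distances, since otherwise large-valued attributes such as effort dominate the neighbour search.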
