Missing Data Imputation using K-Nearest Neighbour for Software Project Effort Prediction
Keywords:
missing data, KNN, analogy, multiple linear regression, software effort prediction
Abstract
Accurate software development effort prediction plays an important role in estimating how much effort a software project requires so that it can be completed on time and within budget. Achieving good prediction accuracy relies on the quality of the data set. Unfortunately, missing data is one of the major problems in software effort data sets, alongside imbalanced, noisy, and irrelevant data. A low-quality data set decreases the performance of the prediction model. This study investigates the accuracy of software effort prediction on data sets with missing values using KNN missing data imputation and listwise deletion (LWD) techniques. Stepwise regression with backward elimination is then applied for feature selection, followed by two effort prediction methods: Multiple Linear Regression (MLR) and Analogy. The results show that, with both KNN imputation and listwise deletion, the multiple linear regression approach outperforms the Analogy approach significantly (p>0.05).
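To make the two missing-data treatments concrete, the following is a minimal sketch, not the study's implementation. The toy `projects` table, the function names, and the distance definition (Euclidean over the features both rows have observed, averaged over the number of shared features) are all assumptions chosen for illustration. Features are left unscaled here, whereas a real data set would normally be normalized before computing distances.

```python
import math

def listwise_delete(rows):
    """Listwise deletion: drop every row with at least one missing value (None)."""
    return [r for r in rows if all(v is not None for v in r)]

def knn_impute(rows, k=2):
    """Fill each None with the mean of that feature over the k nearest rows."""
    def distance(a, b):
        # Compare only the features that both rows have observed.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(r) for r in rows]  # copy so the input is not modified
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows that observed feature j.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed

# Hypothetical project records: [size, team members, effort]
projects = [
    [10.0, 2.0, 100.0],
    [12.0, 3.0, 120.0],
    [11.0, None, 110.0],
    [50.0, 9.0, 900.0],
]

complete = listwise_delete(projects)  # keeps only the fully observed rows
filled = knn_impute(projects, k=2)    # keeps every row, fills the gap
```

Listwise deletion shrinks the data set, while KNN imputation preserves every project at the cost of introducing estimated values; either output could then be passed to a regression or analogy-based estimator.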
License
Authors who publish with Jurnal Informatika (JIFO) agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike License (CC BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.