Support Vector Regression (SVR) is a new generation of Machine Learning algorithms, suitable for predictive data modeling problems. The objective of this paper is twofold: first, to investigate the effectiveness of SVR for Web effort estimation using a cross-company dataset; second, to compare different SVR configurations looking at the one that presents the best performance. In particular, we took into account three variables’ preprocessing strategies (no-preprocessing, normalization, and logarithmic), in combination with two different dependent variables (effort and inverse effort). As a result, SVR was applied using six different data configurations. Moreover, to understand the suitability of kernel functions to handle non-linear problems, SVR was applied without a kernel, and in combination with the Radial Basis Function (RBF) and the Polynomial kernels, thus obtaining 18 different SVR configurations. To identify, for each configuration, which were the best values for each of the parameters we defined a procedure based on a leave-one-out cross-validation approach. The dataset employed was the Tukutuku database, which has been adopted in many previous Web effort estimation studies. Three different training and test set splits were used, including respectively 130 and 65 projects. The SVR-based predictions were also benchmarked against predictions obtained using Manual StepWise Regression and Case-Based Reasoning. Our results showed that the configuration corresponding to the logarithmic features’ preprocessing, in combination with the RBF kernel provided the best results for all three data splits. In addition, SVR provided significantly superior prediction accuracy than all the considered benchmarking techniques.

Investigating the use of Support Vector Regression for Web Effort Estimation

CORAZZA, ANNA;DI MARTINO, SERGIO;
2011

Abstract

Support Vector Regression (SVR) is a new generation of Machine Learning algorithms, suitable for predictive data modeling problems. The objective of this paper is twofold: first, to investigate the effectiveness of SVR for Web effort estimation using a cross-company dataset; second, to compare different SVR configurations looking at the one that presents the best performance. In particular, we took into account three variables’ preprocessing strategies (no-preprocessing, normalization, and logarithmic), in combination with two different dependent variables (effort and inverse effort). As a result, SVR was applied using six different data configurations. Moreover, to understand the suitability of kernel functions to handle non-linear problems, SVR was applied without a kernel, and in combination with the Radial Basis Function (RBF) and the Polynomial kernels, thus obtaining 18 different SVR configurations. To identify, for each configuration, which were the best values for each of the parameters we defined a procedure based on a leave-one-out cross-validation approach. The dataset employed was the Tukutuku database, which has been adopted in many previous Web effort estimation studies. Three different training and test set splits were used, including respectively 130 and 65 projects. The SVR-based predictions were also benchmarked against predictions obtained using Manual StepWise Regression and Case-Based Reasoning. Our results showed that the configuration corresponding to the logarithmic features’ preprocessing, in combination with the RBF kernel provided the best results for all three data splits. In addition, SVR provided significantly superior prediction accuracy than all the considered benchmarking techniques.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/11588/374660
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 39
  • ???jsp.display-item.citation.isi??? 27
social impact