In this paper, we present the results of a manual assessment of the coherence between the comments and the implementations of 3636 methods in three open source software applications implemented in Java (for one of these applications, we considered two subsequent versions). The results of this assessment have been collected in a dataset that we made publicly available on the Web. The creation of this dataset is based on a protocol that is detailed in this paper. We present that protocol to let researchers evaluate the quality of our dataset and to ease its possible future extension. Another contribution of this paper is a preliminary investigation of the effectiveness of adopting a Vector Space Model (VSM) with the tf-idf schema to discriminate between coherent and non-coherent methods. We observed that lexical similarity alone is not sufficient for this distinction, while encouraging results were obtained by applying a Support Vector Machine (SVM) classifier to the whole vector space.
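To make the lexical-similarity idea in the abstract concrete, the following is a minimal sketch of how a comment and a method body might be compared in a tf-idf weighted vector space. It is not the authors' implementation: the tokenizer, the smoothed idf formula, the use of cosine similarity, and the example Java snippets are all assumptions chosen for illustration. The SVM step is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split camelCase identifiers, lowercase, and keep alphabetic tokens only.
    (Hypothetical tokenizer; the paper's preprocessing may differ.)"""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [t for t in re.split(r"[^A-Za-z]+", spaced.lower()) if t]


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document.
    Weight = relative term frequency * smoothed idf (an assumed variant)."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Illustrative inputs (invented for this sketch, not taken from the dataset).
related_comment = "Returns the number of items in the cart."
unrelated_comment = "Closes the database connection."
method_body = "public int getItemCount() { return items.size(); }"

vec_related, vec_unrelated, vec_body = tfidf_vectors(
    [related_comment, unrelated_comment, method_body]
)
sim_related = cosine(vec_related, vec_body)
sim_unrelated = cosine(vec_unrelated, vec_body)
```

In this toy case the related comment shares the term "items" with the method body, so its similarity is positive, while the unrelated comment shares no terms. A comment can also be coherent while sharing few words with the code, which is one way a similarity score by itself can fall short, in line with the abstract's observation.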

Coherence of comments and method implementations: a dataset and an empirical investigation / Corazza, Anna; Maggio, Valerio; Scanniello, Giuseppe. - In: SOFTWARE QUALITY JOURNAL. - ISSN 0963-9314. - (2018), pp. 751-777. [10.1007/s11219-016-9347-1]

Coherence of comments and method implementations: a dataset and an empirical investigation

Corazza, Anna; Maggio, Valerio; Scanniello, Giuseppe
2018

Files in this item:
SQJ_2015_Coherence.pdf (not available)

Type: Pre-print document
License: Private/restricted access
Size: 519.8 kB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/11588/656617