Word Embeddings for Comment Coherence / Cimasa, Alfonso; Corazza, Anna; Coviello, Carmen; Scanniello, Giuseppe. - (2019), pp. 244-251. (Paper presented at the 45th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), held in Kallithea-Chalkidiki, Greece, 28-30 August 2019).

Word Embeddings for Comment Coherence

Alfonso Cimasa; Anna Corazza
2019

Abstract

Information in source code comments and identifier names represents a valuable resource for programmers who maintain and evolve software. As software evolves, the information in comments can drift out of alignment with the corresponding source code, which hampers software evolution and maintenance tasks. This kind of misalignment is known as lack of coherence and can happen for several reasons, e.g., programmers modify the intent of source code while executing a maintenance task without updating its comment accordingly. We study the problem of detecting lack of coherence between comments and source code by exploiting Word Embeddings (WEs), a tool that has been shown to be very effective in natural language processing. We introduce four models based on WEs and test them using six different WE variants. These models and WEs have been empirically assessed through an experiment conducted on a publicly available dataset and compared with a baseline approach. The results indicate that, while maintaining performance very close to the baseline, the considered models and WE variants are more efficient in terms of execution time. The explanation for this improvement is that WEs are able to concentrate the important information in a much more compact representation of the input. This represents one of the most important take-away lessons from our experiment.
Files in this item:
File: Euromicro.pdf (not available)
Type: Pre-print document
License: Private/restricted access
Size: 259.99 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/772246