Word Embeddings for Comment Coherence / Cimasa, Alfonso; Corazza, Anna; Coviello, Carmen; Scanniello, Giuseppe. - (2019), pp. 244-251. (Paper presented at the 45th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), held in Kallithea-Chalkidiki, Greece, 28-30 August 2019).
Word Embeddings for Comment Coherence
Alfonso Cimasa; Anna Corazza; Carmen Coviello; Giuseppe Scanniello
2019
Abstract
Information in source code comments and identifier names is a valuable resource for programmers who maintain and evolve software. As software evolves, the information in comments may fall out of alignment with the corresponding source code, hampering evolution and maintenance tasks. This kind of misalignment is known as lack of coherence and can arise for several reasons; for example, programmers may change the intent of source code during a maintenance task without updating its comment accordingly. We study the problem of detecting lack of coherence between comments and source code by exploiting Word Embeddings (WEs), a technique that has proven very effective in natural language processing. We introduce four models based on WEs and test them with six different WE variants. We assessed these models and WE variants empirically through an experiment on a publicly available dataset and compared them with a baseline approach. The results indicate that, while achieving performance very close to that of the baseline, the considered models and WE variants are more efficient in terms of execution time. The explanation for this improvement is that WEs concentrate the important information in a much more compact representation of the input. This is one of the most important take-away lessons from our experiment.
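The abstract does not describe the four models in detail, so the following is only a generic sketch of one common way word embeddings can be used to compare a comment with its code; it is not the authors' method. It assumes that each text is split into words (identifiers are split on camelCase and snake_case), that the word vectors are averaged into a single fixed-size vector, and that the pair is flagged as coherent when the cosine similarity of the two vectors exceeds a threshold. The embedding table `TOY_EMBEDDINGS`, its 3-dimensional values, the helper names, and the threshold of 0.8 are all hypothetical and invented for illustration; a real setup would load pre-trained vectors.

```python
import math
import re

# Hypothetical toy embeddings: values are invented for illustration only.
# A realistic setup would load pre-trained vectors with hundreds of dimensions.
TOY_EMBEDDINGS = {
    "return": [0.9, 0.1, 0.0],
    "maximum": [0.1, 0.9, 0.1],
    "max": [0.1, 0.85, 0.15],
    "value": [0.2, 0.6, 0.3],
    "list": [0.3, 0.2, 0.8],
    "items": [0.25, 0.25, 0.75],
    "minimum": [0.1, -0.8, 0.2],
    "min": [0.1, -0.85, 0.15],
}


def tokenize(text):
    """Lowercase words; identifiers are split on snake_case and camelCase."""
    words = []
    # Runs of letters only, so underscores, digits and punctuation separate words.
    for chunk in re.findall(r"[A-Za-z]+", text):
        # Split camelCase, e.g. "getMaxValue" -> "get", "Max", "Value".
        words.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", chunk))
    return [w.lower() for w in words]


def mean_vector(tokens, embeddings):
    """Average the vectors of known tokens; None if no token has a vector."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def coherence_score(comment, code, embeddings=TOY_EMBEDDINGS):
    """Similarity between the mean embeddings of comment words and code words."""
    c = mean_vector(tokenize(comment), embeddings)
    s = mean_vector(tokenize(code), embeddings)
    if c is None or s is None:
        return 0.0
    return cosine(c, s)


def is_coherent(comment, code, threshold=0.8):
    """Hypothetical decision rule: coherent if the score reaches the threshold."""
    return coherence_score(comment, code) >= threshold


if __name__ == "__main__":
    code = "def max_value(items): return max(items)"
    print(is_coherent("Return the maximum value in the list", code))
    print(is_coherent("Return the minimum value in the list", code))
```

With these toy vectors, the first call prints `True` and the second prints `False`, because "minimum" points away from "max" in the invented embedding space. The sketch also illustrates, in a simplified way, the idea of a compact representation: however long the comment or the method body is, each side is reduced to one fixed-size vector before the comparison.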
File: Euromicro.pdf (pre-print, Adobe PDF, 259.99 kB). Not available: private/restricted access.
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.