Background: In the context of End-to-End testing of web applications, automated exploration techniques (a.k.a. crawling) are widely used to infer state-based models of the site under test. These models, in which states represent features of the web application and transitions represent reachability relationships, can be used for several model-based testing tasks, such as test case generation. However, current exploration techniques often lead to models containing many near-duplicate states, i.e., states representing slightly different pages that are in fact instances of the same feature. This has a negative impact on the subsequent model-based testing tasks, adversely affecting, for example, the size, running time, and achieved coverage of generated test suites.
Aims: As a web page can be naturally represented by its tree-structured DOM representation, we propose a novel near-duplicate detection technique, based on Tree Kernel (TK) functions, to improve the model inference of web applications. TKs are a class of functions that compute similarity between tree-structured objects, and have been extensively investigated and successfully applied in the Natural Language Processing domain.
Method: To evaluate the capability of the proposed approach in detecting near-duplicate web pages, we conducted preliminary classification experiments on a freely available massive dataset of about 100k manually annotated web page pairs. We compared the classification performance of the proposed approach with other state-of-the-art near-duplicate detection techniques.
Results: Preliminary results show that our approach performs better than state-of-the-art techniques in the near-duplicate detection classification task.
Conclusions: These promising results show that TKs can be applied to near-duplicate detection in the context of web application model inference, and motivate further research in this direction to assess the impact of the technique on the quality of the inferred models and on the subsequent application of model-based testing techniques.
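
To make the idea of a tree kernel concrete, the following is a minimal sketch of one simple kernel, a subtree-style kernel that counts pairs of identical complete subtrees in two trees. It is an illustration only and is not claimed to be the kernel or the implementation evaluated in the paper. The nested-tuple DOM representation, the function names, and the two toy pages are assumptions introduced here.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy DOM node: (tag, [children]). Real DOM nodes carry
# attributes and text, which this sketch deliberately ignores.

def subtree_signatures(node, bag):
    """Record a string signature for every complete subtree rooted in `node`."""
    tag, children = node
    sig = "(" + tag + "".join(subtree_signatures(c, bag) for c in children) + ")"
    bag[sig] += 1
    return sig

def subtree_kernel(a, b):
    """Count pairs of identical complete subtrees shared by trees a and b."""
    bag_a, bag_b = Counter(), Counter()
    subtree_signatures(a, bag_a)
    subtree_signatures(b, bag_b)
    return sum(count * bag_b[sig] for sig, count in bag_a.items())

def normalized_kernel(a, b):
    """Scale the kernel to [0, 1] so that trees of different sizes are comparable."""
    return subtree_kernel(a, b) / sqrt(subtree_kernel(a, a) * subtree_kernel(b, b))

# Two toy pages that differ only in the number of list items.
page_a = ("body", [("ul", [("li", []), ("li", [])])])
page_b = ("body", [("ul", [("li", []), ("li", []), ("li", [])])])

similarity = normalized_kernel(page_a, page_b)
print(f"normalized similarity: {similarity:.3f}")
```

A near-duplicate detector in this spirit could compare such a similarity score against a threshold, or feed it to a classifier. Either choice is an assumption of this sketch rather than something stated in the abstract.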

Web application testing: Using tree kernels to detect near-duplicate states in automated model inference / Corazza, Anna; Di Martino, Sergio; Peron, Adriano; Starace, Luigi Libero Lucio. - (2021), pp. 1-6. (Paper presented at the 15th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), held in Bari, 10-12 October 2021) [10.1145/3475716.3484187].

Web application testing: Using tree kernels to detect near-duplicate states in automated model inference

Anna Corazza (co-first author);
Sergio Di Martino (co-first author);
Adriano Peron (co-first author);
Luigi Libero Lucio Starace (co-first author)
2021

ISBN: 9781450386654
Files in this item:
File: 3475716.3484187.pdf
Access: open access
Type: Publisher's version (PDF)
License: Public domain
Size: 612.78 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/860635