In order to plan for failure recovery, the designers of cloud systems need to understand how their system can potentially fail. Unfortunately, analyzing the failure behavior of such systems can be very difficult and time-consuming, due to the large volume of events, non-determinism, and reuse of third-party components. To address these issues, we propose a novel approach that joins fault injection with anomaly detection to identify the symptoms of failures. We evaluated the proposed approach in the context of the OpenStack cloud computing platform. We show that our model can significantly improve the accuracy of failure analysis in terms of false positives and negatives, with a low computational cost.

Enhancing Failure Propagation Analysis in Cloud Computing Systems / Cotroneo, Domenico; De Simone, Luigi; Liguori, Pietro; Natella, Roberto. - (2019), pp. 139-150. (Paper presented at the 30th International Symposium on Software Reliability Engineering (ISSRE), held in Berlin in October 2019) [10.1109/ISSRE.2019.00023].

Enhancing Failure Propagation Analysis in Cloud Computing Systems

Domenico Cotroneo; Luigi De Simone; Pietro Liguori; Roberto Natella
2019

ISBN: 978-1-7281-4982-0
Use this identifier to cite or link to this document: https://hdl.handle.net/11588/766368