We propose a fault injection framework to assess the hang detection facilities of the Linux operating system (OS). The novelty of the framework lies in its adoption of a more representative fault load than existing ones and in its effectiveness, measured as the number of hang failures produced; representativeness is supported by a field data study on the Linux OS. Using the proposed framework with realistic workloads, we find that the Linux OS fails to detect hangs in several cases, with a measured relative coverage of 75%. To improve detection, we propose a simple yet effective hang detector that periodically tests OS liveness, as perceived by applications, by means of I/O system calls; this approach is shown to raise relative coverage to as much as 94%. The hang detector can be deployed on any Linux system with acceptable overhead.
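
The idea of testing OS liveness through I/O system calls can be illustrated with a minimal sketch. This is not the detector described in the paper, whose design is not given in this record. It only assumes that a probe performing a few file I/O calls should finish within a deadline, and that a missed deadline is reported as a suspected hang. The names `io_probe`, `os_is_live`, and `monitor`, and all timeout and period values, are hypothetical.

```python
import os
import tempfile
import threading
import time


def io_probe(directory: str) -> None:
    """Exercise a few I/O system calls: open, write, fsync, lseek, read, unlink."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, b"liveness")
        os.fsync(fd)
        os.lseek(fd, 0, os.SEEK_SET)
        if os.read(fd, 8) != b"liveness":
            raise OSError("probe read back unexpected data")
    finally:
        os.close(fd)
        os.unlink(path)


def os_is_live(directory: str, timeout_s: float, probe=io_probe) -> bool:
    """Return True only if the probe completes successfully within timeout_s."""
    finished = threading.Event()

    def run() -> None:
        try:
            probe(directory)
            finished.set()
        except OSError:
            pass  # assumption: a failed probe counts as "not live"

    # Daemon thread, so a probe stuck in the kernel does not block interpreter exit.
    threading.Thread(target=run, daemon=True).start()
    return finished.wait(timeout_s)


def monitor(directory: str, period_s: float, timeout_s: float, on_hang, rounds: int) -> None:
    """Run the probe periodically and call on_hang() whenever a deadline is missed."""
    for _ in range(rounds):
        if not os_is_live(directory, timeout_s):
            on_hang()
        time.sleep(period_s)
```

A caller might run `monitor(tempfile.gettempdir(), 1.0, 5.0, on_hang=lambda: print("suspected hang"), rounds=10)`. A user-space monitor like this can itself stall if the whole system hangs, so a real deployment would need some external means of acting on a missed deadline.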

Assessment and Improvement of Hang Detection in the Linux Operating System / Cotroneo, Domenico; Natella, Roberto; Russo, Stefano. - PRINT. - (2009), pp. 288-294. (Paper presented at the 28th IEEE International Symposium on Reliable Distributed Systems (SRDS 09), held in Niagara Falls, NY, USA, Sept. 27-30) [10.1109/SRDS.2009.26].

Assessment and Improvement of Hang Detection in the Linux Operating System

COTRONEO, DOMENICO;NATELLA, ROBERTO;RUSSO, STEFANO
2009

ISBN: 9780769538266
Use this identifier to cite or link to this document: https://hdl.handle.net/11588/353102