Big Data analytics and cloud computing impose an ever-growing demand on data-center providers in terms of computational requirements, latency, and storage. Distributed file systems offer the strategic advantage of scaling out computing and storage resources, allowing for notable speed-ups with massively parallel and distributed computing paradigms. On the other hand, such distributed clusters are constantly challenged by storage failures. Data replication is often deployed to ensure fault tolerance and business continuity, typically in a 3x configuration. This results in expensive 200 % overheads in storage space, write propagation, and energy costs. Erasure codes offer an alternative approach to fault tolerance by allowing reconstruction of erased data chunks, while reducing storage overhead down to 30 %. However, a considerable share of CPU cycles and energy is spent computing such codes, effectively reducing the cluster’s efficiency and starving other user and system tasks. Offloading to a custom accelerator is non-trivial, due to the highly multi-threaded nature of such tasks and the lack of robust multi-threading support in conventional accelerator runtimes.

In this work, we present a heterogeneous hardware/software architectural design for large-scale, multi-threaded acceleration of distributed erasure codes on PCIe accelerators, together with a new abstraction and integration model for distributed accelerators in fault-tolerant storage systems. We enable safe and seamless deployment of multi-threaded SYCL-based IP cores through a hardware thread proxying layer that provides software thread isolation and integration with cluster-level middleware. In addition, our design allows for heterogeneous cluster configurations, with full compatibility and transparent integration of heterogeneously accelerated and CPU-only nodes.
We systematically evaluate the individual layers of our architecture and validate the design’s integration in a container-based HDFS cluster, comparing performance against the state-of-the-art AVX-512-accelerated ISA-L library and other SYCL substrates, such as GPUs and single-threaded FPGAs.
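
The storage trade-off described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the (10 data, 3 parity) layout is only one configuration that happens to give a 30 % overhead, and a single XOR parity chunk stands in for the Reed-Solomon-style codes that production systems typically use.

```python
from functools import reduce


def storage_overhead(data_chunks: int, redundant_chunks: int) -> float:
    """Extra storage as a fraction of the original data size (hypothetical helper)."""
    return redundant_chunks / data_chunks


def xor_parity(chunks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized chunks; the simplest possible erasure code."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


# 3x replication: one original plus two extra copies of every chunk.
replication = storage_overhead(data_chunks=1, redundant_chunks=2)

# Assumed erasure-code layout: 10 data chunks protected by 3 parity chunks.
erasure = storage_overhead(data_chunks=10, redundant_chunks=3)

# Single-parity demo: lose one data chunk, rebuild it from the survivors.
data = [b"abcd", b"efgh", b"ijkl"]
parity = xor_parity(data)
recovered = xor_parity([data[0], data[2], parity])  # chunk 1 was "erased"

print(f"replication overhead: {replication:.0%}")
print(f"erasure-code overhead: {erasure:.0%}")
print("recovered chunk matches:", recovered == data[1])
```

XOR parity tolerates only one lost chunk per stripe; codes that tolerate several losses need finite-field arithmetic, which is the CPU-intensive step the abstract proposes to offload.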

A hardware/software architecture for multi-threaded offloading of erasure codes in distributed file systems / Maisto, Vincenzo; Cilardo, Alessandro; Billi, Emilio; Fader, Chuck. - In: FUTURE GENERATION COMPUTER SYSTEMS. - ISSN 0167-739X. - 176:(2026). [10.1016/j.future.2025.108187]

A hardware/software architecture for multi-threaded offloading of erasure codes in distributed file systems

Cilardo, Alessandro;
2026



Use this identifier to cite or link to this document: https://hdl.handle.net/11588/1049800