Reading between the Lines: Context-Aware AI-based generation of software exploits

Improta, Cristina; Liguori, Pietro; Natella, Roberto; Cukic, Bojan; Cotroneo, Domenico
2026

Abstract

AI-based code generators have transformed offensive security by translating natural language (NL) descriptions into executable exploits. However, the semantic variability and implicit assumptions in NL descriptions limit their robustness and usability in this domain. This study evaluates nine state-of-the-art deep learning (DL) models, including fine-tuned models and instruction-tuned large language models (LLMs), under varying contextual information conditions to assess their ability to handle ambiguity, leverage useful context, and filter out irrelevant information. Using a manually curated dataset of real-world shellcodes and rigorous evaluations, we find that fine-tuned encoder-decoder models excel when given related context, decoder-only models indirectly benefit from unrelated context to better comprehend the task at hand, and instruction-tuned LLMs struggle to utilize context effectively, regardless of the prompting setting. These results underline the importance of optimized contextual strategies and task-specific fine-tuning for advancing AI-driven exploit generation in high-stakes software security applications.
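As a concrete illustration of the three contextual conditions the study compares (no context, related context, unrelated context), the sketch below builds the corresponding prompt variants for a single NL-to-assembly intent and scores a model's output against a reference line. The generate stub, the example intent and contexts, and the character-level similarity metric are illustrative assumptions, not the paper's actual models, dataset, or evaluation metrics.

# Illustrative sketch (assumptions, not the paper's pipeline): building the
# three prompting conditions under comparison -- no context, related context,
# and unrelated context -- for an NL-to-assembly generation task.

from difflib import SequenceMatcher
from typing import Optional

INTENT = "zero out the rax register"       # NL description of one step
REFERENCE = "xor rax, rax"                 # ground-truth assembly line

RELATED_CONTEXT = "push the value of rax onto the stack"   # adjacent step
UNRELATED_CONTEXT = "print a message to standard output"   # distractor

def build_prompt(intent: str, context: Optional[str]) -> str:
    """Prepend optional context to the intent, mirroring the three conditions."""
    if context is None:
        return f"Intent: {intent}"
    return f"Context: {context}\nIntent: {intent}"

def generate(prompt: str) -> str:
    """Hypothetical stand-in for the model under test (e.g., a fine-tuned
    encoder-decoder or an instruction-tuned LLM)."""
    return "xor rax, rax"  # placeholder output

def similarity(candidate: str, reference: str) -> float:
    """Toy character-level similarity; the study relies on stricter,
    code-aware evaluation."""
    return SequenceMatcher(None, candidate, reference).ratio()

for name, ctx in {"no_context": None,
                  "related": RELATED_CONTEXT,
                  "unrelated": UNRELATED_CONTEXT}.items():
    output = generate(build_prompt(INTENT, ctx))
    print(f"{name:>10}: similarity={similarity(output, REFERENCE):.2f} -> {output}")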
Reading between the Lines: Context-Aware AI-based generation of software exploits / Improta, Cristina; Liguori, Pietro; Natella, Roberto; Cukic, Bojan; Cotroneo, Domenico. - In: EMPIRICAL SOFTWARE ENGINEERING. - ISSN 1382-3256. - 31:3(2026). [10.1007/s10664-025-10796-x]
Files in this product:

File: s10664-025-10796-x.pdf (authorized users only)
Type: Publisher's version (PDF)
License: Publisher's copyright
Size: 4.11 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/1031744
Citations
  • PMC: not available
  • Scopus: 0
  • Web of Science: not available