Reading between the Lines: Context-Aware AI-based generation of software exploits

Improta, Cristina; Liguori, Pietro; Natella, Roberto; Cukic, Bojan; Cotroneo, Domenico
2026

Abstract

AI-based code generators have transformed offensive security by translating natural language (NL) descriptions into executable exploits. However, the semantic variability and implicit assumptions in NL descriptions limit their robustness and usability in this domain. This study evaluates nine state-of-the-art deep learning (DL) models, including fine-tuned models and instruction-tuned large language models (LLMs), under varying contextual information conditions to assess their ability to handle ambiguity, leverage useful context, and filter out irrelevant information. Using a manually curated dataset of real-world shellcodes and rigorous evaluations, we find that fine-tuned encoder-decoder models excel when given related context, decoder-only models indirectly benefit from unrelated context to better comprehend the task at hand, and instruction-tuned LLMs struggle to utilize context effectively, regardless of the prompting setting. These results underline the importance of optimized contextual strategies and task-specific fine-tuning for advancing AI-driven exploit generation in high-stakes software security applications.
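As a concrete illustration of the three contextual conditions the study compares (no context, related context, unrelated context), the sketch below builds the corresponding prompt variants for a single NL-to-assembly intent and scores a model's output against a reference line. The generate stub, the example intent and contexts, and the character-level similarity metric are illustrative assumptions, not the paper's actual models, dataset, or evaluation metrics.

# Illustrative sketch (assumptions, not the paper's pipeline): building the
# three prompting conditions under comparison -- no context, related context,
# and unrelated context -- for an NL-to-assembly generation task.

from difflib import SequenceMatcher
from typing import Optional

INTENT = "zero out the rax register"       # NL description of one step
REFERENCE = "xor rax, rax"                 # ground-truth assembly line

RELATED_CONTEXT = "push the value of rax onto the stack"   # adjacent step
UNRELATED_CONTEXT = "print a message to standard output"   # distractor

def build_prompt(intent: str, context: Optional[str]) -> str:
    """Prepend optional context to the intent, mirroring the three conditions."""
    if context is None:
        return f"Intent: {intent}"
    return f"Context: {context}\nIntent: {intent}"

def generate(prompt: str) -> str:
    """Hypothetical stand-in for the model under test (e.g., a fine-tuned
    encoder-decoder or an instruction-tuned LLM)."""
    return "xor rax, rax"  # placeholder output

def similarity(candidate: str, reference: str) -> float:
    """Toy character-level similarity; the study relies on stricter,
    code-aware evaluation."""
    return SequenceMatcher(None, candidate, reference).ratio()

for name, ctx in {"no_context": None,
                  "related": RELATED_CONTEXT,
                  "unrelated": UNRELATED_CONTEXT}.items():
    output = generate(build_prompt(INTENT, ctx))
    print(f"{name:>10}: similarity={similarity(output, REFERENCE):.2f} -> {output}")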
Reading between the Lines: Context-Aware AI-based generation of software exploits / Improta, Cristina; Liguori, Pietro; Natella, Roberto; Cukic, Bojan; Cotroneo, Domenico. - In: EMPIRICAL SOFTWARE ENGINEERING. - ISSN 1382-3256. - 31:3(2026). [10.1007/s10664-025-10796-x]
Files in this product:

File: s10664-025-10796-x.pdf (authorized users only)
Type: Publisher's version (PDF)
License: Publisher's copyright
Size: 4.11 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/1031744
Citations
  • PMC: not available
  • Scopus: 0
  • Web of Science: not available