Combined Text-Visual Attention Models for Robot Task Learning and Execution

Rauso, Giuseppe; Caccavale, Riccardo; Finzi, Alberto

doi:10.1007/978-3-031-80607-0_18

In this work, we explore the interplay between text and visual attention mechanisms in a robot reinforcement learning setting, where robotic tasks are conveyed through natural language instructions. Specifically, we propose a novel approach that aims to enhance robot task learning and execution through an integrated multimodal attention model, which associates task-relevant environmental features with the related words in the natural language mission text. We illustrate the overall framework architecture along with the learning process, emphasizing the interaction between the textual and visual feature-based attention mechanisms. The method is trained in MiniGrid environments using the Proximal Policy Optimization algorithm, and its performance is evaluated by comparing the proposed architecture with a baseline that lacks attentional mechanisms. Experimental results demonstrate the efficacy of the approach and highlight its potential for behavior transparency.
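As a rough illustration of what associating mission words with environmental features can mean, the sketch below computes scaled dot-product attention weights from each word of a mission text over a handful of grid cells. It is not the architecture of the paper: the function name `text_visual_attention`, the two-dimensional toy embeddings, and the cell labels are all hypothetical, and a trained model would learn its embeddings rather than use hand-picked values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def text_visual_attention(word_vecs, cell_vecs):
    """For each word vector, return attention weights over the cell vectors.

    Hypothetical helper: words act as queries, grid cells as keys, and the
    scores are scaled by the square root of the embedding size.
    """
    scale = math.sqrt(len(cell_vecs[0]))
    return [softmax([dot(w, c) / scale for c in cell_vecs]) for w in word_vecs]


# Toy, hand-picked embeddings (assumptions made for illustration only).
words = {"go": [0.0, 0.0], "red": [4.0, 0.0], "door": [0.0, 4.0]}
cells = {"red door": [1.0, 1.0], "blue key": [-1.0, 0.0], "wall": [0.0, -1.0]}

weights = text_visual_attention(list(words.values()), list(cells.values()))

# Each row shows how strongly one mission word attends to each cell.
for word, row in zip(words, weights):
    print(word, [round(w, 2) for w in row])
```

Inspecting such a weight matrix is one simple way to see which parts of the scene a word is linked to, which is the kind of behavior transparency the abstract alludes to.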

Combined Text-Visual Attention Models for Robot Task Learning and Execution / Rauso, G., Caccavale, R., Finzi, A.. - 15450 LNAI:(2025), pp. 228-240. (23rd International Conference of the Italian Association for Artificial Intelligence, AIxIA 2024 ita 2024) [10.1007/978-3-031-80607-0_18].


Year: 2025
ISBN: 9783031806063; 9783031806070
Publication type: 4.1 Articles in conference proceedings

Files in this record: there are no files associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/996851
