Minimizing learning experiences in embodied agents language and action learning

Milano, N.; Nolfi, S.

doi:10.1016/j.neucom.2025.129510

Language learning necessarily requires the ability to generalize knowledge extracted from a limited number of training examples to the unbounded set of meanings that can be expressed through language. For learning robots, the need to learn from limited experience is further compounded by the high cost of collecting embodied training data. In this article, we train a transformer neural network with behavioral cloning to control an embodied agent based on multimodal (language and vision) inputs. We analyze the role of several factors in determining the number of learning experiences necessary to acquire integrated language and action skills. Our results indicate that the embodied and situated nature of the agents, the ability to reuse previously acquired knowledge to learn new related skills, and the capacity to extract knowledge from the distributional properties of data can greatly reduce the number of necessary examples. We arrive at these conclusions by comparing the results of a series of experiments in which the role of each considered factor is systematically varied.

Minimizing learning experiences in embodied agents language and action learning / Milano, N., Nolfi, S. - In: NEUROCOMPUTING. - ISSN 0925-2312. - 624:(2025). [10.1016/j.neucom.2025.129510]