A Deep Deterministic Policy Gradient Learning Approach to Missile Autopilot Design

Candeli, A.; De Tommasi, G.; Lui, D. G.; Mele, A.; Santini, S.; Tartaglione, G.

doi:10.1109/ACCESS.2022.3150926

In this paper, a Deep Reinforcement Learning algorithm known as Deep Deterministic Policy Gradient (DDPG) is applied to the problem of designing a missile lateral acceleration control system. To this aim, the autopilot control problem is recast in the Reinforcement Learning framework, where the environment consists of a 2-Degrees-of-Freedom nonlinear model of the missile's longitudinal dynamics, while the agent training procedure is carried out on a linearized version of the model. In particular, we show how to account in the DDPG reward function not only for the stabilization of the longitudinal dynamics, but also for the main performance indexes (settling time, undershoot, steady-state error, etc.). The effectiveness of the proposed DDPG-based missile autopilot is assessed through extensive numerical simulations, carried out on both the linearized and the fully nonlinear dynamics, considering different flight conditions and uncertainty in the aerodynamic coefficients. Its performance is compared against two model-based control strategies in order to check the capability of the proposed data-driven approach to achieve a prescribed closed-loop response in a completely model-free fashion.
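As an illustration of how performance indexes such as settling time, undershoot, and steady-state error might enter a reward function, the following is a minimal sketch under stated assumptions. It is not the reward used in the paper: the function name `shaped_reward`, the weights, the settling band, and the settling deadline are all hypothetical values chosen only for the example.

```python
def shaped_reward(acc, acc_ref, t, settle_time=0.5, band=0.02,
                  w_err=1.0, w_under=5.0, w_settle=2.0):
    """Hypothetical per-step reward for tracking a lateral acceleration command.

    acc      measured lateral acceleration at time t
    acc_ref  commanded (step) lateral acceleration, assumed nonzero
    t        time elapsed since the command was applied, in seconds
    All weights and thresholds are illustrative assumptions.
    """
    scale = max(abs(acc_ref), 1e-9)  # normalize by the command size
    err = abs(acc_ref - acc) / scale

    # Tracking term: drives the steady-state error towards zero.
    reward = -w_err * err

    # Undershoot term: extra penalty while the response has the
    # opposite sign of the command.
    if acc * acc_ref < 0:
        reward -= w_under * abs(acc) / scale

    # Settling-time term: fixed penalty whenever the response is still
    # outside the tolerance band after the assumed deadline.
    if t >= settle_time and err > band:
        reward -= w_settle

    return reward


if __name__ == "__main__":
    # Perfect tracking after the deadline versus a response that is
    # still at half the commanded value.
    print(shaped_reward(1.0, 1.0, t=1.0))
    print(shaped_reward(0.5, 1.0, t=1.0))
```

In a sketch like this, each performance index gets its own additive penalty, so the relative weights decide which requirement the agent prioritizes during training.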

A Deep Deterministic Policy Gradient Learning Approach to Missile Autopilot Design / Candeli, A., De Tommasi, G., Lui, D.G., Mele, A., Santini, S., Tartaglione, G. - In: IEEE ACCESS. - ISSN 2169-3536. - 10:(2022), pp. 19685-19696. [10.1109/ACCESS.2022.3150926]