Physical Reinforcement Learning with Integral Temporal Difference Error for Constrained Robots
Academic Article in Scopus
Overview
abstract
The paradigm of reinforcement learning (RL) refers to agents that learn iteratively through continuous interactions with their environment. However, when the value function is unknown, a neural network is used, which is typically encoded into an unknown temporal difference equation. When RL is implemented in physical systems, explicit convergence and stability analyses are required to guarantee worst-case operation for any trial, even when the initial conditions are set to zero. In this paper, physical RL (p-RL) refers to the application of RL in dynamical systems that interact with their environments, such as robot manipulators in contact tasks and humanoid robots in cooperation or interaction tasks. Unfortunately, most p-RL schemes lack stability properties, which can even be dangerous for specific robot applications, such as those involving contact (constrained) tasks or interaction tasks. Considering an unknown and disturbed DAE2 robot, in this paper a p-RL approach is developed that guarantees robust stability through a continuous-time adaptive actor–critic, with local exponential convergence of the force–position tracking error. The novel adaptive mechanisms provide robustness, while an integral sliding mode enforces tracking. Simulations are presented and discussed to show our proposal's effectiveness, and some final remarks are given concerning the structural aspects. © 2025 by the authors.
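The abstract names an integral temporal difference error as the quantity driving the critic, but does not reproduce the paper's equations. As a minimal sketch only, assuming the standard integral reinforcement learning form (Bellman equation over a window of length T) with a linear-in-parameters critic V(x) = Wᵀφ(x), the integral TD error and a normalized-gradient critic update could look as follows. The feature map phi, the reward samples, and all gains are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

# Sketch of an integral temporal-difference (TD) critic update, assuming
# the standard integral RL form over a window [t, t+T]:
#   e = \int_t^{t+T} r(x(s), u(s)) ds + V(x(t+T)) - V(x(t))
# with a linear-in-parameters critic V(x) = W^T phi(x).
# phi, rewards, and gains below are hypothetical, for illustration only.

def phi(x):
    """Hypothetical critic feature vector (quadratic basis)."""
    return np.array([x[0] ** 2, x[0] * x[1], x[1] ** 2])

def integral_td_error(W, x_t, x_tT, rewards, dt):
    """Integral TD error for one window of sampled running costs."""
    integral_r = np.sum(rewards) * dt            # quadrature of the running cost
    return integral_r + W @ phi(x_tT) - W @ phi(x_t)

def critic_update(W, x_t, x_tT, rewards, dt, lr=0.05):
    """One normalized-gradient step on the squared integral TD error."""
    e = integral_td_error(W, x_t, x_tT, rewards, dt)
    grad = phi(x_tT) - phi(x_t)                  # d e / d W
    return W - lr * e * grad / (1.0 + grad @ grad)

# Example: one update with synthetic window data.
W = np.zeros(3)
x_t, x_tT = np.array([1.0, 0.0]), np.array([0.8, -0.1])
rewards = np.array([0.5, 0.45, 0.4])             # r sampled every dt
W = critic_update(W, x_t, x_tT, rewards, dt=0.01)
print(W)
```

In this standard formulation the window integral replaces the instantaneous TD residual, so no model of the drift dynamics is needed to evaluate the error; whether the paper uses this exact window form or a filtered variant is not stated in the abstract.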