abstract
- This research develops and implements a novel reinforcement learning (RL) architecture to address the trajectory-tracking problem in bipedal robotic systems under articulated-joint constraints. The proposed RL framework extends previously designed adaptive controllers characterized by state-dependent gain structures. The learning mechanism comprises two hierarchical adaptation layers: the first employs an adaptive dynamic programming (ADP) formulation to approximate the Bellman value function using a class of continuous-time dynamic neural networks, while the second applies an iterative optimization scheme based on the deep deterministic policy gradient (DDPG) algorithm. The resulting control strategy minimizes a robust performance index defined over the tracking trajectories of a system with uncertain and nonlinear dynamics representative of bipedal locomotion. The dynamic programming formulation ensures robustness to bounded parametric uncertainties and external perturbations. By approximating the Hamilton-Jacobi-Bellman (HJB) value function with neural network structures, a closed-loop controller design is systematically established. Numerical simulations demonstrate convergence of the tracking error to a region centered at the origin whose size depends on the approximation quality of the selected neural network. To assess the effectiveness of the proposed approach, a conventional state-feedback control design is adopted as a benchmark, revealing that the suggested method produces a lower cumulative tracking error norm (0.023 vs. 0.037 rad·s) in the trajectory-tracking control problem for all robotic joints while simultaneously reducing the control effort required to complete the motion tasks. © 2025 by the authors.
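To make the two-layer idea concrete, the following is a minimal toy sketch in the spirit of the abstract's scheme: a critic updated by a temporal-difference (ADP-style) rule on a quadratic value approximation, and a deterministic actor gain adjusted along the value gradient (DDPG-style). The 1-D plant, the quadratic critic, the linear policy, and all gains here are illustrative assumptions for exposition, not the paper's actual networks or robot model.

```python
import numpy as np

# Toy 1-D plant x' = a*x + b*u tracking r(t) = sin(t).
# These dynamics and all constants below are hypothetical, chosen only
# to illustrate the critic/actor interplay, not the bipedal robot model.
a, b, dt = -0.5, 1.0, 0.01

w, k = 1.0, 0.5                 # critic weight V(e) ~ w*e^2, actor gain u = -k*e
alpha_c, alpha_a = 0.05, 0.01   # critic / actor learning rates

x = 0.0
for step in range(5000):
    t = step * dt
    e = x - np.sin(t)            # tracking error
    u = -k * e                   # deterministic policy
    cost = e**2 + 0.1 * u**2     # quadratic running cost (performance index)

    # simulate one Euler step and form the next tracking error
    x_next = x + dt * (a * x + b * u)
    e_next = x_next - np.sin(t + dt)

    # ADP-style critic: semi-gradient TD update on V(e) = w*e^2
    td = cost * dt + w * e_next**2 - w * e**2
    w += alpha_c * td * e**2

    # DDPG-style actor: move the gain to descend the learned value;
    # dV/de = 2*w*e and de/du ~ b*dt, with du/dk = -e
    k += alpha_a * (2 * w * e) * (b * dt) * e

    x = x_next

print(f"learned critic weight w={w:.3f}, actor gain k={k:.3f}")
```

In this toy setting the actor gain only grows while the tracking error is nonzero, mimicking how the hierarchical scheme tightens feedback as the critic's value estimate improves.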