Abstract
To address dynamic soaring of unmanned aerial vehicles (UAVs), a trajectory optimization method based on deep reinforcement learning is proposed. The method exploits both wind-gradient energy and solar energy, and introduces obstacle constraints to model complex obstacle environments. A neural network is first trained to approximate the policy by which the Gaussian pseudospectral method solves for trajectories; the trained policy is then improved with the twin delayed deep deterministic policy gradient (TD3) algorithm. This greatly improves real-time inference while overcoming the difficulty that traditional optimal control algorithms have in coping with varying wind fields in dynamic soaring. The approach is first validated in simulation on two classic dynamic soaring modes, followed by Monte Carlo simulations that consider multiple energy sources. The results show that the energy gained within a single soaring cycle is close to the optimal result, while real-time inference decision time is reduced by 91%. In varying wind fields, the proposed method is more adaptable than traditional methods.
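The policy-improvement step named in the abstract relies on the twin delayed deep deterministic policy gradient algorithm. As a rough illustration of that algorithm's best-known ingredient, the sketch below computes the clipped double-Q learning target with target policy smoothing. It is not the authors' implementation: the function name `td3_target`, the callables passed in, and all hyperparameter values (`gamma`, `noise_std`, `noise_clip`, `max_action`) are assumptions chosen for illustration, and the networks are stood in for by plain Python callables.

```python
import numpy as np

def td3_target(reward, done, next_state,
               actor_target, critic1_target, critic2_target, rng,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0):
    """Clipped double-Q target with target policy smoothing.

    All arguments are hypothetical stand-ins: the *_target callables play
    the role of target networks, and the defaults are illustrative only.
    """
    # Target policy smoothing: perturb the target action with clipped noise.
    next_action = actor_target(next_state)
    noise = rng.normal(0.0, noise_std, size=next_action.shape)
    noise = np.clip(noise, -noise_clip, noise_clip)
    next_action = np.clip(next_action + noise, -max_action, max_action)

    # Clipped double-Q: take the smaller of the two target critic estimates.
    q1 = critic1_target(next_state, next_action)
    q2 = critic2_target(next_state, next_action)
    q_min = np.minimum(q1, q2)

    # Bootstrapped target; terminal transitions (done == 1) do not bootstrap.
    return reward + gamma * (1.0 - done) * q_min


# Toy usage with hand-written "networks" (assumed, for illustration only).
actor = lambda s: np.tanh(s.sum(axis=1, keepdims=True))
critic_a = lambda s, a: s.sum(axis=1) + a[:, 0]
critic_b = lambda s, a: s.sum(axis=1) - a[:, 0]

states = np.array([[1.0, 0.0], [0.0, 0.0]])
rewards = np.array([1.0, 2.0])
dones = np.array([0.0, 1.0])
targets = td3_target(rewards, dones, states, actor, critic_a, critic_b,
                     rng=np.random.default_rng(0))
```

In a full training loop both critics would be regressed toward this target, and the actor would be updated less often than the critics, which is the "delayed" part of the algorithm's name. The supervised pre-training on pseudospectral solutions described in the abstract would happen before this loop and is not shown here.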
Authors
ZHANG Yunfei (张云飞); WANG Honglun (王宏伦); ZHANG Menghua (张梦华); GONG Yinan (巩轶男)
School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China; The Science and Technology on Aircraft Control Laboratory, Beihang University, Beijing 100191, China; Hiwing Aviation General Equipment Co., Ltd., Beijing 100074, China
Source
Journal of Northwestern Polytechnical University (《西北工业大学学报》)
Peking University Core Journal (北大核心)
2025, No. 1, pp. 128-139 (12 pages)
Keywords
dynamic soaring
reinforcement learning
Gaussian pseudospectral method
trajectory optimization
About the authors
ZHANG Yunfei (b. 1999), master's student; corresponding author: WANG Honglun (b. 1970), professor, e-mail: wang-hl-12@126.com.