In consideration of the field-of-view(FOV)angle con-straint,this study focuses on the guidance problem with impact time control.A deep reinforcement learning guidance method is given for the missile to obtain the desi...In consideration of the field-of-view(FOV)angle con-straint,this study focuses on the guidance problem with impact time control.A deep reinforcement learning guidance method is given for the missile to obtain the desired impact time and meet the demand of FOV angle constraint.On basis of the framework of the proportional navigation guidance,an auxiliary control term is supplemented by the distributed deep deterministic policy gradient algorithm,in which the reward functions are developed to decrease the time-to-go error and improve the terminal guid-ance accuracy.The numerical simulation demonstrates that the missile governed by the presented deep reinforcement learning guidance law can hit the target successfully at appointed arrival time.展开更多
针对现有基于深度确定性策略梯度(deep deterministic policy gradient,DDPG)算法的再入制导方法计算精度较差,对强扰动条件适应性不足等问题,在DDPG算法训练框架的基础上,提出一种基于长短期记忆-DDPG(long short term memory-DDPG,LST...针对现有基于深度确定性策略梯度(deep deterministic policy gradient,DDPG)算法的再入制导方法计算精度较差,对强扰动条件适应性不足等问题,在DDPG算法训练框架的基础上,提出一种基于长短期记忆-DDPG(long short term memory-DDPG,LSTM-DDPG)的再入制导方法。该方法采用纵、侧向制导解耦设计思想,在纵向制导方面,首先针对再入制导问题构建强化学习所需的状态、动作空间;其次,确定决策点和制导周期内的指令计算策略,并设计考虑综合性能的奖励函数;然后,引入LSTM网络构建强化学习训练网络,进而通过在线更新策略提升算法的多任务适用性;侧向制导则采用基于横程误差的动态倾侧反转方法,获得倾侧角符号。以美国超音速通用飞行器(common aero vehicle-hypersonic,CAV-H)再入滑翔为例进行仿真,结果表明:与传统数值预测-校正方法相比,所提制导方法具有相当的终端精度和更高的计算效率优势;与现有基于DDPG算法的再入制导方法相比,所提制导方法具有相当的计算效率以及更高的终端精度和鲁棒性。展开更多
为适应大容量同步发电机组并网点母线电压波动增加对自动电压调节器(automatic voltage regulator,AVR)系统响应能力的更高要求,提出一种基于含探索网络的双延迟深度确定性策略梯度(twin delayed deep deterministic policy gradient wi...为适应大容量同步发电机组并网点母线电压波动增加对自动电压调节器(automatic voltage regulator,AVR)系统响应能力的更高要求,提出一种基于含探索网络的双延迟深度确定性策略梯度(twin delayed deep deterministic policy gradient with Explorer network,TD3EN)算法的同步发电机励磁电压控制方法。首先,通过传递函数对同步发电机励磁调压子系统进行建模;然后建立TD3EN算法探索网络、动作网络和评价网络,并设置相应参数;接着利用TD3EN算法训练智能体,通过探索网络探索动作空间,并根据评价网络更新动作网络参数,使其为AVR提供控制信号;将训练完成的智能体接入AVR系统,实现对发电机机端电压的控制。仿真结果表明,所提方法提高了AVR系统响应调节指令和应对电压暂降的能力。展开更多
基金supported by the National Natural Science Foundation of China(62003021,62373304)Industry-University-Research Innovation Fund for Chinese Universities(2021ZYA02009)+2 种基金Shaanxi Qinchuangyuan High-level Innovation and Entrepreneurship Talent Project(OCYRCXM-2022-136)Shaanxi Association for Science and Technology Youth Talent Support Program(XXJS202218)the Fundamental Research Funds for the Central Universities(D5000210830).
文摘In consideration of the field-of-view(FOV)angle con-straint,this study focuses on the guidance problem with impact time control.A deep reinforcement learning guidance method is given for the missile to obtain the desired impact time and meet the demand of FOV angle constraint.On basis of the framework of the proportional navigation guidance,an auxiliary control term is supplemented by the distributed deep deterministic policy gradient algorithm,in which the reward functions are developed to decrease the time-to-go error and improve the terminal guid-ance accuracy.The numerical simulation demonstrates that the missile governed by the presented deep reinforcement learning guidance law can hit the target successfully at appointed arrival time.
文摘针对现有基于深度确定性策略梯度(deep deterministic policy gradient,DDPG)算法的再入制导方法计算精度较差,对强扰动条件适应性不足等问题,在DDPG算法训练框架的基础上,提出一种基于长短期记忆-DDPG(long short term memory-DDPG,LSTM-DDPG)的再入制导方法。该方法采用纵、侧向制导解耦设计思想,在纵向制导方面,首先针对再入制导问题构建强化学习所需的状态、动作空间;其次,确定决策点和制导周期内的指令计算策略,并设计考虑综合性能的奖励函数;然后,引入LSTM网络构建强化学习训练网络,进而通过在线更新策略提升算法的多任务适用性;侧向制导则采用基于横程误差的动态倾侧反转方法,获得倾侧角符号。以美国超音速通用飞行器(common aero vehicle-hypersonic,CAV-H)再入滑翔为例进行仿真,结果表明:与传统数值预测-校正方法相比,所提制导方法具有相当的终端精度和更高的计算效率优势;与现有基于DDPG算法的再入制导方法相比,所提制导方法具有相当的计算效率以及更高的终端精度和鲁棒性。
文摘为适应大容量同步发电机组并网点母线电压波动增加对自动电压调节器(automatic voltage regulator,AVR)系统响应能力的更高要求,提出一种基于含探索网络的双延迟深度确定性策略梯度(twin delayed deep deterministic policy gradient with Explorer network,TD3EN)算法的同步发电机励磁电压控制方法。首先,通过传递函数对同步发电机励磁调压子系统进行建模;然后建立TD3EN算法探索网络、动作网络和评价网络,并设置相应参数;接着利用TD3EN算法训练智能体,通过探索网络探索动作空间,并根据评价网络更新动作网络参数,使其为AVR提供控制信号;将训练完成的智能体接入AVR系统,实现对发电机机端电压的控制。仿真结果表明,所提方法提高了AVR系统响应调节指令和应对电压暂降的能力。