This paper investigates impulsive orbital attack-defense (AD) games under multiple constraints and victory conditions, involving three spacecraft: attacker, target, and defender. In the AD scenario, the attacker aims to breach the defender's interception to rendezvous with the target, while the defender seeks to protect the target by blocking or actively pursuing the attacker. Four different maneuvering constraints and five potential game outcomes are incorporated to more accurately model AD game problems and increase complexity, thereby reducing the effectiveness of traditional methods such as differential games and game-tree searches. To address these challenges, this study proposes a multi-agent deep reinforcement learning solution with variable reward functions. Two attack strategies, Direct attack (DA) and Bypass attack (BA), are developed for the attacker, each focusing on different mission priorities. Similarly, two defense strategies, Direct interdiction (DI) and Collinear interdiction (CI), are designed for the defender, each optimizing specific defensive actions through tailored reward functions. Each reward function incorporates both process rewards (e.g., distance and angle) and outcome rewards, derived from physical principles and validated via geometric analysis. Extensive simulations of four strategy confrontations demonstrate average defensive success rates of 75% for DI vs. DA, 40% for DI vs. BA, 80% for CI vs. DA, and 70% for CI vs. BA. Results indicate that CI outperforms DI for defenders, while BA outperforms DA for attackers. Moreover, defenders achieve their objectives more effectively under identical maneuvering capabilities. Trajectory evolution analyses further illustrate the effectiveness of the proposed variable reward function-driven strategies. These strategies and analyses offer valuable guidance for practical orbital defense scenarios and lay a foundation for future multi-agent game research.
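As an illustration of the variable reward structure described in this abstract, the sketch below combines process rewards (defender-attacker distance and a collinearity angle suggestive of the CI strategy) with terminal outcome rewards. The weights, state fields, and the specific angle term are assumptions for illustration only, not the paper's actual reward design derived from its geometric analysis.

```python
import numpy as np

def defender_reward(defender_pos, attacker_pos, target_pos,
                    attacker_captured, target_reached,
                    w_dist=1.0, w_angle=0.5, r_win=100.0, r_lose=-100.0):
    """Illustrative variable reward: process terms (distance, angle) plus outcome terms.

    All weights and the collinearity term are hypothetical placeholders.
    """
    # Process reward: penalize the defender-attacker distance so the defender closes in.
    d_da = np.linalg.norm(attacker_pos - defender_pos)
    r_dist = -w_dist * d_da

    # Process reward: for a collinear-interdiction style strategy, reward a small angle
    # between the target->defender and target->attacker directions (defender stays on the line).
    u_td = (defender_pos - target_pos) / (np.linalg.norm(defender_pos - target_pos) + 1e-8)
    u_ta = (attacker_pos - target_pos) / (np.linalg.norm(attacker_pos - target_pos) + 1e-8)
    angle = np.arccos(np.clip(np.dot(u_td, u_ta), -1.0, 1.0))
    r_angle = -w_angle * angle

    # Outcome reward: terminal bonus or penalty depending on which victory condition fires.
    r_outcome = r_win if attacker_captured else (r_lose if target_reached else 0.0)

    return r_dist + r_angle + r_outcome
```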
The air combat situation in modern warfare is complex and rapidly changing, so exploring a fast and effective decision-making method is of great importance. This paper studies the cooperative confrontation problem for multiple unmanned aerial vehicles (UAVs) and proposes a multi-UAV cooperative beyond-visual-range (BVR) air combat decision-making algorithm based on long short-term memory (LSTM) and multi-agent deep deterministic policy gradient (MADDPG). First, models of UAV motion, the radar detection zone, and the missile attack zone are established. Then, the multi-UAV cooperative BVR air combat decision-making algorithm is proposed. A centralized-training, distributed-execution LSTM-MADDPG architecture and the state space of the cooperative air combat system are designed to handle synchronous decision-making among multiple UAVs; a learning-rate decay mechanism is designed to improve the convergence speed and stability of the networks; the network structure is improved with an LSTM network, strengthening its ability to extract tactical features; and a decay-factor-based reward function mechanism is used to enhance the UAVs' cooperative confrontation capability. Simulation results show that the proposed algorithm endows the UAVs with cooperative offensive and defensive capabilities, while exhibiting good stability and convergence.
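To illustrate the LSTM-based actor and the learning-rate decay mechanism mentioned in this abstract, the following PyTorch sketch shows one plausible construction. Layer sizes, the tanh-bounded action head, and the exponential decay schedule are assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class LSTMActor(nn.Module):
    """Illustrative LSTM-based actor for one MADDPG agent (dimensions are hypothetical)."""
    def __init__(self, obs_dim, act_dim, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)  # extracts temporal/tactical features
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, act_dim), nn.Tanh(),  # bounded continuous actions
        )

    def forward(self, obs_seq):
        # obs_seq: (batch, time, obs_dim); the last hidden state drives the action.
        out, _ = self.lstm(obs_seq)
        return self.head(out[:, -1, :])

# Learning-rate decay (exponential here, as one plausible choice) to aid convergence and stability.
actor = LSTMActor(obs_dim=12, act_dim=3)
optimizer = torch.optim.Adam(actor.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.999)
```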
During automatic berthing, a ship is affected by wind, waves, current, bank effects, and other factors, so an accurate path planning method is needed to prevent berthing failure. An automatic berthing path planning method based on the double deep Q network (DDQN) algorithm is designed for the berthing process of a fully actuated ship. First, a three-degree-of-freedom ship model is established; then the reward function is improved by treating distance, heading, thrust, time, and collision as rewards or penalties. DDQN is then introduced to learn the action-reward model, and the learned result is used to steer the ship. By pursuing higher reward values, the ship can find the optimal berthing path on its own. Experimental results show that under different current speeds the ship can complete berthing while reducing time and thrust, and that at the same current speed the DDQN algorithm reduces the berthing thrust by 241.940, 234.614, and 80.202 N compared with Q-learning, SARSA (state action reward state action), and deep Q network (DQN), respectively, with a berthing time of only 252.485 s.
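As a reference point for the DDQN learning rule named in this abstract, the sketch below shows the standard double-DQN target, in which the online network selects the next action and the target network evaluates it to reduce overestimation bias. The network interface, tensor shapes, and discount factor are illustrative assumptions, not the paper's implementation.

```python
import torch

def ddqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Illustrative double-DQN target.

    rewards, dones: float tensors of shape (batch,), dones being 0/1 terminal flags.
    next_states: tensor of shape (batch, state_dim).
    """
    with torch.no_grad():
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)   # action selection (online net)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)  # action evaluation (target net)
        return rewards + gamma * (1.0 - dones) * next_q
```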
Funding: supported by the National Key R&D Program of China: Gravitational Wave Detection Project (Grant Nos. 2021YFC22026, 2021YFC2202601, 2021YFC2202603) and the National Natural Science Foundation of China (Grant Nos. 12172288 and 12472046).