Funding: supported by the National Natural Science Foundation (NNSF) of China under Grant No. 62273119.
Abstract: A differential game guidance scheme with obstacle avoidance, based on the formulation of a combined linear quadratic and norm-bounded differential game, is designed for a three-player engagement scenario involving a pursuer, an interceptor, and an evader. By introducing a switching time, the confrontation between the players is divided into four phases (P1-P4), and different guidance strategies are proposed according to the phase in which the static obstacle is located: the linear quadratic game method is employed to devise an energy-optimal guidance scheme when the obstacle lies in phases P1 and P3, while the norm-bounded differential game guidance strategy is presented to satisfy the acceleration constraint when the obstacle lies in phases P2 and P4. Furthermore, the radii of the static obstacle and the interceptor are taken as design parameters to derive a combined guidance strategy through a dead-zone function, which guarantees that the pursuer avoids both the static obstacle and the interceptor while attacking the evader. Finally, nonlinear numerical simulations verify the performance of the game guidance strategy.
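The abstract does not reproduce the paper's derivation, but the dead-zone idea it describes can be illustrated with a minimal sketch: the avoidance term is switched off whenever the predicted miss with respect to the obstacle or the interceptor already exceeds the corresponding safety radius. All function names, gains, and sign conventions below are illustrative assumptions, not the paper's actual guidance law.

```python
import numpy as np

def dead_zone(miss, radius):
    # No correction needed once the predicted miss clears the safety radius.
    if abs(miss) >= radius:
        return 0.0
    # Inside the dead zone: return the signed deficit still to be closed.
    direction = np.sign(miss) if miss != 0.0 else 1.0
    return direction * (radius - abs(miss))

def combined_command(a_pursuit, miss_obstacle, miss_interceptor,
                     r_obstacle, r_interceptor, k_avoid):
    # Pursuit command augmented with dead-zone-gated avoidance terms;
    # the gain k_avoid and the additive sign convention are assumptions.
    a_avoid = k_avoid * (dead_zone(miss_obstacle, r_obstacle)
                         + dead_zone(miss_interceptor, r_interceptor))
    return a_pursuit + a_avoid
```

With this gating, the avoidance contribution vanishes outside the two danger zones, so the pursuit command is left unmodified unless a predicted miss falls below the obstacle or interceptor radius.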
Funding: supported by the National Natural Science Foundation of China (11672093).
Abstract: A conflict among three players, including an attacker, a defender, and a target with bounded control, is discussed based on differential game theory, in which the target and the defender use an optimal pursuit strategy. The current approach chooses the miss distance as the outcome of the conflict. Different optimal guidance laws are investigated, and feasible conditions are analyzed for the attacker to accomplish the attacking task. Under some given conditions, the attacker cannot intercept the target by using only a one-to-one optimal pursuit guidance law; thus, a guidance law for the attacker to reach a critical safe value is investigated. Specifically, the guidance law is divided into two parts. Before the engagement time between the defender and the attacker, the attacker uses the derived guidance law to guarantee that the evasion distance from the defender is safe and that the zero-effort-miss (ZEM) distance between the attacker and the target is the smallest. After that engagement time, the attacker uses the optimal one-to-one guidance law to accomplish the pursuit task. The advantages and limiting conditions of these derived guidance laws are also investigated by using nonlinear simulations.
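To make the two-phase structure concrete, the following sketch shows a ZEM computation for linearized kinematics and a bang-bang style attacker command that prioritizes evasion from the defender before the switching time and pure pursuit afterwards. The switching structure, the bang-bang form, and all names (t_switch, safe_miss, a_max) are assumptions for illustration only; the paper derives its actual laws from the game solution.

```python
import numpy as np

def zem(y, y_dot, t_go):
    # Zero-effort miss for linearized kinematics with ideal dynamics:
    # the miss that would result if neither player applied further control.
    return y + y_dot * t_go

def attacker_command(t, t_switch, zem_target, zem_defender, a_max, safe_miss):
    # Phase 1 (t < t_switch): keep the predicted miss from the defender
    # above safe_miss while otherwise driving the target ZEM toward zero.
    if t < t_switch:
        if abs(zem_defender) < safe_miss:
            # Evasion has priority: push the defender ZEM outward.
            return a_max * np.sign(zem_defender)
        return -a_max * np.sign(zem_target)
    # Phase 2: optimal one-to-one pursuit of the target.
    return -a_max * np.sign(zem_target)
```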
Abstract: For the survival-type differential game interception problem arising in the pursuit-evasion game between a spacecraft and a non-cooperative target, a pursuit-evasion game strategy is studied based on reinforcement learning, and an adaptive-augmented random search (A-ARS) algorithm is proposed. To address the sparse-reward difficulty of sequential decision making, an exploration method based on perturbations in the policy parameter space is designed, which accelerates policy convergence; to address the risk of premature convergence to local optima, a novelty function is designed to guide policy updates, which improves data-utilization efficiency. Numerical simulations and comparisons with augmented random search (ARS), proximal policy optimization (PPO), and deep deterministic policy gradient (DDPG) verify the effectiveness and superiority of the proposed method.
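For reference, the baseline that A-ARS builds on, augmented random search, explores directly in policy-parameter space: random directions are sampled, the policy is evaluated at symmetric perturbations, and the parameters are stepped along the reward-weighted average direction. The sketch below shows only this plain ARS update; the adaptive exploration and the novelty bonus that distinguish A-ARS are not reproduced, and the hyperparameter values are placeholders.

```python
import numpy as np

def ars_step(theta, rollout_reward, n_dirs=8, sigma=0.05, alpha=0.02, rng=None):
    # One iteration of basic Augmented Random Search (ARS): sample random
    # directions in policy-parameter space, evaluate +/- perturbed policies
    # with the caller-supplied rollout_reward(theta) -> float function, and
    # step along the reward-weighted average direction.
    rng = rng if rng is not None else np.random.default_rng()
    deltas = rng.standard_normal((n_dirs, theta.size))
    r_plus = np.array([rollout_reward(theta + sigma * d) for d in deltas])
    r_minus = np.array([rollout_reward(theta - sigma * d) for d in deltas])
    # Normalize the step size by the spread of collected rewards.
    reward_std = np.concatenate([r_plus, r_minus]).std() + 1e-8
    step = ((r_plus - r_minus)[:, None] * deltas).mean(axis=0)
    return theta + (alpha / reward_std) * step
```

Because exploration happens in parameter space rather than action space, each perturbed policy is evaluated over a whole rollout, which is one way to cope with the sparse terminal rewards typical of interception tasks.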