Funding: National Natural Science Foundation of China (NSFC61773142, NSFC62303136).
Abstract: When the maneuverability of a pursuer is not significantly higher than that of an evader, it is difficult to intercept the evader with only one pursuer. Therefore, this article adopts a two-to-one differential game strategy. The game of kind is generally formulated as angle-optimized with unlimited turns allowed, but such formulations ignore the effect of acceleration and thus do not correspond to the actual situation; therefore, building on the angle-optimized formulation, acceleration optimization and an upper-bound constraint on acceleration are added to the game. A two-to-one differential game problem is proposed in three-dimensional space, and an improved multi-objective grey wolf optimization (IMOGWO) algorithm is proposed to solve for the optimal game point of this problem. With the equations that describe the relative motions between the pursuers and the evader in three-dimensional space, a multi-objective function with constraints is given as the performance index to design an optimal strategy for the differential game. The optimal game point is then solved using the IMOGWO algorithm. It is proved, based on Markov chains, that with the IMOGWO the Pareto solution set is the solution of the differential game. Finally, simulations verify that the pursuers can capture the evader, and comparative experiments show that the IMOGWO algorithm performs well in terms of running time and memory usage.
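The abstract does not detail how the IMOGWO maintains its Pareto solution set, but any multi-objective optimizer of this kind relies on a non-dominated filtering step. The following is a minimal hedged sketch of that step, assuming all objectives are minimized; the candidate objective vectors are purely hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only the non-dominated objective vectors."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Hypothetical objective vectors, e.g. (angle cost, acceleration cost):
candidates = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (2.5, 2.5), (4.0, 4.0)]
front = pareto_front(candidates)
# (2.5, 2.5) and (4.0, 4.0) are dominated by (2.0, 2.0) and are filtered out.
```

In a grey-wolf-style optimizer, the leaders (alpha, beta, delta wolves) would be drawn from this archive at each iteration; those selection details are not specified in the abstract.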
Funding: Supported by the National Natural Science Foundation of China (10671182).
Abstract: This article studies the evolutionary dynamics of two-population two-strategy game models with and without impulses. First, the payoff matrix is given and two evolutionary dynamics models are established by adding stochastic perturbations and impulses. For the stochastic model without impulses, the existence and uniqueness of the solution and the existence of positive periodic solutions are proved, and a sufficient condition for strategy extinction is given. For the stochastic model with impulses, the existence of positive periodic solutions is proved. Numerical results show that noise and impulses directly affect the model, but the periodicity of the model does not change.
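For grounding, the deterministic skeleton underlying such models is the two-population two-strategy replicator dynamics. The sketch below, with a hypothetical coordination-game payoff matrix and with the noise and impulse terms omitted, is an illustration only, not the stochastic model studied in the article.

```python
def replicator_step(x, y, A, B, dt=0.01):
    """One Euler step of two-population two-strategy replicator dynamics.
    x, y: fractions playing strategy 0 in populations 1 and 2.
    A, B: 2x2 payoff matrices for populations 1 and 2."""
    fx0 = A[0][0] * y + A[0][1] * (1 - y)   # payoff of strategy 0 in population 1
    fx1 = A[1][0] * y + A[1][1] * (1 - y)
    fy0 = B[0][0] * x + B[0][1] * (1 - x)
    fy1 = B[1][0] * x + B[1][1] * (1 - x)
    x += dt * x * (1 - x) * (fx0 - fx1)     # grow strategies with above-average payoff
    y += dt * y * (1 - y) * (fy0 - fy1)
    return x, y

# Hypothetical coordination game: both populations converge to strategy 0.
A = B = [[2.0, 0.0], [0.0, 1.0]]
x, y = 0.9, 0.9
for _ in range(5000):
    x, y = replicator_step(x, y, A, B)
```

The article's stochastic and impulsive versions would add a noise term to each update and periodic jumps to the state, which is what its periodic-solution results concern.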
Abstract: The Stackelberg prediction game (SPG) is a bilevel optimization framework for modeling strategic interactions between a learner and a follower. Existing methods for solving this problem with general loss functions are computationally expensive and scarce. We propose a novel hyper-gradient type method with a warm-start strategy to address this challenge. In particular, we first use a Taylor expansion-based approach to obtain a good initial point. Then we apply a hyper-gradient descent method with an explicit approximate hyper-gradient. We establish the convergence results of our algorithm theoretically. Furthermore, when the follower employs the least squares loss function, our method is shown to reach an ε-stationary point by solving quadratic subproblems. Numerical experiments show our algorithms are empirically orders of magnitude faster than the state-of-the-art.
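The hyper-gradient idea can be illustrated on a toy bilevel problem; this is a hedged sketch, not the paper's SPG formulation. Here the follower's problem is chosen so that its best response has a closed form, mimicking the least-squares case where the lower level is solvable exactly, and the leader descends along the exact hyper-gradient obtained by the chain rule.

```python
def follower_best_response(u):
    """Follower solves min_w (w - u)^2 + w^2; closed form w*(u) = u / 2."""
    return u / 2.0

def hyper_gradient(u):
    """Derivative of the leader objective F(u) = (w*(u) - 1)^2 via the chain rule:
    dF/du = 2 * (w* - 1) * dw*/du, with dw*/du = 1/2 for this toy follower."""
    w = follower_best_response(u)
    return 2.0 * (w - 1.0) * 0.5

# Hyper-gradient descent on the leader variable.
u = 0.0
for _ in range(200):
    u -= 0.5 * hyper_gradient(u)
# Converges to u = 2, where the follower's response w* = 1 minimizes the leader's loss.
```

In the paper's setting the hyper-gradient is only approximated (and warm-started via a Taylor expansion); the toy above uses the exact derivative because the inner solution is available in closed form.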
Funding: Supported by the Science and Technology Department, Heilongjiang Province, under Grant Agreement No. JJ2022LH0315.
Abstract: This paper presents a mode-switching collaborative defense strategy for spacecraft pursuit-evasion-defense scenarios. In these scenarios, the pursuer tries to avoid the defender while capturing the evader, while the evader and defender form an alliance to prevent the pursuer from achieving its goal. First, the behavioral modes of the pursuer, including attack and avoidance modes, are established using differential game theory. These modes are then recognized by an interactive multiple model (IMM) matching algorithm that uses several smooth variable structure filters to match the modes of the pursuer and update their probabilities in real time. Based on linear-quadratic optimization theory, combined with the results of strategy identification, a two-way cooperative optimal strategy for the defender and evader is proposed, in which the evader aids the defender in intercepting the pursuer by performing luring maneuvers. Simulation results show that the interactive multiple model algorithm based on several smooth variable structure filters performs well in identifying the pursuer's strategy, and the cooperative defense strategy based on strategy identification has good interception performance against pursuers that can flexibly adjust their game objectives.
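The core of the IMM step is a Bayesian update of the mode probabilities using the likelihoods reported by the per-mode filters. The following minimal sketch shows that update with a hypothetical mode-transition matrix and likelihood values; the smooth variable structure filters that produce the likelihoods are abstracted away.

```python
def imm_update(mu, P, likelihoods):
    """One IMM mode-probability update.
    mu: prior mode probabilities; P[i][j]: transition probability mode i -> j;
    likelihoods[j]: measurement likelihood from the filter matched to mode j."""
    n = len(mu)
    # Mixing: predicted mode probabilities before seeing the measurement.
    pred = [sum(P[i][j] * mu[i] for i in range(n)) for j in range(n)]
    # Bayesian update with the per-mode filter likelihoods, then normalize.
    post = [likelihoods[j] * pred[j] for j in range(n)]
    s = sum(post)
    return [p / s for p in post]

# Two pursuer modes (attack, avoidance) with hypothetical numbers:
mu = [0.5, 0.5]
P = [[0.95, 0.05], [0.05, 0.95]]                 # modes tend to persist
mu = imm_update(mu, P, likelihoods=[0.9, 0.1])   # measurement favors attack mode
```

Repeated over time, the probabilities track the pursuer's current mode, which is what the cooperative defender-evader strategy conditions on.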
Abstract: This paper comprehensively explores the impulsive on-orbit inspection game problem using reinforcement learning and game training methods. The purpose of the spacecraft is to inspect the entire surface of a non-cooperative target with active maneuverability under front lighting. First, the impulsive orbital game problem is formulated as a turn-based sequential game. Second, several typical relative orbit transfers are encapsulated into modules to construct a parameterized action space containing discrete modules and continuous parameters, and the multi-pass deep Q-network (MPDQN) algorithm is used to implement autonomous decision-making. Then, a curriculum learning method is used to gradually increase the difficulty of the training scenario. A backtracking proportional self-play training framework is used to enhance the agent's ability to defeat inconsistent strategies by building a pool of opponents. The behavior variations of the agents during training indicate that the intelligent game system gradually evolves towards an equilibrium situation. The restraint relations between the agents show that the agents steadily improve their strategies. The influence of various factors on game results is tested.
Funding: Supported by the National Key R&D Program of China: Gravitational Wave Detection Project (Grant Nos. 2021YFC22026, 2021YFC2202601, 2021YFC2202603) and the National Natural Science Foundation of China (Grant Nos. 12172288 and 12472046).
Abstract: This paper investigates impulsive orbital attack-defense (AD) games under multiple constraints and victory conditions, involving three spacecraft: attacker, target, and defender. In the AD scenario, the attacker aims to breach the defender's interception to rendezvous with the target, while the defender seeks to protect the target by blocking or actively pursuing the attacker. Four different maneuvering constraints and five potential game outcomes are incorporated to more accurately model AD game problems and increase complexity, thereby reducing the effectiveness of traditional methods such as differential games and game-tree searches. To address these challenges, this study proposes a multi-agent deep reinforcement learning solution with variable reward functions. Two attack strategies, Direct attack (DA) and Bypass attack (BA), are developed for the attacker, each focusing on different mission priorities. Similarly, two defense strategies, Direct interdiction (DI) and Collinear interdiction (CI), are designed for the defender, each optimizing specific defensive actions through tailored reward functions. Each reward function incorporates both process rewards (e.g., distance and angle) and outcome rewards, derived from physical principles and validated via geometric analysis. Extensive simulations of four strategy confrontations demonstrate average defensive success rates of 75% for DI vs. DA, 40% for DI vs. BA, 80% for CI vs. DA, and 70% for CI vs. BA. Results indicate that CI outperforms DI for defenders, while BA outperforms DA for attackers. Moreover, defenders achieve their objectives more effectively under identical maneuvering capabilities. Trajectory evolution analyses further illustrate the effectiveness of the proposed variable reward function-driven strategies. These strategies and analyses offer valuable guidance for practical orbital defense scenarios and lay a foundation for future multi-agent game research.
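The abstract states that each reward function mixes process rewards (distance, angle) with outcome rewards but does not give the shaping; the sketch below is therefore hypothetical in its function name, weights, and magnitudes, and only illustrates that two-part structure for the defender.

```python
def defender_reward(dist_to_attacker, angle_error, outcome=None,
                    w_dist=1.0, w_angle=0.5):
    """Hypothetical defender reward: process terms (distance, angle) plus a
    terminal outcome bonus, mirroring the variable-reward-function design."""
    # Process reward: being closer to the attacker and having a smaller
    # pointing error are both rewarded (penalties shrink toward zero).
    r = -w_dist * dist_to_attacker - w_angle * abs(angle_error)
    # Outcome reward: large terminal bonus or penalty when the episode ends.
    if outcome == "intercepted":
        r += 100.0
    elif outcome == "breached":
        r -= 100.0
    return r
```

The DI and CI strategies of the paper would differ in which process terms they weight (e.g., CI penalizing deviation from the attacker-target line), a distinction this sketch does not attempt to reproduce.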
Funding: Supported by the Major Projects for Science and Technology Innovation 2030 (2018AAA0100805).
Abstract: To address confrontation decision-making in multi-round air combat, a dynamic game decision method based on a decision tree is proposed for unmanned aerial vehicle (UAV) air combat confrontation. Based on game theory and the confrontation characteristics of air combat, a dynamic game process is constructed, including the strategy sets, the situation information, and the maneuver decisions of both sides of the air combat. By analyzing the UAV's flight dynamics and both sides' information, a payoff matrix is established through the situation advantage function, performance advantage function, and profit function. Furthermore, the dynamic game decision problem is solved with the linear induction method to obtain the Nash equilibrium solution, where the decision tree method is introduced to obtain the optimal maneuver decision, thereby improving the situation advantage in the next round of confrontation. Simulation results for multi-round air combat confrontation scenarios are presented to verify the effectiveness and advantages of the proposed method.
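Once the payoff matrix is built, each round reduces to finding an equilibrium of a two-player matrix game. As a hedged illustration of that step (the paper's linear induction method and advantage functions are not reproduced here), the sketch below enumerates pure-strategy Nash equilibria of a bimatrix game by best-response checks, with a hypothetical 2x2 payoff matrix standing in for two aircraft choosing between two maneuvers.

```python
def pure_nash(A, B):
    """Enumerate pure-strategy Nash equilibria of a bimatrix game.
    A[i][j]: row player's payoff; B[i][j]: column player's payoff."""
    m, n = len(A), len(A[0])
    eqs = []
    for i in range(m):
        for j in range(n):
            row_best = all(A[i][j] >= A[k][j] for k in range(m))
            col_best = all(B[i][j] >= B[i][l] for l in range(n))
            if row_best and col_best:   # neither side gains by deviating alone
                eqs.append((i, j))
    return eqs

# Hypothetical payoffs for two maneuvers per side:
A = [[3, 1], [2, 4]]
B = [[3, 2], [1, 4]]
equilibria = pure_nash(A, B)   # coordination-style game with two pure equilibria
```

Real air-combat payoff matrices may have no pure-strategy equilibrium, in which case mixed strategies are needed; the abstract's linear induction method addresses the general case.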
Funding: Supported by the National Natural Science Foundation of China (No. 62073267).
Abstract: As a crucial process in the coordinated strikes of unmanned aerial vehicles (UAVs), weapon-target assignment is vital for optimizing the allocation of available weapons and effectively exploiting the capabilities of UAVs. Existing weapon-target assignment methods primarily focus on macro cluster constraints while neglecting individual strategy updates. This paper proposes a novel weapon-target assignment method for UAVs based on the multi-strategy threshold public goods game (PGG). By analyzing the concept mapping between weapon-target assignment for UAVs and the multi-strategy threshold PGG, a weapon-target assignment model for UAVs is established, adaptively complemented by a diverse cooperation-defection strategy library and a utility function based on the threshold mechanism. Additionally, a multi-chain Markov model is formulated to quantitatively describe the stochastic evolutionary dynamics, and its evolutionarily stable distribution is theoretically derived through the development of a strategy update rule based on preference-based aspiration dynamics. Numerical simulation results validate the feasibility and effectiveness of the proposed method, and the impacts of selection intensity, preference degree, and threshold on the evolutionarily stable distribution are analyzed. Comparative simulations show that the proposed method outperforms GWO, DE, and NSGA-II, achieving 17.18% higher expected utility than NSGA-II and reducing the evolutionary stabilization time by 25% in the large-scale scenario.
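The threshold mechanism at the heart of such a model can be shown in a few lines. The sketch below is a standard threshold public goods game payoff, not the paper's full utility function, and the UAV-group numbers are hypothetical: the group reward is produced and shared only if total contributions reach the threshold, while every player pays its own contribution cost.

```python
def threshold_pgg_payoffs(contributions, threshold, reward, cost_per_unit=1.0):
    """Threshold public goods game: the reward is produced and shared equally
    only if total contributions reach the threshold; each player always pays
    for its own contribution."""
    n = len(contributions)
    total = sum(contributions)
    share = reward / n if total >= threshold else 0.0
    return [share - cost_per_unit * c for c in contributions]

# Hypothetical 4-UAV group attacking one target: the strike succeeds
# (reward produced) only if at least 2 units of effort are committed.
payoffs = threshold_pgg_payoffs([1, 1, 0, 0], threshold=2, reward=8.0)
# Contributors net 2 - 1 = 1.0 each; free riders keep the full share of 2.0.
```

This free-rider advantage above the threshold is exactly the cooperation-defection tension that the paper's preference-based aspiration update rule is designed to resolve over repeated play.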