Funding: supported by the National Natural Science Foundation of China (61702528, 61806212, 62173336).
Abstract: In strategic decision-making tasks, determining how to assign limited, costly resources to the defender and the attacker is a central problem. However, pre-allocated resource assignments are hard to adapt to dynamic fighting scenarios, and there are situations in which the scenario and rules of the Colonel Blotto (CB) game are too restrictive for the real world. To address these issues, a support stage is added to supplement the pre-allocation results, and a novel two-stage competitive resource assignment problem is formulated based on the CB game and the stochastic Lanchester equation (SLE). Further, the force attrition in these two stages is formulated as a stochastic process to capture the complex course of fighting, including the case in which the player with fewer resources defeats the player with more resources and wins the battlefield. To solve this two-stage resource assignment problem, nested solving and no-regret learning are proposed to search for the optimal resource assignment strategies. Numerical experiments are conducted to analyze the effectiveness of the proposed model and to study the assignment strategies in various cases.
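The abstract's point that the weaker side can still win under stochastic attrition can be illustrated with a small simulation. The sketch below assumes one common discrete stochastic form of the Lanchester square law (each side loses one unit at a time, at a rate proportional to the opposing force size); the paper's own SLE may differ, and the function names, force sizes, and effectiveness coefficients `a` and `b` are hypothetical.

```python
import random

def simulate_battle(x, y, a, b, rng):
    """Run one stochastic square-law battle; return True if side X wins.

    Assumption: X loses a unit at rate b*y and Y loses a unit at rate a*x,
    so the next casualty falls on X with probability b*y / (a*x + b*y).
    """
    while x > 0 and y > 0:
        rate_x_loss = b * y  # Y's fire against X
        rate_y_loss = a * x  # X's fire against Y
        if rng.random() < rate_x_loss / (rate_x_loss + rate_y_loss):
            x -= 1
        else:
            y -= 1
    return x > 0

def win_probability(x, y, a, b, trials=5000, seed=0):
    """Monte Carlo estimate of the probability that side X wins."""
    rng = random.Random(seed)
    wins = sum(simulate_battle(x, y, a, b, rng) for _ in range(trials))
    return wins / trials

# Hypothetical example: X starts with 8 units against 10, equal effectiveness.
p_weaker = win_probability(x=8, y=10, a=1.0, b=1.0)
print(f"Estimated win probability of the smaller force: {p_weaker:.3f}")
```

Under the deterministic square law the larger force always wins this engagement, so any positive estimate here comes purely from the randomness of the attrition process.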
Funding: National Natural Science Foundation of China (NSFC61773142, NSFC62303136).
Abstract: When the maneuverability of a pursuer is not significantly higher than that of an evader, it is difficult to intercept the evader with only one pursuer. Therefore, this article adopts a two-to-one differential game strategy. The game of kind is generally treated as angle-optimized, which allows unlimited turns; however, this practice ignores the effect of acceleration and so does not correspond to the actual situation. Thus, building on the angle-optimized formulation, acceleration optimization and an upper-bound constraint on acceleration are added to the game. A two-to-one differential game problem is posed in three-dimensional space, and an improved multi-objective grey wolf optimization (IMOGWO) algorithm is proposed to solve for the optimal game point of this problem. Using the equations that describe the relative motion between the pursuers and the evader in three-dimensional space, a constrained multi-objective function is given as the performance index for designing an optimal strategy for the differential game. The optimal game point is then solved with the IMOGWO algorithm. It is proved, based on Markov chains, that with IMOGWO the Pareto solution set is the solution of the differential game. Finally, simulations verify that the pursuers can capture the evader, and comparative experiments show that the IMOGWO algorithm performs well in terms of running time and memory usage.
Funding: supported by the National Natural Science Foundation of China (10671182).
Abstract: The article studies the evolutionary dynamics of two-population, two-strategy game models with and without impulses. First, the payoff matrix is given, and two evolutionary dynamics models are established by adding stochastic perturbations and impulses. For the stochastic model without impulses, the existence and uniqueness of the solution and the existence of positive periodic solutions are proved, and a sufficient condition for strategy extinction is given. For the stochastic model with impulses, the existence of positive periodic solutions is proved. Numerical results show that noise and impulses directly affect the model but do not change its periodicity.
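As a rough illustration of the kind of model described, the sketch below integrates two-population, two-strategy replicator dynamics with an optional multiplicative noise term, using an Euler-Maruyama step. The abstract does not state the model equations, so the noise form, the clipping to [0, 1], the payoff matrices `A` and `B`, and all parameter values are assumptions made for this example, and impulses are not modeled.

```python
import random

def simulate(A, B, x0, y0, sigma=0.0, dt=0.01, steps=2000, seed=0):
    """Simulate two-population replicator dynamics.

    x: share of population 1 playing strategy 1; y: same for population 2.
    A[i][j]: payoff to population 1 when it plays i and population 2 plays j.
    B[i][j]: payoff to population 2 in that same situation.
    sigma scales an assumed multiplicative noise term (0 gives the ODE).
    """
    rng = random.Random(seed)
    x, y = x0, y0
    path = [(x, y)]
    for _ in range(steps):
        # Payoff advantage of strategy 1 over strategy 2 in each population.
        adv_x = (A[0][0] - A[1][0]) * y + (A[0][1] - A[1][1]) * (1 - y)
        adv_y = (B[0][0] - B[0][1]) * x + (B[1][0] - B[1][1]) * (1 - x)
        # Brownian increments with variance dt.
        noise_x = sigma * rng.gauss(0.0, 1.0) * dt ** 0.5
        noise_y = sigma * rng.gauss(0.0, 1.0) * dt ** 0.5
        x_new = x + x * (1 - x) * (adv_x * dt + noise_x)
        y_new = y + y * (1 - y) * (adv_y * dt + noise_y)
        # Clip so the shares remain valid proportions.
        x = min(1.0, max(0.0, x_new))
        y = min(1.0, max(0.0, y_new))
        path.append((x, y))
    return path

# Hypothetical payoffs in which strategy 1 is dominant for both populations.
A = [[3, 1], [2, 0]]
B = [[3, 2], [1, 0]]
deterministic = simulate(A, B, 0.2, 0.2)
noisy = simulate(A, B, 0.2, 0.2, sigma=0.2)
print(deterministic[-1], noisy[-1])
```

With `sigma=0` the shares follow the deterministic replicator equations; raising `sigma` shows how noise perturbs the trajectory around the same drift.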
Abstract: The Stackelberg prediction game (SPG) is a bilevel optimization framework for modeling strategic interactions between a learner and a follower. Existing methods for solving this problem with general loss functions are scarce and computationally expensive. We propose a novel hyper-gradient type method with a warm-start strategy to address this challenge. In particular, we first use a Taylor expansion-based approach to obtain a good initial point. We then apply a hyper-gradient descent method with an explicit approximate hyper-gradient. We establish the convergence of our algorithm theoretically. Furthermore, when the follower employs the least squares loss function, our method is shown to reach an ε-stationary point by solving quadratic subproblems. Numerical experiments show that our algorithms are empirically orders of magnitude faster than the state of the art.
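Abstract: Achievement title: Shapley's Conjecture on the Cores of Abstract Market Games. Main authors: Cao Zhigang (曹志刚), Qin Chengzhong (秦承忠), Yang Xiaoguang (杨晓光). Award category: Monograph and Paper Award. Award level: Second Prize. The award-winning paper "Shapley's Conjecture on the Cores of Abstract Market Games" was published in Games and Economic Behavior, a top journal in game theory, in issue 2 of 2018. The paper's results provide a preliminary resolution of the conjecture, posed by Nobel laureate in economics Lloyd S. Shapley, that the core of an abstract market game is nonempty.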
Funding: the Science and Technology Department, Heilongjiang Province, under Grant Agreement No. JJ2022LH0315.
Abstract: This paper presents a mode-switching collaborative defense strategy for spacecraft pursuit-evasion-defense scenarios. In these scenarios, the pursuer tries to avoid the defender while capturing the evader, while the evader and defender form an alliance to prevent the pursuer from achieving its goal. First, the behavioral modes of the pursuer, including attack and avoidance modes, are established using differential game theory. These modes are then recognized by an interactive multiple model-matching (IMM) algorithm that uses several smooth variable structure filters to match the modes of the pursuer and update their probabilities in real time. Based on linear-quadratic optimization theory, combined with the results of strategy identification, a two-way cooperative optimal strategy for the defender and evader is proposed, in which the evader helps the defender intercept the pursuer by performing luring maneuvers. Simulation results show that the interactive multi-model algorithm based on several smooth variable structure filters performs well in identifying the pursuer's strategy, and that the cooperative defense strategy based on strategy identification achieves good interception performance against pursuers that can flexibly adjust their game objectives.
Abstract: This paper comprehensively explores the impulsive on-orbit inspection game problem using reinforcement learning and game training methods. The purpose of the spacecraft is to inspect the entire surface of a non-cooperative target with active maneuverability under front lighting. First, the impulsive orbital game problem is formulated as a turn-based sequential game problem. Second, several typical relative orbit transfers are encapsulated into modules to construct a parameterized action space containing discrete modules and continuous parameters, and the multi-pass deep Q-networks (MPDQN) algorithm is used to implement autonomous decision-making. Then, a curriculum learning method is used to gradually increase the difficulty of the training scenario. The backtracking proportional self-play training framework is used to enhance the agent's ability to defeat inconsistent strategies by building a pool of opponents. The behavior variations of the agents during training indicate that the intelligent game system gradually evolves towards an equilibrium situation. The restraint relations between the agents show that the agents steadily improve their strategies. The influence of various factors on game results is tested.
Funding: supported by the National Key R&D Program of China: Gravitational Wave Detection Project (Grant Nos. 2021YFC22026, 2021YFC2202601, 2021YFC2202603) and the National Natural Science Foundation of China (Grant Nos. 12172288 and 12472046).
Abstract: This paper investigates impulsive orbital attack-defense (AD) games under multiple constraints and victory conditions, involving three spacecraft: attacker, target, and defender. In the AD scenario, the attacker aims to breach the defender's interception to rendezvous with the target, while the defender seeks to protect the target by blocking or actively pursuing the attacker. Four different maneuvering constraints and five potential game outcomes are incorporated to more accurately model AD game problems and increase complexity, thereby reducing the effectiveness of traditional methods such as differential games and game-tree searches. To address these challenges, this study proposes a multiagent deep reinforcement learning solution with variable reward functions. Two attack strategies, Direct attack (DA) and Bypass attack (BA), are developed for the attacker, each focusing on different mission priorities. Similarly, two defense strategies, Direct interdiction (DI) and Collinear interdiction (CI), are designed for the defender, each optimizing specific defensive actions through tailored reward functions. Each reward function incorporates both process rewards (e.g., distance and angle) and outcome rewards, derived from physical principles and validated via geometric analysis. Extensive simulations of four strategy confrontations demonstrate average defensive success rates of 75% for DI vs. DA, 40% for DI vs. BA, 80% for CI vs. DA, and 70% for CI vs. BA. Results indicate that CI outperforms DI for defenders, while BA outperforms DA for attackers. Moreover, defenders achieve their objectives more effectively under identical maneuvering capabilities. Trajectory evolution analyses further illustrate the effectiveness of the proposed variable reward function-driven strategies. These strategies and analyses offer valuable guidance for practical orbital defense scenarios and lay a foundation for future multi-agent game research.
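The comparison between strategies can be checked directly from the four success rates reported in the abstract. The short sketch below does only that arithmetic; reading "outperforms" as strict dominance across all opposing strategies is an interpretation made for this example, and the helper names are hypothetical.

```python
# Average defensive success rates (percent) as reported in the abstract,
# keyed by (defense strategy, attack strategy).
success = {
    ("DI", "DA"): 75, ("DI", "BA"): 40,
    ("CI", "DA"): 80, ("CI", "BA"): 70,
}
defenses = ["DI", "CI"]
attacks = ["DA", "BA"]

def defense_dominates(d1, d2):
    """True if defense d1 has a higher success rate than d2 against every attack."""
    return all(success[(d1, a)] > success[(d2, a)] for a in attacks)

def attack_dominates(a1, a2):
    """True if attack a1 leaves the defender a lower success rate than a2
    against every defense (lower defensive success favors the attacker)."""
    return all(success[(d, a1)] < success[(d, a2)] for d in defenses)

ci_beats_di = defense_dominates("CI", "DI")   # 80 > 75 and 70 > 40
ba_beats_da = attack_dominates("BA", "DA")    # 40 < 75 and 70 < 80
mean_defense_rate = sum(success.values()) / len(success)

print(ci_beats_di, ba_beats_da, mean_defense_rate)
```

Both dominance checks hold on the reported figures, and the mean defensive success rate across the four pairings is above 50%, which is consistent with the abstract's statement that defenders fare better under identical maneuvering capabilities.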
Funding: supported by the Major Projects for Science and Technology Innovation 2030 (2018AAA0100805).
Abstract: To address confrontation decision-making in multi-round air combat, a dynamic game decision method based on a decision tree is proposed for unmanned aerial vehicle (UAV) air combat. Based on game theory and the confrontation characteristics of air combat, a dynamic game process is constructed that includes the strategy sets, the situation information, and the maneuver decisions of both sides. By analyzing the UAV's flight dynamics and both sides' information, a payoff matrix is established through the situation advantage function, the performance advantage function, and the profit function. Furthermore, the dynamic game decision problem is solved with the linear induction method to obtain the Nash equilibrium solution, and the decision tree method is introduced to obtain the optimal maneuver decision, thereby improving the situation advantage in the next round of confrontation. Simulation results for multi-round air combat confrontation scenarios are presented to verify the effectiveness and advantages of the proposed method.
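To make the step from payoff matrix to equilibrium concrete, the sketch below searches a zero-sum payoff matrix for pure-strategy saddle points, which are pure Nash equilibria of such a game. This is a generic textbook technique rather than the paper's linear induction or decision tree method, and the matrix entries and function name are hypothetical.

```python
def pure_saddle_points(payoff):
    """Return all (row, column) pure-strategy saddle points of a zero-sum game.

    payoff[i][j] is the row player's gain (and the column player's loss)
    when the row player picks maneuver i and the column player picks j.
    An entry is a saddle point if it is the minimum of its row and the
    maximum of its column.
    """
    row_mins = [min(row) for row in payoff]
    col_maxs = [max(col) for col in zip(*payoff)]
    return [
        (i, j)
        for i, row in enumerate(payoff)
        for j, value in enumerate(row)
        if value == row_mins[i] and value == col_maxs[j]
    ]

# Hypothetical situation-advantage payoffs: 2 maneuvers for one side, 3 for the other.
payoff = [
    [3, 1, 4],
    [2, 2, 3],
]
saddles = pure_saddle_points(payoff)
print(saddles)
```

When the list is empty, no pure-strategy equilibrium exists and a mixed strategy would have to be computed instead, for example by linear programming.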
Funding: supported by the National Natural Science Foundation of China (No. 62073267).
Abstract: As a crucial process in the coordinated strikes of unmanned aerial vehicles (UAVs), weapon-target assignment is vital for optimizing the allocation of available weapons and effectively exploiting the capabilities of UAVs. Existing weapon-target assignment methods primarily focus on macro cluster constraints while neglecting individual strategy updates. This paper proposes a novel weapon-target assignment method for UAVs based on the multi-strategy threshold public goods game (PGG). By analyzing the concept mapping between weapon-target assignment for UAVs and the multi-strategy threshold PGG, a weapon-target assignment model for UAVs based on the multi-strategy threshold PGG is established, which is adaptively complemented by a diverse cooperation-defection strategy library and a utility function based on the threshold mechanism. Additionally, a multi-chain Markov process is formulated to quantitatively describe the stochastic evolutionary dynamics, and its evolutionary stable distribution is theoretically derived by developing a strategy update rule based on preference-based aspiration dynamics. Numerical simulation results validate the feasibility and effectiveness of the proposed method, and the impacts of selection intensity, preference degree, and threshold on the evolutionary stable distribution are analyzed. Comparative simulations show that the proposed method outperforms GWO, DE, and NSGA-II, achieving 17.18% higher expected utility than NSGA-II and reducing the evolutionary stable time by 25% in large-scale scenarios.
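Abstract: To improve outdoor breastfeeding environments and effectively meet the outdoor breastfeeding needs of mothers and infants, an integrated AHP-QFD-TRIZ method is applied to the design of outdoor breastfeeding seating, with the aim of completing the seating design and improving the outdoor leisure experience of mothers and infants. First, user needs are collected through user interviews. The analytic hierarchy process (AHP) is then used to compute the weights of the user needs, after which quality function deployment (QFD) maps the user needs to design requirements for the breastfeeding seating, identifying the key design requirements and three pairs of technical contradictions among them. On this basis, the theory of inventive problem solving (TRIZ) is introduced to search the solution domain and guide the scheme design of the outdoor breastfeeding seating. The integrated AHP-QFD-TRIZ approach can effectively help designers identify the technical contradictions that may arise from satisfying key user needs during design and quickly find the best solutions, thereby achieving innovative design of outdoor breastfeeding seating and improving design efficiency and feasibility.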
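The AHP weighting step can be sketched as follows. The example uses the row geometric-mean approximation of the priority vector and Saaty's consistency ratio; the abstract does not say which variant the authors used, and the three needs and their pairwise comparison values are hypothetical.

```python
import math

# Saaty's random consistency index for small matrix sizes.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}

def ahp_weights(matrix):
    """Approximate AHP priority weights by normalized row geometric means."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]

def consistency_ratio(matrix, weights):
    """Return CR = CI / RI, where CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    if n < 3:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    weighted = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(weighted[i] / weights[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]

# Hypothetical pairwise comparisons for three user needs
# (e.g. privacy vs. comfort vs. storage); entry [i][j] is how much
# more important need i is than need j.
comparisons = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(comparisons)
cr = consistency_ratio(comparisons, weights)
print([round(w, 3) for w in weights], round(cr, 3))
```

A consistency ratio below 0.1 is the usual threshold for accepting the judgments; the resulting weights would then feed the QFD mapping from user needs to design requirements.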