With the rapid development of artificial intelligence, intelligent air combat maneuver decision-making (ACMD) has garnered global attention. Although deep reinforcement learning provides a promising approach to ACMD, existing methods often suffer from rigid reward functions and limited adaptability to evolving adversarial strategies. Moreover, most research assumes open airspace, overlooking the influence of potential obstacles. In this paper, we address one-on-one within-visual-range ACMD in obstructed environments and propose an improved Soft Actor-Critic (SAC) algorithm trained under a curriculum self-play framework. A maneuver strategy mirroring inference module is integrated so that each aircraft can estimate the other's likely position when visual obstruction occurs. By leveraging curriculum learning to guide progressive experience accumulation and self-play for adversarial evolution, our method enhances both training efficiency and tactical diversity. We further integrate an attention mechanism that dynamically adjusts the weights of sub-rewards, enabling the learned policy to adapt to rapidly changing air combat situations. Numerical simulations demonstrate that our enhanced SAC converges more quickly and achieves higher win rates than the baseline methods. An animation is available at bilibili.com/video/BV1BHVszHE98 for better illustration.
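The idea of attention-weighted sub-rewards mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the example sub-reward terms (angle, distance, altitude), and the use of a plain softmax over per-term scores are all assumptions made for illustration. In a full system the scores would presumably come from a learned module conditioned on the current combat state; here they are passed in by hand.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weighted_reward(sub_rewards, scores):
    """Combine sub-rewards using softmax attention weights.

    sub_rewards: per-term reward values (hypothetical terms such as
                 angle, distance, and altitude advantage).
    scores:      unnormalized attention scores, one per sub-reward
                 (assumed to be produced elsewhere, e.g. by a small
                 network that looks at the current state).
    Returns the scalar reward and the weights that produced it.
    """
    if len(sub_rewards) != len(scores):
        raise ValueError("sub_rewards and scores must have equal length")
    weights = softmax(scores)
    reward = sum(w * r for w, r in zip(weights, sub_rewards))
    return reward, weights


# Hypothetical sub-rewards: angle, distance, altitude.
sub_rewards = [1.0, -0.5, 0.2]

# Equal scores reduce to a plain average of the sub-rewards.
uniform_reward, uniform_weights = attention_weighted_reward(sub_rewards, [0.0, 0.0, 0.0])

# A large score on the first term makes that term dominate the total.
focused_reward, focused_weights = attention_weighted_reward(sub_rewards, [5.0, 0.0, 0.0])
```

With fixed weights the trade-off between sub-rewards is frozen for the whole engagement; letting the weights depend on state-derived scores is one way a reward could shift emphasis as the situation changes.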
Funding: This work was supported by the National Key Research and Development Plan (No. 2021YFB3302501), the National Science Foundation of China (No. 12161076), and the Fundamental Research Funds for the Central Universities (No. DUT25GF207).