Abstract
To address the modeling and coordination challenges that large scale, complex environments, and limited resources pose for intelligent cooperative control of multiple aircraft, this study builds a hierarchical multi-agent decision-making architecture and proposes an intelligent cooperative control method, with the goal of improving the efficiency of the decision-making algorithm. First, each aircraft is modeled as an agent to build the cooperative control model. Second, a partially observable Markov decision process (POMDP) model is used to handle incomplete observations. Then, to cope with changing game environments and high learning cost, a hierarchical twin-delayed policy gradient reinforcement learning method based on centralized training with decentralized execution is proposed; it combines model-based and model-free mechanisms to make efficient use of existing evolution models of the game environment. Finally, within the hierarchical decision-making framework, a typical multi-aircraft game and one thousand multi-scenario simulation runs are used for validation. The results show that the proposed method effectively solves the multi-aircraft cooperative control problem. Compared with the multi-agent reinforcement learning algorithms MAPPO and QMIX, training time is reduced by 51.03% and 79.03%, algorithm efficiency (cumulative reward) is improved by 37.51% and 58.73%, and the evasive-maneuver success rate is increased by 17.63% and 39.79%, respectively.
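As a rough illustration of the twin-delayed idea named in the abstract, the sketch below computes the standard twin-delayed (TD3-style) learning target with centralized critics that see all agents' observations and actions. It is not the authors' hierarchical method, and it omits their model-based component. The function name `twin_delayed_target`, its parameters, and the callable actors and critics are all assumptions made for illustration.

```python
import numpy as np


def twin_delayed_target(reward, done, next_obs_per_agent, target_actors,
                        target_q1, target_q2, rng, gamma=0.99,
                        noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    """Clipped double-Q target with target-policy smoothing (hypothetical helper).

    next_obs_per_agent: list of arrays, one per agent, each (batch, obs_dim).
    target_actors: list of callables mapping an agent's observation to its action.
    target_q1, target_q2: callables (joint_obs, joint_act) -> (batch,) values.
    """
    next_actions = []
    for actor, obs in zip(target_actors, next_obs_per_agent):
        action = actor(obs)
        # Target-policy smoothing: add clipped Gaussian noise to the action.
        noise = rng.normal(0.0, noise_std, size=action.shape)
        noise = np.clip(noise, -noise_clip, noise_clip)
        next_actions.append(np.clip(action + noise, -act_limit, act_limit))

    # Centralized critics take the concatenated observations and actions.
    joint_obs = np.concatenate(next_obs_per_agent, axis=-1)
    joint_act = np.concatenate(next_actions, axis=-1)

    # Clipped double-Q: use the smaller of the two target-critic estimates.
    q_min = np.minimum(target_q1(joint_obs, joint_act),
                       target_q2(joint_obs, joint_act))
    return reward + gamma * (1.0 - done) * q_min
```

During decentralized execution only the per-agent actors would be needed; the joint critics appear only in training, which is the usual reading of centralized training with decentralized execution.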
Authors
马宇
安豆
林熙祥
赵建福
张光华
牛鸿敏
MA Yu; AN Dou; LIN Xixiang; ZHAO Jianfu; ZHANG Guanghua; NIU Hongmin (School of Electronics and Control Engineering, Chang'an University, Xi'an 710064, China; School of Automation Science and Engineering, Xi'an Jiaotong University, Xi'an 710049, China)
Source
Journal of Xi'an Jiaotong University (《西安交通大学学报》)
PKU Core Journal
2025, No. 9, pp. 88-98 (11 pages)
Funding
National Natural Science Foundation of China (62173268, 62103318)
Natural Science Foundation of Shaanxi Province (2021JQ-288)
Keywords
intelligent decision-making
multi-aircraft intelligent cooperative control
hierarchical decision
reinforcement learning
About the author
MA Yu (born 1988), male, lecturer.