
An Improved QMIX Network Based on Gradient Entropy Regularization
Abstract: In cooperative multi-agent systems that lack individual reward signals, the contributions of different agents cannot be distinguished, which leads to low cooperation efficiency. To address this problem, a discriminability evaluation index for credit assignment is introduced within the value decomposition paradigm, and a gradient entropy regularization method is proposed to achieve highly discriminable credit assignment. On this basis, an improved QMIX network is built by combining the regularizer with a multi-agent deep reinforcement learning algorithm. A simulation environment is constructed using the SMAC multi-agent learning environment and the map editor built into StarCraft II. The results show that, compared with the original QMIX network, the improved QMIX network achieves higher learning efficiency and better overall performance, and is better suited to cooperative multi-agent reinforcement learning in partially observable environments.
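Only the abstract is available in this record, so the exact form of the proposed regularizer is not given here. The sketch below is one plausible reading of "gradient entropy regularization" in a QMIX-style setting: the magnitudes of the mixing-network gradients ∂Q_tot/∂Q_i are normalized into a distribution over agents, and the entropy of that distribution is penalized alongside the TD loss so that credit concentrates on the agents that actually drive Q_tot. The function names, the toy linear mixer, the sign of the penalty, and the weight lambda_ent are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed form, not the paper's code) of a gradient-entropy
# regularizer attached to a QMIX-style TD loss in PyTorch.
import torch
import torch.nn.functional as F

def gradient_entropy(q_tot, agent_qs, eps=1e-8):
    """Entropy of the normalized |dQ_tot/dQ_i| distribution over agents.

    Lower entropy means credit is concentrated on fewer agents, i.e. the
    credit assignment is more discriminable.
    """
    grads = torch.autograd.grad(q_tot.sum(), agent_qs, create_graph=True)[0]
    mags = grads.abs() + eps                       # (batch, n_agents)
    p = mags / mags.sum(dim=-1, keepdim=True)      # normalize over agents
    return -(p * p.log()).sum(dim=-1).mean()       # mean entropy over batch

def qmix_loss_with_grad_entropy(q_tot, td_target, agent_qs, lambda_ent=0.001):
    """TD loss plus a gradient-entropy penalty (assumed combination).

    td_target is assumed to be detached (computed from the target network).
    Adding +lambda_ent * entropy and minimizing pushes the mixer toward a
    more discriminable per-agent credit assignment.
    """
    td_loss = F.mse_loss(q_tot, td_target)
    return td_loss + lambda_ent * gradient_entropy(q_tot, agent_qs)

# Toy usage with a linear monotonic "mixer" (positive weights), purely to
# show the shapes involved; a real QMIX mixer is a state-conditioned network.
batch, n_agents = 32, 5
agent_qs = torch.randn(batch, n_agents, requires_grad=True)
w = torch.rand(n_agents)                            # positive mixing weights
q_tot = (agent_qs * w).sum(dim=-1, keepdim=True)    # (batch, 1)
td_target = torch.randn(batch, 1)
loss = qmix_loss_with_grad_entropy(q_tot, td_target, agent_qs)
loss.backward()
```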
Authors: LU Rui; PENG Pengfei (Naval University of Engineering, Wuhan 430000, China)
Affiliation: Naval University of Engineering
Source: Electronics Optics & Control (CSCD; Peking University Core Journal), 2023, Issue 4, pp. 78-82, 99 (6 pages)
Keywords: multi-agent; reinforcement learning; credit assignment; gradient entropy
About the author: LU Rui (b. 1999), male, from Huanggang, Hubei; master's degree student.