Efficient Multiagent Policy Optimization Based on Weighted Estimators in Stochastic Cooperative Environments (cited by: 2)


Abstract: Multiagent deep reinforcement learning (MA-DRL) has received increasingly wide attention. However, most existing MA-DRL algorithms remain inefficient when faced with the non-stationarity caused by agents constantly changing their behavior in stochastic environments. This paper extends the weighted double estimator to multiagent domains and proposes an MA-DRL framework named Weighted Double Deep Q-Network (WDDQN). By leveraging the weighted double estimator and a deep neural network, WDDQN can not only reduce estimation bias effectively but also handle scenarios with raw visual inputs. To achieve efficient cooperation in multiagent domains, we introduce a lenient reward network and a scheduled replay strategy. Empirical results show that WDDQN outperforms an existing DRL algorithm (double DQN) and an MA-DRL algorithm (lenient Q-learning) in terms of average reward and convergence speed, and is more likely to converge to the Pareto-optimal Nash equilibrium in stochastic cooperative environments.
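The weighted double estimator at the core of WDDQN interpolates between the single (max) estimator, which tends to overestimate action values, and the double estimator, which tends to underestimate them. The minimal tabular sketch below follows the weighted double Q-learning formulation that the abstract says WDDQN extends; it is not the paper's WDDQN implementation. The function names, the default weighting constant `c`, and the NumPy table representation are illustrative assumptions, and the deep network, lenient reward network, and scheduled replay components are omitted entirely.

```python
import numpy as np


def weighted_double_target(q_u_next, q_v_next, reward, gamma=0.99, c=1.0, done=False):
    """Bootstrapped target for updating estimator U (illustrative sketch).

    q_u_next, q_v_next: 1-D arrays of action values at the next state s'
    from the two estimators U and V. `c` is an assumed default for the
    weighting constant; it is a tunable hyperparameter.
    """
    if done:
        return float(reward)
    a_star = int(np.argmax(q_u_next))  # greedy action under U
    a_low = int(np.argmin(q_u_next))   # lowest-valued action under U
    # beta = |Q_V(s', a*) - Q_V(s', a_L)| / (c + |Q_V(s', a*) - Q_V(s', a_L)|)
    gap = abs(q_v_next[a_star] - q_v_next[a_low])
    beta = gap / (c + gap)             # lies in [0, 1)
    # beta -> 1 recovers the single (max) estimate Q_U(s', a*);
    # beta -> 0 recovers the double estimate Q_V(s', a*).
    value = beta * q_u_next[a_star] + (1.0 - beta) * q_v_next[a_star]
    return float(reward + gamma * value)


def wdq_update(q_u, q_v, s, a, r, s_next, alpha=0.1, gamma=0.99, c=1.0, done=False):
    """One tabular update of q_u (shape [n_states, n_actions]) in place."""
    target = weighted_double_target(q_u[s_next], q_v[s_next], r, gamma, c, done)
    q_u[s, a] += alpha * (target - q_u[s, a])
    return target


def wdq_step(q_a, q_b, s, a, r, s_next, rng, **kwargs):
    """Pick one of the two tables uniformly at random and update it with the other."""
    if rng.random() < 0.5:
        return wdq_update(q_a, q_b, s, a, r, s_next, **kwargs)
    return wdq_update(q_b, q_a, s, a, r, s_next, **kwargs)
```

For example, with `q_u_next = [1, 3, 2]` and `q_v_next = [2, 1, 0]`, the gap is |1 - 2| = 1, so with `c = 1` the weight is β = 0.5 and the bootstrapped value is 0.5·3 + 0.5·1 = 2.0. A smaller `c` pushes β toward 1 (the optimistic max estimate); a larger `c` pushes it toward 0 (the pessimistic double estimate).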

Authors: Yan Zheng, Jian-Ye Hao, Zong-Zhang Zhang, Zhao-Peng Meng, Xiao-Tian Hao

Affiliations: College of Intelligence and Computing, Tianjin University; National Key Laboratory for Novel Software Technology, Nanjing University

Source: Journal of Computer Science & Technology (indexed in SCIE, EI, CSCD), 2020, No. 2, pp. 268-280 (13 pages). Chinese journal name: 计算机科学技术学报（英文版） (English-language edition)

Funding: This work was supported by the National Natural Science Foundation of China under Grant Nos. 61702362, U1836214, and 61876119; the Special Program of Artificial Intelligence of Tianjin Research Program of Application Foundation and Advanced Technology under Grant No. 16JCQNJC00100; the Special Program of Artificial Intelligence of Tianjin Municipal Science and Technology Commission of China under Grant No. 56917ZXRGGX00150; the Science and Technology Program of Tianjin of China under Grant Nos. 15PTCYSY00030 and 16ZXHLGX00170; and the Natural Science Foundation of Jiangsu Province of China under Grant No. BK20181432.

Acknowledgments: We thank our industrial research partner Netease, Inc., especially the Fuxi AI Laboratory of Leihuo Business Groups, for their discussion and support with the experiments.

Keywords: deep reinforcement learning; multiagent system; weighted double estimator; lenient reinforcement learning; cooperative Markov game

Classification number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

Author biographies:

Yan Zheng received his Ph.D. degree in software engineering from Tianjin University, Tianjin. He is now a research fellow at Nanyang Technological University, Singapore, and also a member of the Deep Reinforcement Learning Laboratory at Tianjin University, Tianjin. His research interests include deep reinforcement learning and multiagent systems. E-mail: yanzheng@tju.edu.cn

Jian-Ye Hao is now an associate professor at Tianjin University, Tianjin. He was a postdoctoral research fellow in the Computer Science and Artificial Intelligence Laboratory (CSAIL) at Massachusetts Institute of Technology (MIT), Boston, from 2013 to 2015. His research interests include deep reinforcement learning and multiagent systems. E-mail: jianye.hao@tju.edu.cn

Zong-Zhang Zhang received his Ph.D. degree in computer science from University of Science and Technology of China, Hefei, in 2012. He is currently an associate professor at the National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing. His research interests include reinforcement learning, intelligent planning, and multiagent learning. E-mail: zzzhang@nju.edu.cn

Zhao-Peng Meng is a professor of Software Engineering and Dean of the Software College, Tianjin University, Tianjin, and is also a member of the Software Engineering Steering Committee, Ministry of Education of China. He is also a leading director of the Tianjin Academy of Computers. Dr. Meng has hosted and participated in more than 20 research projects, including projects of the National Natural Science Foundation, Major National Science and Technology Projects, and the Economic and Information Technology Commission of Tianjin. His research interests are in the Internet of Things, intelligent transportation systems, and distributed multimedia. E-mail: memgzp@tju.edu.cn

Xiao-Tian Hao is now a Ph.D. candidate at the College of Intelligence and Computing, Tianjin University, Tianjin. He works on deep reinforcement learning and multiagent systems. He is also a member of the Deep Reinforcement Learning Laboratory at Tianjin University. E-mail: xiaotian@tju.edu.cn

Citation network

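Co-cited literature (2)

1. 吴天栋, 石英. Research on opponent models in imperfect-information games [J]. Journal of Henan University of Science and Technology (Natural Science), 2019, 40(1): 54-59. Cited by: 2
2. 罗俊仁, 张万鹏, 袁唯淋, 胡振震, 陈少飞, 陈璟. An opponent modeling framework for multiagent adversarial games [J]. Journal of System Simulation, 2022, 34(9): 1941-1955. Cited by: 11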

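Citing literature (2)

1. 邓有朋, 范佳宣, 郑岩, 王振亚, 吕勇梁, 李雨霄. Multiagent opponent modeling under incomplete information [J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(S02): 443-452. Cited by: 4
2. 王鼎盛, 丁磊. Intelligent traffic signal control algorithm based on MP-DDQN [J]. Journal of Shaanxi University of Science & Technology, 2025, 43(2): 196-202.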

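Secondary citing literature (4)

1. 胡振震, 陈少飞, 李鹏, 陈佳星, 张煜, 陈璟. Strategy cognition for one-on-one beyond-visual-range air combat based on explicit opponent modeling [J]. Acta Aeronautica et Astronautica Sinica, 2025, 46(4): 162-187.
2. 惠耀洛, 许波, 李秀敏, 孙均政. A cooperative interception strategy for multiple aircraft based on opponent modeling [J]. Journal of Astronautics, 2025, 46(3): 601-615.
3. 刘忠祥. Stress field analysis and elastoplastic removal mechanism in magnetic compound fluid polishing of sapphire [J]. Modeling and Simulation, 2025, 14(6): 135-141.
4. 程恺, 张金鹏, 邵天浩, 邹世辰, 于本川. A survey of opponent modeling methods in intelligent gaming [J]. Computer Technology and Development, 2025, 35(9): 1-8.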
