Efficient Multiagent Policy Optimization Based on Weighted Estimators in Stochastic Cooperative Environments (Cited by: 2)

Abstract: Multiagent deep reinforcement learning (MA-DRL) has received increasingly wide attention. Most existing MA-DRL algorithms, however, remain inefficient when faced with the non-stationarity caused by agents continually changing their behavior in stochastic environments. This paper extends the weighted double estimator to multiagent domains and proposes an MA-DRL framework named Weighted Double Deep Q-Network (WDDQN). By combining the weighted double estimator with a deep neural network, WDDQN can not only reduce estimation bias effectively but also handle scenarios with raw visual inputs. To achieve efficient cooperation in multiagent domains, we introduce a lenient reward network and a scheduled replay strategy. Empirical results show that WDDQN outperforms an existing DRL algorithm (double DQN) and an MA-DRL algorithm (lenient Q-learning) in terms of averaged reward and convergence speed, and is more likely to converge to the Pareto-optimal Nash equilibrium in stochastic cooperative environments.
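The abstract does not spell out the weighted double estimator, so the following is only a minimal sketch of the general idea, assuming the form commonly used in tabular weighted double Q-learning: the target blends a single-estimator value with a double-estimator value, using a weight `beta` derived from the value gap between the best and worst actions. The function name `weighted_double_target`, the constant `c`, and the default `gamma` are illustrative assumptions, not taken from the paper, and WDDQN itself may differ in its details.

```python
def weighted_double_target(q_u, q_v, reward, gamma=0.99, c=1.0, done=False):
    """Sketch of a weighted double estimator target (assumed form).

    q_u, q_v: action values for the next state from two estimators
              (hypothetical inputs; plain lists of floats).
    c:        non-negative constant controlling the blend (assumption).
    """
    if done:
        # No bootstrapping from a terminal state.
        return reward

    actions = range(len(q_u))
    a_star = max(actions, key=lambda a: q_u[a])  # greedy action under q_u
    a_low = min(actions, key=lambda a: q_u[a])   # worst action under q_u

    # Weight toward the single estimator grows with the gap seen by q_v.
    gap = abs(q_v[a_star] - q_v[a_low])
    beta = gap / (c + gap) if (c + gap) > 0 else 0.0

    # Blend the single estimate (q_u) with the double estimate (q_v).
    blended = beta * q_u[a_star] + (1.0 - beta) * q_v[a_star]
    return reward + gamma * blended
```

With `c` close to zero the target approaches the single (max) estimator, which tends to overestimate; with large `c` it approaches the double estimator, which tends to underestimate. The blend is what is meant to reduce bias in either direction.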
Source: Journal of Computer Science & Technology (计算机科学技术学报(英文版)); indexed in SCIE, EI, CSCD; 2020, Issue 2, pp. 268-280 (13 pages)
Funding: This work was supported by the National Natural Science Foundation of China under Grant Nos. 61702362, U1836214, and 61876119; the Special Program of Artificial Intelligence of Tianjin Research Program of Application Foundation and Advanced Technology under Grant No. 16JCQNJC00100; the Special Program of Artificial Intelligence of Tianjin Municipal Science and Technology Commission of China under Grant No. 56917ZXRGGX00150; the Science and Technology Program of Tianjin of China under Grant Nos. 15PTCYSY00030 and 16ZXHLGX00170; and the Natural Science Foundation of Jiangsu Province of China under Grant No. BK20181432. Acknowledgments: We thank our industrial research partner Netease, Inc., especially the Fuxi AI Laboratory of Leihuo Business Groups, for their discussion and support with the experiments.
About the authors:
Yan Zheng received his Ph.D. degree in software engineering from Tianjin University, Tianjin. He is now a research fellow at Nanyang Technological University, Singapore, and also a member of the Deep Reinforcement Learning Laboratory at Tianjin University, Tianjin. His research interests include deep reinforcement learning and multiagent systems. E-mail: yanzheng@tju.edu.cn
Jian-Ye Hao is now an associate professor at Tianjin University, Tianjin. He was a postdoctoral research fellow in the Computer Science and Artificial Intelligence Laboratory (CSAIL) at Massachusetts Institute of Technology (MIT), Boston, from 2013 to 2015. His research interests include deep reinforcement learning and multiagent systems. E-mail: jianye.hao@tju.edu.cn
Zong-Zhang Zhang received his Ph.D. degree in computer science from University of Science and Technology of China, Hefei, in 2012. He is currently an associate professor at the National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing. His research interests include reinforcement learning, intelligent planning, and multi-agent learning. E-mail: zzzhang@nju.edu.cn
Zhao-Peng Meng is a professor of Software Engineering and Dean of Software College, Tianjin University, Tianjin, and is also a member of the Software Engineering Steering Committee, Ministry of Education of China. He is also a leading director of Tianjin Academy of Computers. Dr. Meng has hosted and participated in more than 20 research projects, including National Natural Science Foundation, Major National Science and Technology, and Economic and Information Technology Commission of Tianjin. His research interests are in the Internet of Things, intelligent transport systems, and distributed multimedia. E-mail: memgzp@tju.edu.cn
Xiao-Tian Hao is now a Ph.D. candidate at the College of Intelligence and Computing of Tianjin University, Tianjin. He works on deep reinforcement learning and multiagent systems. He is also a member of the Deep Reinforcement Learning Laboratory at Tianjin University. E-mail: xiaotian@tju.edu.cn