
Multi-agent decision making using Monte Carlo Q-value function (cited by: 6)
Abstract Multi-agent decision making is an active research topic in artificial intelligence. Compared with single-agent decision making, the multi-agent problem has a larger policy search space. Decentralized partially observable Markov decision processes (Dec-POMDPs) provide a general model for multi-agent decision making under uncertainty and have attracted considerable attention since they were proposed, but solving Dec-POMDPs has high computational complexity and large memory requirements. To address this, this article proposes a new Q-value function representation, the Monte Carlo Q-value function (QMC), and proves theoretically that QMC is an upper bound of the optimal Q-value function Q*, which guarantees that heuristic search can find the optimal solution. An adaptive sampling method is used to balance convergence accuracy against solving time. Combining the precision of heuristic search with the generality of Monte Carlo random sampling, the article further proposes a QMC-based algorithm called clustering and expansion for Monte Carlo (CEMC). CEMC integrates Q-value function computation with policy search and computes value functions only as needed, which avoids storing all value functions. Experimental results show that CEMC outperforms, in both running time and memory usage, the current best-performing heuristic methods that use compact Q-value functions.
Authors ZHANG Jian (张健); PAN Yao-zong (潘耀宗); YANG Hai-tao (杨海涛); SUN Shu (孙舒); ZHAO Hong-li (赵洪利) (Space Engineering University, PLA Strategic Support Force, Beijing 101416, China; The 63628 Army of PLA, Sanhe 065201, China; The 63919 Army of PLA, Beijing 100089, China)
Source Control and Decision (《控制与决策》), indexed in EI, CSCD, and the Peking University Core Journals list; 2020, No. 3, pp. 637-644 (8 pages)
Keywords multi-agent decision making; Monte Carlo; Q-value function; Markov decision
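About the authors ZHANG Jian (b. 1989), male, Ph.D., works on multi-agent decision making, E-mail: zjconquer@126.com; PAN Yao-zong (b. 1984), male, Ph.D. candidate, works on intelligent planning, E-mail: panyaozong1284@163.com; corresponding author: YANG Hai-tao (b. 1979), male, associate professor, Ph.D., works on combat simulation systems, E-mail: haitaoyang79@126.com; SUN Shu (b. 1982), female, engineer, Ph.D., works on aerospace search and rescue, E-mail: sunshu_susan@163.com; ZHAO Hong-li (b. 1964), male, professor, doctoral supervisor, works on operational planning and related topics, E-mail: zhlspace@sina.cn.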
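
Illustrative note: the abstract mentions an adaptive sampling method that trades convergence accuracy against solving time. The Python sketch below shows one generic way such a trade-off can work: keep drawing Monte Carlo returns for a fixed state-action pair until the standard error of the running mean falls below a tolerance. It is not the paper's QMC construction or the CEMC algorithm, and it does not reproduce the upper-bound property. The names `adaptive_mc_estimate`, `toy_rollout`, `tol`, `min_samples`, and `max_samples`, along with the toy reward model, are assumptions made for illustration only.

```python
import math
import random
from typing import Callable, Tuple


def adaptive_mc_estimate(
    sample_return: Callable[[random.Random], float],
    tol: float = 0.05,
    min_samples: int = 30,
    max_samples: int = 100_000,
    seed: int = 0,
) -> Tuple[float, int]:
    """Estimate an expected return by sampling until the standard error < tol.

    Returns (estimated mean, number of samples used). This is a generic
    stopping rule, assumed here for illustration.
    """
    rng = random.Random(seed)
    n, mean, m2 = 0, 0.0, 0.0  # Welford running mean and sum of squared deviations
    while n < max_samples:
        x = sample_return(rng)
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n >= max(2, min_samples):
            std_err = math.sqrt(m2 / (n - 1) / n)
            if std_err < tol:
                break  # estimate is precise enough; stop sampling early
    return mean, n


def toy_rollout(rng: random.Random) -> float:
    """Hypothetical two-step task: reward 1 with probability 0.7 per step,
    discount factor 0.9. The true expected return is 0.7 + 0.9 * 0.7 = 1.33."""
    r0 = 1.0 if rng.random() < 0.7 else 0.0
    r1 = 1.0 if rng.random() < 0.7 else 0.0
    return r0 + 0.9 * r1


if __name__ == "__main__":
    for tol in (0.05, 0.01):
        q_hat, n_used = adaptive_mc_estimate(toy_rollout, tol=tol)
        print(f"tol={tol}: estimate={q_hat:.3f} using {n_used} samples")
```

A looser tolerance stops after fewer samples and gives a coarser estimate; a tighter tolerance costs more samples. That is the accuracy-versus-time trade-off in its simplest form.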