
A Target Locking Method for Sensor Networks Based on Multi-agent Reinforcement Learning
Abstract To lock onto targets quickly and effectively in sensor networks, this paper proposes a multi-agent reinforcement learning algorithm (ASUCBQ) based on action sampling and UCB action selection. The method models multiple sensors as a multi-agent system and adopts the centralized training with decentralized execution (CTDE) framework. When the joint-action Q-values and UCB values are updated during centralized training, the algorithm does not traverse all joint actions; it samples only a subset of joint actions and takes the maximum Q-value and UCB value over that subset. In the action selection and execution stage, each sensor selects its action independently. In addition, to avoid local optima, the method draws on action selection based on the upper confidence bound (UCB): uncertainty in the action-value estimates drives the sensors to explore more actions, and dynamic adjustment of the exploration rate yields a better balance between “exploitation” and “exploration” in reinforcement learning. Simulation experiments show that the method effectively locks targets in the sensor network and reduces the computation required during training.
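The abstract gives no update rule, so the following is only a minimal sketch of the two ideas it names: taking the maximum over a sampled subset of joint actions instead of all of them, and ranking candidates by a UCB score. It is not the authors' ASUCBQ implementation. The tabular Q-function, the UCB1-style bonus `c * sqrt(ln t / n)`, the class name `SampledUCBQ`, and all parameter values (`k`, `alpha`, `gamma`, `c`) are assumptions made for illustration. Only the centralized training side is sketched; the per-sensor decentralized execution step is not shown.

```python
import math
import random
from collections import defaultdict


def sample_joint_actions(n_agents, n_actions, k, rng):
    """Draw k random joint actions instead of enumerating all n_actions**n_agents."""
    return [tuple(rng.randrange(n_actions) for _ in range(n_agents))
            for _ in range(k)]


def ucb_score(q, count, t, c):
    """Q-value plus a UCB1-style bonus (assumed form); untried actions score +inf."""
    if count == 0:
        return math.inf
    return q + c * math.sqrt(math.log(t) / count)


class SampledUCBQ:
    """Hypothetical tabular learner over joint actions with a sampled max."""

    def __init__(self, n_agents, n_actions, k=8, alpha=0.1, gamma=0.9,
                 c=1.0, seed=0):
        self.n_agents, self.n_actions, self.k = n_agents, n_actions, k
        self.alpha, self.gamma, self.c = alpha, gamma, c
        self.rng = random.Random(seed)
        self.q = defaultdict(float)  # (state, joint_action) -> Q estimate
        self.n = defaultdict(int)    # (state, joint_action) -> visit count
        self.t = 0                   # total number of selections so far

    def _candidates(self):
        return sample_joint_actions(self.n_agents, self.n_actions,
                                    self.k, self.rng)

    def select(self, state):
        """Return the sampled joint action with the highest UCB score."""
        self.t += 1
        return max(
            self._candidates(),
            key=lambda a: ucb_score(self.q[(state, a)], self.n[(state, a)],
                                    self.t, self.c),
        )

    def update(self, state, joint_action, reward, next_state):
        """Q-learning step whose max runs over k sampled joint actions only."""
        best_next = max(self.q[(next_state, a)] for a in self._candidates())
        key = (state, joint_action)
        self.n[key] += 1
        self.q[key] += self.alpha * (reward + self.gamma * best_next - self.q[key])
```

With `n_agents` sensors and `n_actions` actions each, an exhaustive max costs `n_actions ** n_agents` lookups per update, while the sampled max costs `k`. The trade-off is that the sampled maximum can underestimate the true maximum when `k` is small.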
Authors ZHAO Dejing (赵德京); LI Wei (李蔚) (School of Automation, Qingdao University, Qingdao, Shandong 266071; Shandong Provincial Key Laboratory of Industrial Control, Qingdao, Shandong 266071; Qilu Business Department, Sinopec Chemical Sales Co., Ltd., Zibo, Shandong 255400)
Source Automation & Instrumentation (《自动化与仪器仪表》), 2023, No. 6, pp. 213-218 (6 pages)
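Funding National Natural Science Foundation of China (61903209); Qingdao Postdoctoral Applied Research Project "AGV Road Network Design and Path Planning Methods Based on Multi-agent Reinforcement Learning".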
Keywords sensor networks; reinforcement learning; multi-agent reinforcement learning; action sampling; upper confidence bound
About the author ZHAO Dejing (born 1997), male, from Qingdao, Shandong; master's student; main research interest: reinforcement learning.