Abstract
To lock onto targets in sensor networks quickly and effectively, this paper proposes ASUCBQ, a multi-agent reinforcement learning algorithm that combines action sampling with upper confidence bound (UCB) action selection. The method models the sensors as a multi-agent system and adopts the centralized training with decentralized execution (CTDE) framework. During centralized training, the Q-values and UCB values of joint actions are updated without enumerating every joint action. Instead, the algorithm samples a subset of joint actions and takes the maximum Q-value and UCB value over that subset. In the action selection and execution stage, each sensor selects its own action. To avoid getting stuck in a local optimum, the method also uses UCB-based action selection: the uncertainty of each action-value estimate drives the sensors to explore more actions. Because the exploration rate is adjusted dynamically, the method strikes a better balance between exploitation and exploration. Simulation results show that the method locks onto targets in the sensor network effectively and reduces the computation required during training.
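The abstract describes two mechanisms: taking the maximum over a sampled subset of joint actions, and selecting actions by their UCB values. Both can be illustrated with a small tabular Python sketch. It is a minimal sketch under stated assumptions, not the authors' ASUCBQ code. Everything in it is a hypothetical stand-in chosen for illustration: the class name `SampledUCBJointQ`, the hyperparameters `n_samples`, `alpha`, `gamma` and `c`, the UCB1-style bonus c·sqrt(ln t / N), the shared centralized Q table, and the per-sensor rule in which each sensor scores its own actions by the best sampled UCB value.

```python
import math
import random
from collections import defaultdict


class SampledUCBJointQ:
    """Hypothetical tabular sketch: joint-action Q-learning where maxima are
    taken over a random sample of joint actions and selection uses an assumed
    UCB1-style bonus. Not the authors' ASUCBQ implementation."""

    def __init__(self, n_agents, n_actions, n_samples=16,
                 alpha=0.1, gamma=0.9, c=1.0, seed=0):
        self.n_agents = n_agents
        self.n_actions = n_actions
        self.n_samples = n_samples
        self.alpha, self.gamma, self.c = alpha, gamma, c
        self.rng = random.Random(seed)
        self.q = defaultdict(float)   # (state, joint_action) -> Q estimate
        self.n = defaultdict(int)     # (state, joint_action) -> visit count
        self.t = defaultdict(int)     # state -> total visits of that state

    def sample_joint_actions(self):
        # Draw a random subset of joint actions instead of enumerating all
        # n_actions ** n_agents of them.
        return [tuple(self.rng.randrange(self.n_actions)
                      for _ in range(self.n_agents))
                for _ in range(self.n_samples)]

    def ucb(self, state, joint):
        visits = self.n[(state, joint)]
        if visits == 0:
            return math.inf  # untried joint actions get priority
        # Assumed bonus: it shrinks as a joint action is visited more often,
        # so the effective exploration rate decreases over time.
        bonus = self.c * math.sqrt(math.log(self.t[state]) / visits)
        return self.q[(state, joint)] + bonus

    def select_for_sensor(self, state, i):
        # Sensor i scores each of its own actions by the largest UCB value
        # among sampled joint actions in which it plays that action.
        def score(a_i):
            best = -math.inf
            for joint in self.sample_joint_actions():
                joint = joint[:i] + (a_i,) + joint[i + 1:]
                best = max(best, self.ucb(state, joint))
            return best
        # Random tie-breaking keeps all-untried (inf) ties from always picking action 0.
        return max(range(self.n_actions),
                   key=lambda a_i: (score(a_i), self.rng.random()))

    def select(self, state):
        # Decentralized execution: each sensor picks its own action component.
        return tuple(self.select_for_sensor(state, i)
                     for i in range(self.n_agents))

    def update(self, state, joint, reward, next_state):
        joint = tuple(joint)
        self.n[(state, joint)] += 1
        self.t[state] += 1
        # The max over a sampled subset stands in for the max over all joint actions.
        next_max = max(self.q[(next_state, a)]
                       for a in self.sample_joint_actions())
        target = reward + self.gamma * next_max
        self.q[(state, joint)] += self.alpha * (target - self.q[(state, joint)])


if __name__ == "__main__":
    # Made-up toy task: 3 sensors, 4 actions each, reward only for (1, 2, 3).
    agent = SampledUCBJointQ(n_agents=3, n_actions=4)
    for _ in range(200):
        joint = agent.select("s")
        reward = 1.0 if joint == (1, 2, 3) else 0.0
        agent.update("s", joint, reward, "s")
    print(agent.select("s"))
```

Each sampled maximum costs O(`n_samples`) evaluations instead of O(`n_actions ** n_agents`). The trade-off is that when good joint actions are rare, the sampled maximum can underestimate the true one.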
Authors
ZHAO Dejing (赵德京)
LI Wei (李蔚)
ZHAO Dejing, LI Wei (School of Automation, Qingdao University, Qingdao, Shandong 266071; Shandong Provincial Key Laboratory of Industrial Control, Qingdao, Shandong 266071; Qilu Business Department, Sinopec Chemical Sales Co., Ltd., Zibo, Shandong 255400)
Source
Automation & Instrumentation (《自动化与仪器仪表》)
2023, No. 6, pp. 213-218 (6 pages)
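Funding
National Natural Science Foundation of China (61903209)
Qingdao Postdoctoral Applied Research Project "AGV Road Network Design and Path Planning Based on Multi-Agent Reinforcement Learning" (《基于多智能体强化学习的AGV路网设计和路径规划方法》)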
Keywords
sensor networks
reinforcement learning
multi-agent reinforcement learning
action sampling
upper confidence bound
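About the author
ZHAO Dejing (1997-), male, born in Qingdao, Shandong, is a master's student whose research focuses on reinforcement learning.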