
基于自适应状态聚集Q学习的移动机器人动态规划方法 (Cited by: 3)

A Dynamic Planning Method for Mobile Robot Based on Adaptive State Aggregating Q-Learning
Abstract: To address the slow convergence and weak online-planning capability of existing path planning methods for mobile robots, this paper presents SQ(λ), a dynamic path planning method based on a state-aggregating SOM network and Q-learning with eligibility traces. First, an overall closed-loop planning model is designed, dividing the system into a front end (state aggregation) and a back end (path planning). Then, an output layer is added to the traditional SOM to build a three-layer SOM network that aggregates the mobile robot's states, and a training algorithm for this network is given. Finally, working on the aggregated states, an improved Q-learning algorithm with eligibility traces and an adaptively varying exploration factor is proposed to obtain the optimal policy; the number of neurons in the front-end SOM output layer is increased or decreased adaptively according to the convergence rate of the improved Q-learning algorithm, which improves the convergence of the overall method. Simulation experiments show that the proposed SQ(λ) performs mobile robot path planning effectively and, compared with other algorithms, converges faster and has a stronger ability to find optimal solutions.
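
The back end described in the abstract is a variant of Q(λ): tabular Q-learning over the aggregated (discrete) states, with eligibility traces to speed up credit assignment and an exploration factor that decays as learning converges. The sketch below is a minimal illustration of that idea only, using a standard Watkins-style Q(λ) update with replacing traces and an exponentially decaying epsilon; the SOM front end is assumed to have already mapped sensor readings to discrete state indices, and the environment interface (reset/step), parameter values, and function names are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def epsilon_greedy(Q, s, eps, rng):
    """Pick a random action with probability eps, otherwise a greedy one."""
    if rng.random() < eps:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))

def watkins_q_lambda(env, n_states, n_actions, episodes=500,
                     alpha=0.1, gamma=0.95, lam=0.9,
                     eps_start=0.5, eps_min=0.05, eps_decay=0.995, seed=0):
    """Tabular Watkins Q(lambda) with replacing eligibility traces and a
    decaying exploration factor, run on already-aggregated discrete states.
    env.reset() -> state and env.step(action) -> (next_state, reward, done)
    are assumed interfaces for this sketch."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    eps = eps_start
    for _ in range(episodes):
        E = np.zeros_like(Q)                 # eligibility traces, reset per episode
        s = env.reset()
        a = epsilon_greedy(Q, s, eps, rng)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = epsilon_greedy(Q, s2, eps, rng)
            greedy = Q[s2, a2] == np.max(Q[s2])      # was the next action greedy?
            target = r if done else r + gamma * np.max(Q[s2])
            delta = target - Q[s, a]                 # one-step TD error
            E[s, a] = 1.0                            # replacing trace for (s, a)
            Q += alpha * delta * E                   # credit all traced pairs at once
            E *= (gamma * lam) if greedy else 0.0    # cut traces after exploratory moves
            s, a = s2, a2
        eps = max(eps_min, eps * eps_decay)          # shrink the exploration factor
    return Q
```

In the paper's full SQ(λ), the number of SOM output-layer neurons, and hence the number of discrete states, is additionally grown or pruned depending on how fast this learner converges; that adaptive part is not reproduced in the sketch.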
Authors: 王辉, 宋昌统
Source: Computer Measurement & Control (《计算机测量与控制》, PKU Core Journal), 2014, No. 10, pp. 3419-3422 (4 pages)
Funding: Natural Science Research Program of Jiangsu Higher Education Institutions (03kjd520075)
Keywords: mobile robot; path planning; state aggregation; Q-learning
About the author: 王辉 (b. 1980), female, from Danyang, Jiangsu; lecturer and master's degree candidate; her research focuses on virtual reality and artificial intelligence.