
基于自适应状态聚集Q学习的移动机器人动态规划方法 (Cited by: 3)

A Dynamic Planning Method for Mobile Robot Based on Adaptive State Aggregating Q-Learning
Abstract: To address the slow convergence and weak online-planning capability of existing path planning methods for mobile robots, this paper presents SQ(λ), a dynamic path planning method based on a state-aggregating SOM network and Q-learning with eligibility traces. First, an overall closed-loop planning model is designed, dividing the system into a front end (state aggregation) and a back end (path planning). Then, an output layer is added to the traditional SOM to build a three-layer SOM network that aggregates the mobile robot's states, and a training algorithm for this network is given. Finally, working on the aggregated states, an improved Q-learning algorithm with eligibility traces and an adaptively varying exploration factor is proposed to obtain the optimal policy; the number of neurons in the front-end SOM output layer is increased or decreased adaptively according to the convergence rate of the improved Q-learning algorithm, which improves the convergence of the overall method. Simulation experiments show that the proposed SQ(λ) performs mobile robot path planning effectively and, compared with other algorithms, converges faster and has a stronger ability to find optimal solutions.
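
The back end described in the abstract is a variant of Q(λ): tabular Q-learning over the aggregated (discrete) states, with eligibility traces to speed up credit assignment and an exploration factor that decays as learning converges. The sketch below is a minimal illustration of that idea only, using a standard Watkins-style Q(λ) update with replacing traces and an exponentially decaying epsilon; the SOM front end is assumed to have already mapped sensor readings to discrete state indices, and the environment interface (reset/step), parameter values, and function names are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def epsilon_greedy(Q, s, eps, rng):
    """Pick a random action with probability eps, otherwise a greedy one."""
    if rng.random() < eps:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))

def watkins_q_lambda(env, n_states, n_actions, episodes=500,
                     alpha=0.1, gamma=0.95, lam=0.9,
                     eps_start=0.5, eps_min=0.05, eps_decay=0.995, seed=0):
    """Tabular Watkins Q(lambda) with replacing eligibility traces and a
    decaying exploration factor, run on already-aggregated discrete states.
    env.reset() -> state and env.step(action) -> (next_state, reward, done)
    are assumed interfaces for this sketch."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    eps = eps_start
    for _ in range(episodes):
        E = np.zeros_like(Q)                 # eligibility traces, reset per episode
        s = env.reset()
        a = epsilon_greedy(Q, s, eps, rng)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = epsilon_greedy(Q, s2, eps, rng)
            greedy = Q[s2, a2] == np.max(Q[s2])      # was the next action greedy?
            target = r if done else r + gamma * np.max(Q[s2])
            delta = target - Q[s, a]                 # one-step TD error
            E[s, a] = 1.0                            # replacing trace for (s, a)
            Q += alpha * delta * E                   # credit all traced pairs at once
            E *= (gamma * lam) if greedy else 0.0    # cut traces after exploratory moves
            s, a = s2, a2
        eps = max(eps_min, eps * eps_decay)          # shrink the exploration factor
    return Q
```

In the paper's full SQ(λ), the number of SOM output-layer neurons, and hence the number of discrete states, is additionally grown or pruned depending on how fast this learner converges; that adaptive part is not reproduced in the sketch.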
Authors: 王辉, 宋昌统
Source: Computer Measurement & Control (《计算机测量与控制》, PKU Core Journal), 2014, No. 10, pp. 3419-3422 (4 pages)
Funding: Natural Science Research Program of Jiangsu Higher Education Institutions (03kjd520075)
Keywords: mobile robot; path planning; state aggregation; Q-learning
About the author: 王辉 (b. 1980), female, from Danyang, Jiangsu; lecturer and master's degree candidate; her research focuses on virtual reality and artificial intelligence.