
Research on an Improved Path Planning Algorithm for Mobile Robots Based on Reinforcement Learning
Abstract: The Q-learning algorithm suffers from low learning efficiency, slow convergence, and a tendency to make erroneous decisions near dangerous obstacle regions. To address these problems, a strategy that dynamically adjusts the exploration factor is proposed, which resolves the conflict between exploration and exploitation in Q-learning, avoids getting trapped in local optima, and improves the algorithm's efficiency. A Q-Sarsa algorithm is further proposed, combining the advantages of the two algorithms to overcome the overestimation caused by Q-learning's overly aggressive exploration strategy, so that the algorithm converges faster and finds a more reliable path. Maps are built with the grid method, and comparative experiments show that, compared with the original algorithm, the improved algorithm achieves substantial gains in convergence speed and final step count, completes path planning well on two-dimensional maps, and verifies the effectiveness and practicality of the improved approach.
Authors: ZHANG Yanzhu; ZHANG Cheng; ZHUANG Bo (School of Automation and Electrical Engineering, Shenyang Ligong University, Shenyang 110159, Liaoning, China)
Source: Communication & Information Technology (《通信与信息技术》), 2024, No. 6, pp. 39-43 (5 pages)
Keywords: Q-learning; path planning; adaptive; greedy algorithm; Q-Sarsa
Author biography: ZHANG Yanzhu (b. 1971), female, associate professor, Ph.D.; her research interests include fractional-order control systems, image processing and recognition, and intelligent control theory and algorithms.
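
The abstract describes two ideas: an exploration factor that is adjusted dynamically during training, and a Q-Sarsa update that combines the Q-learning and Sarsa rules. The paper's exact epsilon schedule, combination rule, map size, obstacle layout, and reward values are not given in this record, so the sketch below is only a minimal illustration under assumed choices (exponential epsilon decay, a fixed blend weight w, a hypothetical 10×10 grid with arbitrary obstacles and rewards); it is not the authors' implementation.

```python
# Minimal sketch of a dynamically decayed exploration factor and a blended
# "Q-Sarsa"-style update on a grid map. The epsilon schedule, blend weight w,
# grid size, obstacles, and rewards are all illustrative assumptions.
import numpy as np

N = 10                                          # hypothetical 10 x 10 grid map
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right
GOAL = (N - 1, N - 1)
OBSTACLES = {(3, 3), (3, 4), (6, 2), (7, 7)}    # hypothetical obstacle cells

def step(state, a):
    """Apply action a; return (next state, reward) for the grid environment."""
    r, c = state[0] + ACTIONS[a][0], state[1] + ACTIONS[a][1]
    if not (0 <= r < N and 0 <= c < N) or (r, c) in OBSTACLES:
        return state, -10.0                     # wall/obstacle collision penalty
    if (r, c) == GOAL:
        return (r, c), 100.0                    # goal reward
    return (r, c), -1.0                         # step cost favors short paths

def epsilon_greedy(Q, state, eps):
    """Explore with probability eps, otherwise act greedily on Q."""
    if np.random.rand() < eps:
        return np.random.randint(len(ACTIONS))
    return int(np.argmax(Q[state]))

def train(episodes=500, alpha=0.1, gamma=0.9,
          eps0=0.9, eps_min=0.05, decay=0.99, w=0.5):
    """w blends the off-policy (max) and on-policy (Sarsa) targets."""
    Q = np.zeros((N, N, len(ACTIONS)))
    eps = eps0
    for _ in range(episodes):
        s = (0, 0)
        a = epsilon_greedy(Q, s, eps)
        for _ in range(200):                    # cap episode length
            s2, r = step(s, a)
            a2 = epsilon_greedy(Q, s2, eps)
            # blended target: w -> Q-learning, (1 - w) -> Sarsa
            target = r + gamma * (w * np.max(Q[s2]) + (1 - w) * Q[s2][a2])
            Q[s][a] += alpha * (target - Q[s][a])
            s, a = s2, a2
            if s == GOAL:
                break
        eps = max(eps_min, eps * decay)         # dynamically shrink exploration
    return Q

if __name__ == "__main__":
    Q = train()
    print("Greedy action from the start cell:", int(np.argmax(Q[(0, 0)])))
```

With w = 1 the update reduces to plain Q-learning and with w = 0 to Sarsa; intermediate values trade the optimism of the max target against the on-policy caution that the abstract credits with producing more reliable paths near obstacles.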
