
An Improved Q-Learning Algorithm and Its Application in Path Planning
(Cited by: 20)
Abstract: The traditional Q-Learning algorithm suffers from excessive random search and slow convergence. This paper proposes an improved algorithm, ε-Q-Learning, built on traditional Q-Learning, and applies it to path planning. The key idea is a dynamic search factor that adjusts the greedy factor ε according to feedback from the environment: if an exploration episode from start to goal fails, ε is increased so that the next episode explores more randomly, avoiding the trap of local optima; conversely, after a successful episode ε is decreased to make the search more goal-directed. Algorithm performance is evaluated by loss function, running efficiency, number of steps, and total return. Experiments show that, compared with the existing Q-Learning algorithm, ε-Q-Learning not only finds better paths but also significantly reduces the cost of iterative search.
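The dynamic-ε scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the grid-world environment, reward values, adjustment step `eps_step`, and the bounds `eps_min`/`eps_max` are all assumptions chosen for the example.

```python
import random

def epsilon_q_learning(grid, start, goal, episodes=500, alpha=0.1,
                       gamma=0.9, eps=0.5, eps_step=0.05,
                       eps_min=0.05, eps_max=0.9, max_steps=200, seed=0):
    """Tabular Q-learning whose greedy factor eps grows after a failed
    episode (more exploration) and shrinks after a success (more
    exploitation), mirroring the paper's dynamic search factor."""
    rng = random.Random(seed)
    rows, cols = len(grid), len(grid[0])
    acts = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    Q = {}  # (state, action index) -> estimated value

    def valid(s):
        r, c = s
        return 0 <= r < rows and 0 <= c < cols and grid[r][c] == 0

    for _ in range(episodes):
        s, reached = start, False
        for _ in range(max_steps):
            if rng.random() < eps:       # explore with probability eps
                a = rng.randrange(4)
            else:                        # otherwise act greedily
                a = max(range(4), key=lambda i: Q.get((s, i), 0.0))
            nxt = (s[0] + acts[a][0], s[1] + acts[a][1])
            if not valid(nxt):
                nxt, r = s, -1.0         # penalize hitting a wall or edge
            elif nxt == goal:
                r, reached = 10.0, True
            else:
                r = -0.1                 # step cost rewards shorter paths
            best = max(Q.get((nxt, i), 0.0) for i in range(4))
            q = Q.get((s, a), 0.0)
            Q[(s, a)] = q + alpha * (r + gamma * best - q)
            s = nxt
            if reached:
                break
        # Dynamic search factor: failure -> explore more; success -> exploit more.
        if reached:
            eps = max(eps_min, eps - eps_step)
        else:
            eps = min(eps_max, eps + eps_step)
    return Q, eps
```

On an obstacle-free 4×4 grid, repeated successful episodes drive ε down toward `eps_min`, so later episodes follow the learned path almost deterministically; a failed episode pushes ε back up, which is the mechanism the paper uses to escape local optima.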
Authors: MAO Guojun; GU Shimin (Institute of Machine Learning and Intelligent Science, Fujian University of Technology, Fuzhou 350118, China)
Source: Journal of Taiyuan University of Technology (indexed in CAS and the Peking University Core list), 2021, No. 1, pp. 91-97.
Funding: National Natural Science Foundation of China (Grant No. 61773415).
Keywords: path planning; artificial intelligence; reinforcement learning; Q-Learning
Corresponding author: MAO Guojun (b. 1966), PhD, professor; research interests: data mining, machine learning, and big data. Email: maximmao@hotmail.com
