Abstract
The traditional Q-Learning algorithm suffers from excessive random search and slow convergence. This paper therefore proposes an improved algorithm, ε-Q-Learning, built on traditional Q-Learning, and applies it to path planning. The key of the method is a dynamic search factor, which adjusts the greedy factor ε according to feedback from the environment. If an exploration episode from the start point to the goal fails, ε is increased so that the next episode searches more randomly, avoiding local-optimum traps; conversely, after a successful episode, ε is decreased to make the search more purposeful. The algorithms are evaluated by loss function, running efficiency, number of steps, and total return. Experiments show that, compared with the existing Q-Learning algorithm, ε-Q-Learning not only finds better paths but also significantly reduces the cost of iterative search.
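To make the dynamic search factor concrete, here is a minimal Python sketch of ε-greedy Q-learning with a per-episode adjustment of ε. The environment interface (env.reset()/env.step() returning a reached-goal flag), the additive adjustment size EPS_STEP, the bounds EPS_MIN/EPS_MAX, the initial ε, and the learning-rate/discount values are all illustrative assumptions, not the paper's exact settings.

```python
import random
from collections import defaultdict

# Hyperparameters below are assumed for illustration only.
ALPHA, GAMMA = 0.1, 0.9          # learning rate and discount factor (assumed)
EPS_MIN, EPS_MAX = 0.01, 1.0     # bounds on the greedy factor (assumed)
EPS_STEP = 0.05                  # per-episode adjustment of epsilon (assumed)

def epsilon_q_learning(env, actions, episodes=500):
    """Q-learning with a dynamically adjusted greedy factor epsilon."""
    Q = defaultdict(float)       # Q[(state, action)] -> estimated value
    eps = 0.5                    # initial greedy factor (assumed)
    for _ in range(episodes):
        state, done, reached_goal = env.reset(), False, False
        while not done:
            if random.random() < eps:                    # explore randomly
                action = random.choice(actions)
            else:                                        # exploit best action
                action = max(actions, key=lambda a: Q[(state, a)])
            # env.step is assumed to report whether the goal was reached
            next_state, reward, done, reached_goal = env.step(action)
            # standard one-step Q-learning update
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                           - Q[(state, action)])
            state = next_state
        # dynamic search factor: failed episode -> more random search;
        # successful episode -> more purposeful (greedy) search
        if reached_goal:
            eps = max(EPS_MIN, eps - EPS_STEP)
        else:
            eps = min(EPS_MAX, eps + EPS_STEP)
    return Q
```

In this sketch the adjustment is additive and symmetric; the paper only states the direction of the adjustment (increase ε on failure, decrease it on success), so any particular schedule shown here is a design choice.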
Authors
MAO Guojun; GU Shimin (Institute of Machine Learning and Intelligent Science, Fujian University of Technology, Fuzhou 350118, China)
Source
Journal of Taiyuan University of Technology (《太原理工大学学报》)
Indexed in: CAS; Peking University Core Journals (北大核心)
2021, No. 1, pp. 91-97 (7 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 61773415).
About the Authors
Corresponding author: MAO Guojun (b. 1966), PhD, professor; his research interests include data mining, machine learning, and big data. E-mail: maximmao@hotmail.com.