Abstract
Deep reinforcement learning methods integrating Monte Carlo tree search (MCTS) and deep neural networks (DNN) have become the benchmark approach for solving complex game-playing problems, but challenges such as sparse rewards and high training costs remain. To address these issues, this paper proposes an improved deep reinforcement learning method for Gomoku that incorporates threat space search (TSS). First, a unified threat space search algorithm is designed and embedded in MCTS to mitigate the sparse reward problem. Second, a dual-layer knowledge base built on domain knowledge is proposed to speed up the search. Next, threat-based offensive and defensive action sets are incorporated as neural network input features to improve the model's perception of critical local game situations. Finally, a move filtering mechanism based on threat space features effectively reduces the action space. Experimental results demonstrate that these improvements markedly enhance both the learning speed and the competitive strength of the self-play program.
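The move filtering idea mentioned in the abstract can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's actual algorithm: it merely restricts candidate moves to empty cells within a small radius of existing stones, which is the simplest form of pruning a Gomoku action space before deeper threat analysis. The function name `filter_moves` and the `radius` parameter are assumptions for illustration.

```python
# Hypothetical sketch (not the paper's code): neighborhood-based move
# filtering for Gomoku. The board is a square list of lists where 0 marks
# an empty cell and nonzero marks a stone.

def filter_moves(board, radius=1):
    """Return empty cells within `radius` of any stone, sorted for
    determinism; on an empty board, fall back to the center point."""
    n = len(board)
    if all(cell == 0 for row in board for cell in row):
        return [(n // 2, n // 2)]
    candidates = set()
    for r in range(n):
        for c in range(n):
            if board[r][c] == 0:
                continue  # only stones generate nearby candidates
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n and 0 <= cc < n and board[rr][cc] == 0:
                        candidates.add((rr, cc))
    return sorted(candidates)
```

On a 15x15 board with a handful of stones, this kind of filter shrinks the candidate set from over 200 empty cells to a few dozen, which is the general effect the abstract credits to its threat-feature-based filtering.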
Authors
NIU Xuefen; WANG Ziyou; CHEN Ling; WU Yuhua; LIU Yuze; XU Changming (School of Computer and Communication Engineering, Northeastern University, Qinhuangdao 066004, China; Graduate School, Northeastern University, Qinhuangdao 066004, China)
Source
《重庆理工大学学报(自然科学)》 (Journal of Chongqing University of Technology: Natural Science)
Indexed in the Peking University Core Journals list
2025, Issue 8, pp. 118-125 (8 pages)
Fund
General Program of the Hebei Provincial Natural Science Foundation (F2022501015).
About the Authors
NIU Xuefen, female, lecturer, whose research covers FPGA development and machine learning, E-mail: niuxuefen@neuq.edu.cn. Corresponding author: XU Changming, male, Ph.D., lecturer, whose research covers deep-learning-based computer game playing and time-series anomaly detection, E-mail: changmingxu@neuq.edu.cn.