Abstract
In a partially observable, nonlinear, dynamic earthquake environment, hexapod robots that rely on traditional algorithms for dynamic obstacle avoidance are prone to algorithmic instability. This paper adopts a decision-making method based on the double deep Q-network (DDQN). Sensor data are fed into a convolutional neural network (CNN) and combined with a reinforcement learning policy. The resulting commands are sent to the hexapod robot, which executes the selected decision actions to avoid dynamic obstacles. Environmental feedback and decision control form a direct closed loop: the network weights are updated by maximizing the cumulative reward that the robot collects while interacting with the obstacle avoidance environment, which yields an optimal decision policy. Experiments on a hexapod robot platform show that the method reduces two risks common in conventional deep reinforcement learning algorithms, namely overestimation of state-action values and a loss function that is difficult to converge, and that it improves the efficiency and stability of the robot's dynamic obstacle avoidance.
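The overestimation that the abstract mentions comes from the max operator in the standard DQN target, where the same network both selects and evaluates the next action. The sketch below contrasts that target with the textbook Double DQN target, in which the online network selects the action a* = argmax_a Q_online(s', a) and the target network evaluates it: y = r + γ Q_target(s', a*). It also shows the discounted cumulative reward that the agent maximizes. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper's CNN, sensor encoding, action set and reward design are not reproduced, the Q-values are plain lists standing in for network outputs, and all function names are hypothetical.

```python
from typing import Sequence

# Assumption: textbook Double DQN target; the paper's CNN, actions and rewards are not modeled.


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Cumulative discounted reward G = r_0 + gamma*r_1 + gamma^2*r_2 + ..."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def dqn_target(reward: float, done: bool,
               q_next_target: Sequence[float], gamma: float) -> float:
    """Standard DQN: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(q_next_target)


def ddqn_target(reward: float, done: bool,
                q_next_online: Sequence[float],
                q_next_target: Sequence[float], gamma: float) -> float:
    """Double DQN: the online network selects a*, the target network evaluates it."""
    if done:
        return reward
    a_star = argmax(q_next_online)
    return reward + gamma * q_next_target[a_star]


if __name__ == "__main__":
    # Hypothetical Q-value estimates for three actions in the next state s'.
    q_online = [1.0, 2.5, 0.5]
    q_target = [3.0, 1.5, 0.25]
    print(dqn_target(1.0, False, q_target, 0.5))             # 2.5
    print(ddqn_target(1.0, False, q_online, q_target, 0.5))  # 1.75
    print(discounted_return([1.0, 0.0, 2.0], 0.5))           # 1.5
```

In this toy case the target network's estimate for action 0 is inflated, and plain DQN propagates that maximum into the target, while Double DQN only uses it if the online network independently prefers the same action. Decoupling selection from evaluation in this way is what damps the upward bias.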
Authors
DONG Xingyu
TANG Kaiqiang
FU Huiqiao
LIU Canghai
JIANG Gang
Affiliations
Faculty of Manufacturing Science and Engineering, Southwest University of Science and Technology, Mianyang 621000, China
Department of Control and Systems Engineering, College of Engineering Management, Nanjing University, Nanjing 210093, China
Manufacturing Process Testing Technology Key Laboratory of Ministry of Education, Mianyang 621000, China
College of Nuclear Technology and Automation Engineering, Chengdu University of Technology, Chengdu 610059, China
Source
Transducer and Microsystem Technologies (《传感器与微系统》)
2022, No. 1, pp. 19-23 (5 pages)
Indexed in: CSCD; Peking University Core Journals (北大核心)
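Funding
Sichuan Province Major Science and Technology Special Project (2020ZDZX0019)
Key R&D Program of the Sichuan Provincial Department of Science and Technology (19ZDYF1083)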
Keywords
double deep Q-network (DDQN)
hexapod robot
dynamic obstacle avoidance
sensor input
About the Authors
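DONG Xingyu (1995-), male, master's student; research interests: robotics and machine learning. Corresponding author: TANG Kaiqiang (1992-), male, doctoral student; research interests: machine learning and robotics. LIU Canghai (1966-), male, Ph.D., professor; main research areas: mechatronics and robotics. JIANG Gang (1978-), male, Ph.D., professor and doctoral supervisor; main research areas: robotics and mechatronics.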