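Based on the Inland River Collision Avoidance Rules (《内河避碰规则》) of the ship domain, a driving behavior decision-making model built on the TD3 (twin-delayed deep deterministic policy gradient) algorithm is established, using interactive learning and the accumulation of decision-making experience. The model defines three types of reward functions, covering safety, economy, and coordination, and converges quickly through continual iteration of the reward values. Based on this model, simulation experiments are carried out for three encounter scenarios common in inland waterways: head-on, crossing, and overtaking. The results show that, compared with DDPG, the TD3 algorithm not only shortens model training time but also improves decision-making performance. A comparative analysis of driving behavior decisions under different ship lengths and speeds shows that the model can make autonomous driving decisions safely and quickly and can also optimize the collision avoidance path, yielding a better autonomous driving decision scheme for unmanned ships.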
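As a rough illustration of the two ingredients named in this abstract, the sketch below shows how three reward terms might be merged into one scalar and how a standard TD3 learning target is formed (clipped double Q-learning plus target policy smoothing). The abstract does not say how the safety, economy, and coordination rewards are combined, so the weighted sum, the weights, the scalar action (for example a rudder command), and every function and parameter name here are assumptions for illustration, not the authors' implementation.

```python
import random


def combined_reward(r_safety, r_economy, r_coordination, weights=(1.0, 1.0, 1.0)):
    """Hypothetical weighted sum of the three reward terms (weights are assumed)."""
    w_s, w_e, w_c = weights
    return w_s * r_safety + w_e * r_economy + w_c * r_coordination


def clip(value, low, high):
    """Clamp value to the closed interval [low, high]."""
    return max(low, min(high, value))


def td3_target(reward, done, next_state, actor_target, q1_target, q2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0,
               rng=random):
    """Standard TD3 target for a single transition with a scalar action.

    actor_target(state) -> action and q*_target(state, action) -> value are
    placeholder callables standing in for the target networks.
    """
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(actor_target(next_state) + noise, -max_action, max_action)
    # Clipped double Q-learning: use the smaller of the two target critics
    # to curb the overestimation that a single critic (as in DDPG) can show.
    q_min = min(q1_target(next_state, next_action),
                q2_target(next_state, next_action))
    # No bootstrapping past a terminal state (done is 1.0 when terminal).
    return reward + gamma * (1.0 - done) * q_min
```

In a training loop, both critics would be regressed toward this target, while the actor and the target networks are updated less often than the critics; that delay is the "delayed" part of TD3.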
Driving behavior modeling is important in the safety analysis of road traffic systems. The characteristics of the action of recovering from an erroneous driving condition, which underlies road traffic accident or incident scenarios, are quantitatively analyzed, and a model of this recovery action is set up based on the identification of the erroneous driving condition and the measurement of the correction from it. The probability of the recovery action is then measured using a revised decision tree. The measurement process combines test data with subjective judgments of driving behavior. The approach provides a theoretical basis for further analysis of driving behavior in road traffic systems.