Funding: Supported by the Aeronautical Science Foundation (2017ZC53033).
Abstract: Unmanned aerial vehicle (UAV) swarm technology has been a research hotspot in recent years. With the continuous improvement of UAV autonomy, swarm technology will become one of the main trends in future UAV development. This paper studies the behavior decision-making process of the UAV swarm rendezvous task based on the double deep Q-network (DDQN) algorithm. We design a guided reward function that effectively solves the convergence problem caused by sparse rewards in deep reinforcement learning (DRL) for long-horizon tasks. We also propose the concept of a temporary storage area, which optimizes the experience replay unit of the traditional DDQN algorithm, improves convergence speed, and accelerates training. Unlike traditional task environments, this paper establishes a continuous state-space task environment model to improve the verification process of the UAV task environment. Based on the DDQN algorithm, the collaborative tasks of the UAV swarm in different task scenarios are trained. The experimental results validate that the DDQN algorithm is efficient in training the UAV swarm to complete the given collaborative tasks while meeting the swarm's requirements for centralization and autonomy, and in improving the intelligence of collaborative task execution. The simulation results show that, after training, the proposed UAV swarm carries out the rendezvous task well, with a mission success rate of 90%.
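The abstract does not spell out the DDQN update or the "temporary storage area". Below is a minimal sketch in PyTorch of how such a scheme might look: transitions accumulate in a temporary area and are committed to the main replay buffer at episode end, and the double-DQN target uses the online network to select the next action and the target network to evaluate it. All class names, network sizes, and the commit-per-episode semantics are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import deque

import torch
import torch.nn as nn


class QNet(nn.Module):
    """Small fully connected Q-network (size is an assumption)."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions))

    def forward(self, s):
        return self.net(s)


class TempStorageReplay:
    """Transitions first land in a temporary area; at episode end the
    whole trajectory is committed to the main buffer in one batch."""
    def __init__(self, capacity=100_000):
        self.main = deque(maxlen=capacity)
        self.temp = []

    def push(self, transition):
        self.temp.append(transition)

    def commit_episode(self):
        self.main.extend(self.temp)
        self.temp.clear()

    def sample(self, batch_size):
        return random.sample(self.main, batch_size)


def ddqn_loss(online, target, batch, gamma=0.99):
    """Double-DQN loss: online net picks the next action, target net scores it."""
    s, a, r, s2, done = batch              # tensors; a is (B, 1) long
    q = online(s).gather(1, a)             # Q(s, a) from the online net
    with torch.no_grad():
        a2 = online(s2).argmax(dim=1, keepdim=True)   # action selection
        q2 = target(s2).gather(1, a2)                 # action evaluation
        y = r + gamma * (1.0 - done) * q2             # double-DQN target
    return nn.functional.mse_loss(q, y)
```

Committing whole episodes at once is one plausible reading of the temporary storage area: it keeps each rendezvous trajectory contiguous before it enters the replay pool.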
Funding: Supported by the National Natural Science Foundation of China (62003021, 91212304).
Abstract: The guidance strategy is a critical factor in determining the striking effect of a missile operation. A novel guidance law is presented by exploiting deep reinforcement learning (DRL) with a hierarchical deep deterministic policy gradient (DDPG) algorithm. The reward functions are constructed to minimize the line-of-sight (LOS) angle rate and avoid the threat posed by opposing obstacles. To attenuate chattering of the acceleration, a hierarchical reinforcement learning structure and an improved reward function with an action penalty are put forward. The simulation results validate that the missile under the proposed method hits the target successfully and keeps away from the threatened areas effectively.
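The three reward terms named in the abstract (LOS angle-rate minimization, threat avoidance, and an action penalty against chattering) suggest a shaped reward of roughly the following form. The weights, the circular threat model, and the function signature are assumptions for illustration only, not the paper's exact reward.

```python
import numpy as np


def guidance_reward(los_rate, missile_pos, threats, accel, prev_accel,
                    w_los=1.0, w_threat=5.0, w_action=0.1):
    """Hypothetical shaped reward. threats: list of (center_xy, radius)."""
    r = -w_los * abs(los_rate)                    # drive the LOS rate to zero
    for center, radius in threats:
        d = np.linalg.norm(missile_pos - center)
        if d < radius:                            # inside a threatened area
            r -= w_threat * (radius - d) / radius
    r -= w_action * abs(accel - prev_accel)       # penalize acceleration jumps
    return r
```

The action penalty on consecutive acceleration commands is the piece that targets chattering: a policy that oscillates pays a cost every step, so smoother commands dominate.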
Funding: Supported by the National Natural Science Foundation of China (62003267), the Natural Science Foundation of Shaanxi Province (2020JQ-220), and the Open Project of Science and Technology on Electronic Information Control Laboratory (JS20201100339).
Abstract: This paper presents a deep reinforcement learning (DRL)-based motion control method to provide unmanned aerial vehicles (UAVs) with additional flexibility while flying autonomously across dynamic unknown environments. The method is applicable in both military and civilian fields, such as penetration and rescue. The autonomous motion control problem is addressed through motion planning, action interpretation, trajectory tracking, and vehicle movement within the DRL framework. Novel DRL algorithms are presented by combining two difference-amplifying approaches with traditional DRL methods and are used to solve the motion planning problem. An improved Lyapunov guidance vector field (LGVF) method handles the trajectory-tracking problem and provides guidance control commands for the UAV. In contrast to conventional motion-control approaches, the proposed methods directly map sensor-based detections and measurements into control signals for the inner loop of the UAV, i.e., end-to-end control. The training experiments show that the novel DRL algorithms provide more than a 20% performance improvement over state-of-the-art DRL algorithms. The testing experiments demonstrate that the controller based on the novel DRL and LGVF, trained only once in a static environment, enables the UAV to fly autonomously in various dynamic unknown environments, giving the controller strong flexibility.
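The abstract does not reproduce the improved LGVF; for orientation, here is a sketch of the standard Lyapunov guidance vector field that the improved variant builds on. It yields a desired velocity that keeps the commanded speed constant and converges onto a loiter circle of radius rd around a target; the function name and 2D setup are assumptions.

```python
import numpy as np


def lgvf_desired_velocity(rel_pos, speed, rd):
    """Standard LGVF: rel_pos is the UAV position relative to the target
    (x, y); returns the desired inertial velocity vector, whose magnitude
    equals `speed` and whose flow converges to the circle r == rd."""
    x, y = rel_pos
    r = np.hypot(x, y)
    c = speed / (r * (r**2 + rd**2))
    vx = -c * (x * (r**2 - rd**2) + y * (2 * r * rd))
    vy = -c * (y * (r**2 - rd**2) - x * (2 * r * rd))
    return np.array([vx, vy])
```

A quick check of the algebra confirms |(vx, vy)| = speed for any r > 0, so the field only reshapes direction; on r = rd the radial component vanishes and the UAV circulates the target.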
Abstract: Obstacle-avoidance grasping by a robotic manipulator in stacked, occluded environments is an important and challenging task. For this task, this paper proposes Ec-DSAC (encoder and crop for discrete SAC), a manipulator obstacle-avoidance grasping method based on an image encoder and deep reinforcement learning (DRL). First, an image encoder combining YOLO (you only look once) v5 with a contrastive-learning network is designed; it encodes both key features and global features, reducing pixel information to vector information. Second, the image encoder is combined with the discrete soft actor-critic (SAC) algorithm: a discrete action space and a dense reward function are designed to constrain and guide the learning direction of the policy output, while random image cropping is used to increase the sample efficiency of reinforcement learning. Finally, a secondary behavior-cloning method for DRL pre-training is proposed, which strengthens the learning capability of the reinforcement learning network and raises the success rate of the control policy. In simulation experiments, the obstacle-avoidance grasping success rate of Ec-DSAC stays above 80.0%, verifying better obstacle-avoidance grasping performance than existing methods. In real-world experiments the success rate is 73.3%, verifying its effectiveness in real stacked, occluded environments.
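Two of the ingredients named in the abstract, random image cropping and the discrete SAC policy objective, can be sketched compactly; the crop size, tensor layout, and function names below are assumptions, and the YOLOv5/contrastive encoder, dense reward, and behavior-cloning stages are not reproduced.

```python
import torch


def random_crop(imgs, out_size=84):
    """imgs: (B, C, H, W) image batch; returns random square crops,
    a common augmentation for sample-efficient visual RL."""
    b, c, h, w = imgs.shape
    out = torch.empty(b, c, out_size, out_size, dtype=imgs.dtype)
    for i in range(b):
        top = torch.randint(0, h - out_size + 1, (1,)).item()
        left = torch.randint(0, w - out_size + 1, (1,)).item()
        out[i] = imgs[i, :, top:top + out_size, left:left + out_size]
    return out


def discrete_sac_actor_loss(logits, q_values, alpha):
    """Discrete-SAC policy objective: with a categorical policy the
    expectation over actions is computed exactly rather than sampled.
    logits: (B, A) policy head output; q_values: (B, A) critic output."""
    probs = logits.softmax(dim=-1)
    logp = logits.log_softmax(dim=-1)
    return (probs * (alpha * logp - q_values)).sum(dim=-1).mean()
```

Computing the expectation in closed form is what the discrete variant of SAC buys over the continuous one: no reparameterization trick is needed, and the entropy term alpha * logp falls out of the same softmax.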
Abstract: To address the growing quality-of-experience demands of users in vehicular edge computing (VEC), the difficulty of acquiring link states caused by highly mobile vehicles, and the time-varying resources that heterogeneous edge nodes provide to vehicles, a joint task offloading and resource optimization (JTO-RO) VEC scheme is formulated. First, without loss of generality, a vehicle-to-infrastructure (V2I) transmission model is proposed that jointly accounts for intra-edge and inter-edge interference; by introducing non-orthogonal multiple access (NOMA), the edge nodes no longer need to rely on link-state information and the channel capacity is also improved. Second, to improve system performance and efficiency, a multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm is designed to produce task-offloading strategies, which are dynamically adjusted through interactive learning with the environment. Third, the synergy of the two strategies is considered jointly, and an optimization scheme is formulated with maximizing the task service ratio as the objective, so as to meet the growing quality-of-experience demands. Finally, simulation experiments are conducted on a real vehicle-trajectory dataset. The results show that, compared with three representative schemes (using the random offloading (RO), D4PG (Distributed Distributional Deep Deterministic Policy Gradient), and MADDPG (Multi-Agent Deep Deterministic Policy Gradient) algorithms for task offloading, respectively) across three scenario types (ordinary, task-intensive, and delay-sensitive), the proposed scheme improves the average service ratio by more than 20%, 10%, and 29%, respectively, verifying its advantage and effectiveness.
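The abstract credits NOMA with raising channel capacity without requiring link-state feedback at the edge nodes. As background, here is a hedged sketch of per-vehicle uplink rates on one shared V2I subchannel under power-domain NOMA with successive interference cancellation (SIC); the bandwidth, noise power, and decoding-order convention are illustrative assumptions, not the paper's model.

```python
import numpy as np


def noma_uplink_rates(gains, powers, bandwidth=1e6, noise_power=1e-13):
    """Per-vehicle achievable rate (bit/s) under power-domain NOMA with
    SIC: the edge node decodes vehicles in descending channel-gain order,
    so each decoded signal suffers interference only from the weaker,
    not-yet-cancelled vehicles. gains/powers: arrays over the vehicles."""
    gains = np.asarray(gains, dtype=float)
    powers = np.asarray(powers, dtype=float)
    order = np.argsort(gains)[::-1]          # strongest decoded first
    rates = np.zeros(len(gains))
    for k, i in enumerate(order):
        interference = sum(powers[j] * gains[j] for j in order[k + 1:])
        sinr = powers[i] * gains[i] / (interference + noise_power)
        rates[i] = bandwidth * np.log2(1 + sinr)
    return rates
```

Because every vehicle transmits on the same subchannel and the SIC order depends only on received power, the edge node can separate users without per-link channel-state reports, which is consistent with the abstract's claim.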