Journal Articles
14 articles found
1. UAV maneuvering decision-making algorithm based on deep reinforcement learning under the guidance of expert experience
Authors: ZHAN Guang, ZHANG Kun, LI Ke, PIAO Haiyin. Journal of Systems Engineering and Electronics, SCIE CSCD, 2024, Issue 3, pp. 644-665 (22 pages)
Autonomous unmanned aerial vehicle (UAV) manipulation is necessary for the defense department to execute tactical missions given by commanders in the future unmanned battlefield. A large amount of research has been devoted to improving the autonomous decision-making ability of UAVs in an interactive environment, where finding the optimal maneuvering decision-making policy has become one of the key issues for enabling UAV intelligence. In this paper, we propose a maneuvering decision-making algorithm for autonomous air-delivery based on deep reinforcement learning under the guidance of expert experience. Specifically, we refine the guidance-towards-area and guidance-towards-specific-point tasks of the air-delivery process based on traditional air-to-surface fire control methods. Moreover, we construct the UAV maneuvering decision-making model based on Markov decision processes (MDPs) and present a reward shaping method for the guidance-towards-area and guidance-towards-specific-point tasks using a potential-based function and expert-guided advice. The proposed algorithm accelerates the convergence of the maneuvering decision-making policy and increases the stability of the policy output during the later stage of training. The effectiveness of the proposed maneuvering decision-making policy is illustrated by the training curves and by extensive experimental results obtained with the trained policy.
Keywords: unmanned aerial vehicle (UAV), maneuvering decision-making, autonomous air-delivery, deep reinforcement learning, reward shaping, expert experience
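The potential-based reward shaping named in this abstract can be illustrated with a minimal sketch. The potential function and the expert-advice term below are hypothetical placeholders rather than the paper's actual definitions; only the additive form F = γΦ(s') − Φ(s), the standard potential-based shaping term, is taken as given.

```python
import numpy as np

GAMMA = 0.99

def phi(state, target):
    """Hypothetical potential: negative distance to the delivery point."""
    return -np.linalg.norm(np.asarray(state) - np.asarray(target))

def expert_advice(state, action, target):
    """Hypothetical expert term: small bonus when the action points toward the target."""
    to_target = np.asarray(target) - np.asarray(state)
    return 0.1 * float(np.dot(action, to_target) > 0.0)

def shaped_reward(r_env, state, action, next_state, target):
    """Environment reward plus potential-based shaping and expert-guided advice."""
    f_potential = GAMMA * phi(next_state, target) - phi(state, target)
    return r_env + f_potential + expert_advice(state, action, target)

# Usage example with toy 2-D positions.
r = shaped_reward(r_env=0.0, state=[0.0, 0.0], action=[1.0, 0.0],
                  next_state=[1.0, 0.0], target=[10.0, 0.0])
print(r)
```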
2. Deep reinforcement learning guidance with impact time control
Authors: LI Guofei, LI Shituo, LI Bohao, WU Yunjie. Journal of Systems Engineering and Electronics, CSCD, 2024, Issue 6, pp. 1594-1603 (10 pages)
In consideration of the field-of-view (FOV) angle constraint, this study focuses on the guidance problem with impact time control. A deep reinforcement learning guidance method is given for the missile to obtain the desired impact time and meet the demand of the FOV angle constraint. On the basis of the proportional navigation guidance framework, an auxiliary control term is supplemented by the distributed deep deterministic policy gradient algorithm, in which the reward functions are developed to decrease the time-to-go error and improve the terminal guidance accuracy. The numerical simulation demonstrates that the missile governed by the presented deep reinforcement learning guidance law can hit the target successfully at the appointed arrival time.
Keywords: impact time, deep reinforcement learning, guidance law, field-of-view (FOV) angle, deep deterministic policy gradient
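A minimal sketch of the structure described above: a proportional navigation baseline plus a learned auxiliary acceleration term. The navigation gain, the bound on the correction, and the policy placeholder are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

N_GAIN = 4.0       # assumed proportional navigation gain
A_AUX_MAX = 20.0   # assumed bound on the learned correction (m/s^2)

def png_acceleration(closing_speed, los_rate):
    """Classical proportional navigation command."""
    return N_GAIN * closing_speed * los_rate

def policy_auxiliary(obs):
    """Placeholder for the DDPG actor; returns an auxiliary command in [-1, 1]."""
    return np.tanh(0.01 * np.sum(obs))

def guidance_command(closing_speed, los_rate, time_to_go_error, fov_angle):
    obs = np.array([closing_speed, los_rate, time_to_go_error, fov_angle])
    a_png = png_acceleration(closing_speed, los_rate)
    a_aux = A_AUX_MAX * policy_auxiliary(obs)      # learned impact-time correction
    return a_png + a_aux

print(guidance_command(closing_speed=300.0, los_rate=0.02,
                       time_to_go_error=1.5, fov_angle=0.3))
```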
3. Task assignment in ground-to-air confrontation based on multiagent deep reinforcement learning (cited 4 times)
Authors: Jia-yi Liu, Gang Wang, Qiang Fu, Shao-hua Yue, Si-yuan Wang. Defence Technology (防务技术), SCIE EI CAS CSCD, 2023, Issue 1, pp. 210-219 (10 pages)
The scale of ground-to-air confrontation task assignment is large, and many concurrent task assignments and random events need to be handled. When existing task assignment methods are applied to ground-to-air confrontation, they deal with complex tasks inefficiently and suffer from interaction conflicts in multiagent systems. This study proposes a multiagent architecture based on one general agent with multiple narrow agents (OGMN) to reduce task assignment conflicts. Considering the slow speed of traditional dynamic task assignment algorithms, this paper proposes the proximal policy optimization for task assignment of general and narrow agents (PPO-TAGNA) algorithm. Based on the idea of the optimal assignment strategy algorithm and combined with the training framework of deep reinforcement learning (DRL), the algorithm adds a multi-head attention mechanism and a stage reward mechanism to the bilateral band-clipping PPO algorithm to solve the problem of low training efficiency. Finally, simulation experiments are carried out on a digital battlefield. The multiagent architecture based on OGMN combined with the PPO-TAGNA algorithm obtains higher rewards faster and has a higher win ratio. An analysis of agent behavior verifies the efficiency, superiority and rationality of resource utilization of this method.
Keywords: ground-to-air confrontation, task assignment, general and narrow agents, deep reinforcement learning, proximal policy optimization (PPO)
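For reference, a minimal sketch of the standard clipped PPO surrogate objective that PPO-TAGNA builds on; the clipping value and the toy tensors are illustrative assumptions, and the paper's multi-head attention and stage reward mechanisms are not reproduced here.

```python
import numpy as np

CLIP_EPS = 0.2  # assumed clipping range

def ppo_clip_loss(log_prob_new, log_prob_old, advantage):
    """Standard clipped PPO surrogate loss (to be minimized)."""
    ratio = np.exp(log_prob_new - log_prob_old)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - CLIP_EPS, 1.0 + CLIP_EPS) * advantage
    return -np.mean(np.minimum(unclipped, clipped))

# Toy batch: new/old action log-probabilities and advantage estimates.
loss = ppo_clip_loss(log_prob_new=np.array([-1.0, -0.5, -2.0]),
                     log_prob_old=np.array([-1.1, -0.6, -1.8]),
                     advantage=np.array([0.5, -0.2, 1.0]))
print(loss)
```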
4. Deep reinforcement learning for UAV swarm rendezvous behavior (cited 2 times)
Authors: ZHANG Yaozhong, LI Yike, WU Zhuoran, XU Jialin. Journal of Systems Engineering and Electronics, SCIE EI CSCD, 2023, Issue 2, pp. 360-373 (14 pages)
Unmanned aerial vehicle (UAV) swarm technology has been one of the research hotspots in recent years. With the continuous improvement of the autonomous intelligence of UAVs, swarm technology will become one of the main trends of UAV development in the future. This paper studies the behavior decision-making process of the UAV swarm rendezvous task based on the double deep Q network (DDQN) algorithm. We design a guided reward function to effectively solve the convergence problem caused by sparse returns in deep reinforcement learning (DRL) for long-period tasks. We also propose the concept of a temporary storage area, optimizing the memory replay unit of the traditional DDQN algorithm, improving the convergence speed of the algorithm, and speeding up the training process. Different from traditional task environments, this paper establishes a continuous state-space task environment model to improve the fidelity of the UAV task environment. Based on the DDQN algorithm, the collaborative tasks of the UAV swarm in different task scenarios are trained. The experimental results validate that the DDQN algorithm is efficient in training the UAV swarm to complete the given collaborative tasks while meeting the swarm's requirements for centralization and autonomy and improving the intelligence of collaborative task execution. The simulation results show that, after training, the proposed UAV swarm can carry out the rendezvous task well, and the mission success rate reaches 90%.
Keywords: double deep Q network (DDQN) algorithm, unmanned aerial vehicle (UAV) swarm, task decision, deep reinforcement learning (DRL), sparse returns
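A minimal sketch of the double DQN target computation underlying the DDQN algorithm referenced above; the toy Q-values are illustrative assumptions, and the paper's guided reward and temporary storage area are not reproduced.

```python
import numpy as np

GAMMA = 0.99

def ddqn_target(reward, done, q_online_next, q_target_next):
    """Double DQN: the online network selects the action, the target network evaluates it."""
    best_action = np.argmax(q_online_next)          # selection by the online network
    bootstrap = q_target_next[best_action]          # evaluation by the target network
    return reward + (0.0 if done else GAMMA * bootstrap)

# Toy example: Q-values over 4 actions at the next state.
y = ddqn_target(reward=1.0, done=False,
                q_online_next=np.array([0.2, 0.8, 0.1, 0.4]),
                q_target_next=np.array([0.3, 0.6, 0.2, 0.5]))
print(y)
```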
5. UAV Frequency-based Crowdsensing Using Grouping Multi-agent Deep Reinforcement Learning
Authors: Cui ZHANG, En WANG, Funing YANG, Yongjian YANG, Nan JIANG. 计算机科学 (Computer Science), CSCD, 北大核心, 2023, Issue 2, pp. 57-68 (12 pages)
Mobile CrowdSensing (MCS) is a promising sensing paradigm that recruits users to cooperatively perform sensing tasks. Recently, unmanned aerial vehicles (UAVs), as powerful sensing devices, have been used to replace user participation and carry out special tasks such as epidemic monitoring and earthquake rescue. In this paper, we focus on scheduling UAVs to sense task Points-of-Interest (PoIs) with different frequency coverage requirements. To accomplish the sensing task, the scheduling strategy needs to consider the coverage requirement, geographic fairness and energy charging simultaneously. We consider the complex interaction among UAVs and propose a grouping multi-agent deep reinforcement learning approach (G-MADDPG) to schedule UAVs distributively. G-MADDPG groups all UAVs into teams by a distance-based clustering algorithm (DCA) and then regards each team as an agent. In this way, G-MADDPG solves the problem that the training time of traditional MADDPG is too long to converge when the number of UAVs is large, and the trade-off between training time and result accuracy can be controlled flexibly by adjusting the number of teams. Extensive simulation results show that our scheduling strategy performs better than three baselines and is flexible in balancing training time and result accuracy.
Keywords: UAV crowdsensing, frequency coverage, grouping multi-agent deep reinforcement learning
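The grouping step can be sketched with a simple distance-based clustering pass; a k-means-style loop is used here as a stand-in since the abstract does not spell out the exact DCA procedure, and the positions and team count are toy values.

```python
import numpy as np

def group_uavs(positions, n_teams, n_iters=20, seed=0):
    """Cluster UAV positions into teams by iterative nearest-centroid assignment."""
    rng = np.random.default_rng(seed)
    centroids = positions[rng.choice(len(positions), n_teams, replace=False)]
    for _ in range(n_iters):
        dists = np.linalg.norm(positions[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)                 # each UAV joins its nearest team
        for k in range(n_teams):
            if np.any(labels == k):
                centroids[k] = positions[labels == k].mean(axis=0)
    return labels  # team index per UAV; each team is then treated as one agent

uav_positions = np.random.default_rng(1).uniform(0, 100, size=(12, 2))
print(group_uavs(uav_positions, n_teams=3))
```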
6. Tactical reward shaping for large-scale combat by multi-agent reinforcement learning
Authors: DUO Nanxun, WANG Qinzhao, LYU Qiang, WANG Wei. Journal of Systems Engineering and Electronics, CSCD, 2024, Issue 6, pp. 1516-1529 (14 pages)
Future unmanned battles urgently require intelligent combat policies, and multi-agent reinforcement learning offers a promising solution. However, due to the complexity of combat operations and the large size of the combat group, this task suffers from the credit assignment problem more than other reinforcement learning tasks. This study uses reward shaping to relieve the credit assignment problem and improve policy training for the new generation of large-scale unmanned combat operations. We first prove that multiple reward shaping functions do not change the Nash equilibrium in stochastic games, providing theoretical support for their use. According to the characteristics of combat operations, we propose tactical reward shaping (TRS), which comprises maneuver shaping advice and threat-assessment-based attack shaping advice. Then, we investigate the effects of different types and combinations of shaping advice on combat policies through experiments. The results show that TRS improves both the efficiency and attack accuracy of combat policies, with the combination of maneuver reward shaping advice and ally-focused attack shaping advice achieving the best performance compared with the baseline strategy.
Keywords: deep reinforcement learning, multi-agent reinforcement learning, multi-agent combat, unmanned battle, reward shaping
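A minimal sketch of composing several shaping terms onto one agent's reward, in the spirit of TRS; the maneuver and attack advice functions below are hypothetical placeholders, and only the additive structure reflects the abstract.

```python
def maneuver_advice(dist_to_objective_prev, dist_to_objective):
    """Hypothetical maneuver shaping: reward progress toward the objective."""
    return 0.1 * (dist_to_objective_prev - dist_to_objective)

def attack_advice(threat_level, fired_at_highest_threat):
    """Hypothetical threat-assessment-based attack shaping."""
    return 0.2 * threat_level if fired_at_highest_threat else 0.0

def shaped_reward(r_env, dist_prev, dist_now, threat_level, fired_at_highest_threat):
    """Environment reward plus the sum of tactical shaping advice terms."""
    return (r_env
            + maneuver_advice(dist_prev, dist_now)
            + attack_advice(threat_level, fired_at_highest_threat))

print(shaped_reward(r_env=0.0, dist_prev=120.0, dist_now=110.0,
                    threat_level=0.8, fired_at_highest_threat=True))
```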
7. Cooperative multi-target hunting by unmanned surface vehicles based on multi-agent reinforcement learning (cited 2 times)
Authors: Jiawei Xia, Yasong Luo, Zhikun Liu, Yalun Zhang, Haoran Shi, Zhong Liu. Defence Technology (防务技术), SCIE EI CAS CSCD, 2023, Issue 11, pp. 80-94 (15 pages)
To solve the problem of multi-target hunting by an unmanned surface vehicle (USV) fleet, a hunting algorithm based on multi-agent reinforcement learning is proposed. Firstly, the hunting environment and kinematic model without boundary constraints are built, and the criteria for successful target capture are given. Then, the cooperative hunting problem of a USV fleet is modeled as a decentralized partially observable Markov decision process (Dec-POMDP), and a distributed partially observable multi-target hunting proximal policy optimization (DPOMH-PPO) algorithm applicable to USVs is proposed. In addition, an observation model, a reward function and an action space applicable to multi-target hunting tasks are designed. To deal with the dynamically changing dimension of the observational features input by partially observable systems, a feature embedding block is proposed. By combining two feature compression methods, column-wise max pooling (CMP) and column-wise average pooling (CAP), observational feature encoding is established. Finally, the centralized training and decentralized execution framework is adopted to complete the training of the hunting strategy. Each USV in the fleet shares the same policy and performs actions independently. Simulation experiments have verified the effectiveness of the DPOMH-PPO algorithm in test scenarios with different numbers of USVs. Moreover, the advantages of the proposed model are comprehensively analyzed in terms of algorithm performance, transfer effect across task scenarios and self-organization capability after being damaged, and the potential deployment and application of DPOMH-PPO in real environments is verified.
Keywords: unmanned surface vehicles, multi-agent deep reinforcement learning, cooperative hunting, feature embedding, proximal policy optimization
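A minimal sketch of the column-wise pooling idea behind the feature embedding block: a variable number of observed entities (rows) is compressed into a fixed-length vector by concatenating column-wise max and average pooling. The feature dimension and toy observation matrices are assumptions.

```python
import numpy as np

def embed_observations(entity_features):
    """Compress a variable number of entity rows into a fixed-length encoding
    by concatenating column-wise max pooling (CMP) and average pooling (CAP)."""
    entity_features = np.atleast_2d(entity_features)
    cmp_vec = entity_features.max(axis=0)    # column-wise max pooling
    cap_vec = entity_features.mean(axis=0)   # column-wise average pooling
    return np.concatenate([cmp_vec, cap_vec])

# Two observations with different numbers of detected targets yield the same output size.
obs_3_targets = np.random.default_rng(0).normal(size=(3, 4))
obs_5_targets = np.random.default_rng(1).normal(size=(5, 4))
print(embed_observations(obs_3_targets).shape, embed_observations(obs_5_targets).shape)
```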
8. Hierarchical reinforcement learning guidance with threat avoidance (cited 1 time)
Authors: LI Bohao, WU Yunjie, LI Guofei. Journal of Systems Engineering and Electronics, SCIE EI CSCD, 2022, Issue 5, pp. 1173-1185 (13 pages)
The guidance strategy is an extremely critical factor in determining the striking effect of a missile operation. A novel guidance law is presented by exploiting deep reinforcement learning (DRL) with a hierarchical deep deterministic policy gradient (DDPG) algorithm. The reward functions are constructed to minimize the line-of-sight (LOS) angle rate and avoid the threat caused by opposing obstacles. To attenuate the chattering of the acceleration, a hierarchical reinforcement learning structure and an improved reward function with an action penalty are put forward. The simulation results validate that the missile under the proposed method can hit the target successfully and keep away from the threatened areas effectively.
Keywords: guidance law, deep reinforcement learning (DRL), threat avoidance, hierarchical reinforcement learning
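A minimal sketch of a reward of the kind described above, combining a LOS-rate term, a threat-zone penalty and an action (chattering) penalty; the weights and threat geometry are illustrative assumptions, not the paper's values.

```python
import numpy as np

W_LOS, W_THREAT, W_ACTION = 1.0, 5.0, 0.05  # assumed weights

def guidance_reward(los_rate, missile_pos, threat_center, threat_radius, accel_cmd):
    r_los = -W_LOS * abs(los_rate)                           # keep the LOS angle rate small
    inside = np.linalg.norm(np.asarray(missile_pos) - np.asarray(threat_center)) < threat_radius
    r_threat = -W_THREAT if inside else 0.0                  # avoid the threatened area
    r_action = -W_ACTION * accel_cmd ** 2                    # penalize large/chattering commands
    return r_los + r_threat + r_action

print(guidance_reward(los_rate=0.01, missile_pos=[500.0, 200.0],
                      threat_center=[520.0, 210.0], threat_radius=50.0, accel_cmd=3.0))
```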
9. Maneuvering target tracking of UAV based on MN-DDPG and transfer learning (cited 15 times)
Authors: Bo Li, Zhi-peng Yang, Da-qing Chen, Shi-yang Liang, Hao Ma. Defence Technology (防务技术), SCIE EI CAS CSCD, 2021, Issue 2, pp. 457-466 (10 pages)
Tracking a maneuvering target autonomously, accurately and in real time in an uncertain environment is one of the challenging missions for unmanned aerial vehicles (UAVs). In this paper, aiming to address the control problem of maneuvering target tracking and obstacle avoidance, an online path planning approach for UAVs is developed based on deep reinforcement learning. Through end-to-end learning powered by neural networks, the proposed approach achieves perception of the environment and continuous motion output control. The approach includes: (1) a deep deterministic policy gradient (DDPG)-based control framework to provide learning and autonomous decision-making capability for UAVs; (2) an improved method named MN-DDPG that introduces a type of mixed noise to assist the UAV in exploring stochastic strategies for online optimal planning; and (3) an algorithm of task decomposition and pre-training for efficient transfer learning to improve the generalization capability of the UAV control model built on MN-DDPG. The simulation results verify that the proposed approach achieves good self-adaptive adjustment of the UAV's flight attitude in maneuvering target tracking tasks, with a significant improvement in the generalization capability and training efficiency of the UAV tracking controller in uncertain environments.
Keywords: UAVs, maneuvering target tracking, deep reinforcement learning, MN-DDPG, mixed noises, transfer learning
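The abstract does not specify which noises MN-DDPG mixes, so the sketch below simply illustrates the idea of blending two common exploration noises (Ornstein-Uhlenbeck and Gaussian) onto a deterministic action; the mixing weight and noise parameters are assumptions.

```python
import numpy as np

class MixedNoise:
    """Blend of Ornstein-Uhlenbeck noise (temporally correlated) and Gaussian noise."""
    def __init__(self, dim, theta=0.15, sigma_ou=0.2, sigma_gauss=0.1, mix=0.5, seed=0):
        self.rng = np.random.default_rng(seed)
        self.theta, self.sigma_ou, self.sigma_gauss, self.mix = theta, sigma_ou, sigma_gauss, mix
        self.ou_state = np.zeros(dim)

    def sample(self):
        self.ou_state += -self.theta * self.ou_state + self.sigma_ou * self.rng.normal(size=self.ou_state.shape)
        gauss = self.sigma_gauss * self.rng.normal(size=self.ou_state.shape)
        return self.mix * self.ou_state + (1.0 - self.mix) * gauss

noise = MixedNoise(dim=2)
deterministic_action = np.array([0.3, -0.1])     # placeholder output of the DDPG actor
exploratory_action = np.clip(deterministic_action + noise.sample(), -1.0, 1.0)
print(exploratory_action)
```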
10. A learning-based flexible autonomous motion control method for UAV in dynamic unknown environments (cited 3 times)
Authors: WAN Kaifang, LI Bo, GAO Xiaoguang, HU Zijian, YANG Zhipeng. Journal of Systems Engineering and Electronics, SCIE EI CSCD, 2021, Issue 6, pp. 1490-1508 (19 pages)
This paper presents a deep reinforcement learning (DRL)-based motion control method to provide unmanned aerial vehicles (UAVs) with additional flexibility while flying across dynamic unknown environments autonomously. The method is applicable in both military and civilian fields, such as penetration and rescue. The autonomous motion control problem is addressed through motion planning, action interpretation, trajectory tracking, and vehicle movement within the DRL framework. Novel DRL algorithms are presented by combining two difference-amplifying approaches with traditional DRL methods and are used for solving the motion planning problem. An improved Lyapunov guidance vector field (LGVF) method is used to handle the trajectory-tracking problem and provide guidance control commands for the UAV. In contrast to conventional motion-control approaches, the proposed methods directly map the sensor-based detections and measurements into control signals for the inner loop of the UAV, i.e., end-to-end control. The training results show that the novel DRL algorithms provide more than a 20% performance improvement over state-of-the-art DRL algorithms. The testing results demonstrate that the controller based on the novel DRL and LGVF, trained only once in a static environment, enables the UAV to fly autonomously in various dynamic unknown environments. Thus, the proposed technique provides strong flexibility for the controller.
Keywords: autonomous motion control (AMC), deep reinforcement learning (DRL), difference amplifying, reward shaping
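A minimal sketch of a Lyapunov guidance vector field of the kind referenced above, in its commonly cited standoff-tracking form: it commands a velocity that converges to a circle of radius R_D around the target. The speed and radius are toy values, and the paper's improvements to LGVF are not reflected here.

```python
import numpy as np

V0 = 15.0    # assumed UAV speed (m/s)
R_D = 100.0  # assumed standoff radius around the target (m)

def lgvf_velocity(rel_pos):
    """Lyapunov guidance vector field (standoff-tracking form):
    commanded velocity converging to a circle of radius R_D about the target."""
    x, y = rel_pos
    r = np.hypot(x, y)
    scale = -V0 / (r * (r**2 + R_D**2))
    vx = scale * (x * (r**2 - R_D**2) + y * (2.0 * r * R_D))
    vy = scale * (y * (r**2 - R_D**2) - x * (2.0 * r * R_D))
    return np.array([vx, vy])   # magnitude equals V0 everywhere away from the origin

# UAV 300 m east of the target: the commanded velocity pulls it onto the 100 m loiter circle.
print(lgvf_velocity(rel_pos=(300.0, 0.0)))
```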
11. Targeted multi-agent communication algorithm based on state control
Authors: Li-yang Zhao, Tian-qing Chang, Lei Zhang, Jie Zhang, Kai-xuan Chu, De-peng Kong. Defence Technology (防务技术), SCIE EI CAS CSCD, 2024, Issue 1, pp. 544-556 (13 pages)
As an important mechanism of multi-agent interaction, communication can make agents form complex team relationships rather than a simple set of multiple independent agents. However, existing communication schemes bring much timing redundancy and many irrelevant messages, which seriously affects their practical application. To solve this problem, this paper proposes a targeted multi-agent communication algorithm based on state control (SCTC). SCTC uses a gating mechanism based on state control to reduce the timing redundancy of communication between agents, and it determines the interaction relationships between agents and the importance weight of a communication message through a series connection of hard- and self-attention mechanisms, realizing targeted communication message processing. In addition, by minimizing the difference between the fusion message generated from an agent's real communication message and the fusion message generated from the buffered message, the correctness of the agent's final action choice is ensured. Our evaluation using a challenging set of StarCraft II benchmarks indicates that SCTC can significantly improve learning performance and reduce the communication overhead between agents, thus ensuring better cooperation between agents.
Keywords: multi-agent deep reinforcement learning, state control, targeted interaction, communication mechanism
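A minimal sketch of two ingredients named above: a state-controlled gate that decides whether an agent broadcasts at all, and a soft-attention weighting of incoming messages. The gate threshold, dimensions and scoring function are assumptions, and the paper's hard-attention stage is not reproduced.

```python
import numpy as np

GATE_THRESHOLD = 0.5  # assumed cut-off for broadcasting

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def state_gate(state_change_norm):
    """Broadcast only when the agent's state has changed enough (timing gate)."""
    return state_change_norm > GATE_THRESHOLD

def fuse_messages(query, messages):
    """Weight incoming messages by query-message similarity (soft attention)."""
    scores = messages @ query / np.sqrt(len(query))
    weights = softmax(scores)
    return weights @ messages     # fused message for the receiving agent

rng = np.random.default_rng(0)
incoming = rng.normal(size=(4, 8))              # messages from 4 teammates
own_query = rng.normal(size=8)
if state_gate(state_change_norm=0.7):
    print(fuse_messages(own_query, incoming))
```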
12. Deep reinforcement learning and mobile communication resource management: algorithms, progress and prospects (cited 6 times)
Authors: 孙恩昌, 袁永仪, 吴兵, 屈晗星, 张延华. 北京工业大学学报 (Journal of Beijing University of Technology), CAS CSCD, 北大核心, 2023, Issue 1, pp. 71-88 (18 pages)
Deep reinforcement learning (DRL) combines the ability of deep learning to extract low-dimensional features from high-dimensional data with the decision-making ability of reinforcement learning, and it is one of the efficient algorithms for mobile communication resource management and optimization. After introducing the concepts and principles of DRL-related algorithms, this survey focuses on reviewing and analyzing the resource management and optimization results achieved by DRL in communication technologies and scenarios such as network slicing, cloud computing, fog computing, and mobile edge computing. Combining the algorithmic principles of DRL with research progress in mobile communication resource management, it discusses the problems and challenges facing DRL and proposes corresponding solutions. Finally, it looks ahead to the development trends and main research directions of DRL in the field of mobile communication resource management.
Keywords: deep reinforcement learning (DRL), communication resource management, network slicing, cloud computing, fog computing, mobile edge computing
13. Research on virtual entity decision model for LVC tactical confrontation of army units (cited 4 times)
Authors: GAO Ang, GUO Qisheng, DONG Zhiming, TANG Zaijiang, ZHANG Ziwei, FENG Qiqi. Journal of Systems Engineering and Electronics, SCIE EI CSCD, 2022, Issue 5, pp. 1249-1267 (19 pages)
According to the requirements of live-virtual-constructive (LVC) tactical confrontation (TC) on a virtual entity (VE) decision model with graded combat capability, diversified actions, real-time decision-making, and generalization against the enemy, the confrontation process is modeled as a zero-sum stochastic game (ZSG). By introducing the theory of the dynamic relative power potential field, the reward sparsity problem of the model can be solved. By reward shaping, the credit assignment problem between agents can be solved. Based on the idea of meta-learning, an extensible multi-agent deep reinforcement learning (EMADRL) framework and solving method is proposed to improve the effectiveness and efficiency of model solving. Experiments show that the model meets the requirements well and that the algorithm learns efficiently.
Keywords: live-virtual-constructive (LVC), army unit, tactical confrontation (TC), intelligent decision model, multi-agent deep reinforcement learning
14. A UAV collaborative defense scheme driven by DDPG algorithm (cited 3 times)
Authors: ZHANG Yaozhong, WU Zhuoran, XIONG Zhenkai, CHEN Long. Journal of Systems Engineering and Electronics, SCIE EI CSCD, 2023, Issue 5, pp. 1211-1224 (14 pages)
The deep deterministic policy gradient (DDPG) algorithm is an off-policy method that combines the two mainstream reinforcement learning approaches based on value iteration and policy iteration. Using the DDPG algorithm, agents can explore and summarize the environment to achieve autonomous decisions in continuous state and action spaces. In this paper, a cooperative defense scheme with DDPG via swarms of unmanned aerial vehicles (UAVs) is developed and validated, showing promising practical value for defense. We solve the sparse reward problem of reinforcement learning in a long-term task by building the reward function of the UAV swarm and optimizing the learning process of the artificial neural network based on the DDPG algorithm to reduce oscillation during learning. The experimental results show that the DDPG algorithm can guide the UAV swarm to perform the defense task efficiently, meeting the swarm's requirements for decentralization and autonomy and promoting the intelligent development of UAV swarms as well as the decision-making process.
Keywords: deep deterministic policy gradient (DDPG) algorithm, unmanned aerial vehicle (UAV) swarm, task decision making, deep reinforcement learning, sparse reward problem
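A minimal sketch of the DDPG ingredients common to the work above: the critic's bootstrapped target built from target networks, and the soft update that slowly tracks the online weights. Network internals are abstracted into placeholder values, and the tau and gamma values are assumptions.

```python
import numpy as np

GAMMA, TAU = 0.99, 0.005  # assumed discount factor and soft-update rate

def critic_target(reward, done, q_target_next):
    """Bootstrapped TD target y = r + gamma * Q'(s', mu'(s')) for non-terminal steps."""
    return reward + (0.0 if done else GAMMA * q_target_next)

def soft_update(target_params, online_params):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [(1.0 - TAU) * t + TAU * o for t, o in zip(target_params, online_params)]

# Toy example with flattened parameter vectors standing in for the networks.
y = critic_target(reward=0.5, done=False, q_target_next=1.2)
new_targets = soft_update(target_params=[np.zeros(3)], online_params=[np.ones(3)])
print(y, new_targets[0])
```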