Search Results - VIP (维普) Chinese Journal Service Platform

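1. Parameter tuning of a manipulator motion tracking controller based on Policy Gradient (基于Policy Gradient的机械臂运动跟踪控制器参数整定)
Cited by: 4
Authors: HAN Linxiao (韩霖骁), HU Jianbo (胡剑波), SONG Shiyuan (宋仕元), WANG Yingyang (王应洋), HE Zihou (贺子厚), ZHANG Peng (张鹏)
Affiliation: Equipment Management and UAV Engineering College, Air Force Engineering University (空军工程大学装备管理与无人机工程学院)
Source: 《系统工程与电子技术》 (Systems Engineering and Electronics), EI, CSCD, PKU Core, 2021, No. 9, pp. 2605-2611 (7 pages)
Funding: Supported by the Open Project of the State Key Laboratory of Industrial Control Technology (ICT20063).
Abstract: To address the parameter self-tuning problem of a manipulator motion tracking controller, a parameter tuner based on the Policy Gradient method of reinforcement learning is designed. First, a hybrid dynamic model of the manipulator is introduced; based on this system model, a proportional-derivative (PD) controller is designed and its Lyapunov stability is proved, which yields the admissible range of the parameter matrices. Second, a Policy Gradient based parameter tuner is designed and then improved by introducing an integrator, which makes the parameter actions under its control continuous and further improves the performance of the PD controller. Finally, a second-order manipulator system is used as an example for simulation verification. The experimental data demonstrate the effectiveness and feasibility of the parameter tuner and show that it improves the dynamic performance of the system.
Keywords: manipulator motion tracking; policy gradient; parameter tuning; proportional-derivative (PD) control
Classification: O231.2 [Science: Operations Research and Cybernetics]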

2. Deep reinforcement learning and its application in autonomous fitting optimization for attack areas of UCAVs
Cited by: 14
Authors: LI Yue, QIU Xiaohui, LIU Xiaodong, XIA Qunli
Affiliations: School of Aerospace Engineering; Science and Technology on Electro-Optic Control Laboratory; Beijing Aerospace Automatic Control Research Institute
Source: 《Journal of Systems Engineering and Electronics》, SCIE, EI, CSCD, 2020, No. 4, pp. 734-742 (9 pages)
Funding: Supported by the Key Laboratory of Defense Science and Technology Foundation of Luoyang Electro-optical Equipment Research Institute (6142504200108).
Abstract: The ever-changing battlefield environment requires the use of robust and adaptive technologies integrated into a reliable platform. Unmanned combat aerial vehicles (UCAVs) aim to integrate such advanced technologies while increasing the tactical capabilities of combat aircraft. As a research object, common UCAV uses the neural network fitting strategy to obtain values of attack areas. However, this simple strategy cannot cope with complex environmental changes and autonomously optimize decision-making problems. To solve the problem, this paper proposes a new deep deterministic policy gradient (DDPG) strategy based on deep reinforcement learning for the attack area fitting of UCAVs in the future battlefield. Simulation results show that the autonomy and environmental adaptability of UCAVs in the future battlefield will be improved based on the new DDPG algorithm and the training process converges quickly. We can obtain the optimal values of attack areas in real time during the whole flight with the well-trained deep network.
Keywords: attack area; neural network; deep deterministic policy gradient (DDPG); unmanned combat aerial vehicle (UCAV)
Classification: V279 [Aerospace Science and Technology: Flight Vehicle Design]; TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

3. A UAV collaborative defense scheme driven by DDPG algorithm
Cited by: 3
Authors: ZHANG Yaozhong, WU Zhuoran, XIONG Zhenkai, CHEN Long
Affiliations: School of Electronics and Information; College of New Energy and Intelligent Connected Vehicle; China Research and Development Academy of Machinery Equipment
Source: 《Journal of Systems Engineering and Electronics》, SCIE, EI, CSCD, 2023, No. 5, pp. 1211-1224 (14 pages)
Funding: Supported by the Key Research and Development Program of Shaanxi (2022GY-089) and the Natural Science Basic Research Program of Shaanxi (2022JQ-593).
Abstract: The deep deterministic policy gradient (DDPG) algorithm is an off-policy method that combines two mainstream reinforcement learning methods based on value iteration and policy iteration. Using the DDPG algorithm, agents can explore and summarize the environment to achieve autonomous decisions in the continuous state space and action space. In this paper, a cooperative defense with DDPG via swarms of unmanned aerial vehicle (UAV) is developed and validated, which has shown promising practical value in the effect of defending. We solve the sparse rewards problem of reinforcement learning pair in a long-term task by building the reward function of UAV swarms and optimizing the learning process of artificial neural network based on the DDPG algorithm to reduce the vibration in the learning process. The experimental results show that the DDPG algorithm can guide the UAVs swarm to perform the defense task efficiently, meeting the requirements of a UAV swarm for non-centralization, autonomy, and promoting the intelligent development of UAVs swarm as well as the decision-making process.
Keywords: deep deterministic policy gradient (DDPG) algorithm; unmanned aerial vehicles (UAVs) swarm; task decision making; deep reinforcement learning; sparse reward problem
Classification: V279 [Aerospace Science and Technology: Flight Vehicle Design]; TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

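4. Optimization of communication resource allocation for IRS-assisted NOMA-MEC based on deep reinforcement learning (基于深度强化学习的IRS辅助NOMA-MEC通信资源分配优化)
Cited by: 1
Authors: FANG Juan (方娟), LIU Zhenzhen (刘珍珍), CHEN Siqi (陈思琪), LI Shuopeng (李硕朋)
Affiliation: Faculty of Information Technology, Beijing University of Technology (北京工业大学信息学部)
Source: 《北京工业大学学报》 (Journal of Beijing University of Technology), CAS, CSCD, PKU Core, 2024, No. 8, pp. 930-938 (9 pages)
Funding: National Natural Science Foundation of China (61202076); Beijing Natural Science Foundation (4192007).
Abstract: To solve the task offloading problem of edge users in blind zones that cannot establish a direct communication link with the edge server, a resource allocation optimization algorithm is designed for intelligent reflecting surface (IRS) assisted non-orthogonal multiple access (NOMA) communication based on deep reinforcement learning (DRL). The algorithm seeks the maximum system profit, defined as a weighted combination of the system sum rate and energy efficiency (EE), in order to achieve green and efficient communication. The deep deterministic policy gradient (DDPG) algorithm is used to jointly optimize the transmit power allocation and the reflection phase-shift matrix of the IRS. Simulation results show that using the DDPG algorithm for communication resource allocation in mobile edge computing (MEC) outperforms the several other algorithms used for comparison.
Keywords: non-orthogonal multiple access (NOMA); intelligent reflecting surface (IRS); deep deterministic policy gradient (DDPG) algorithm; mobile edge computing (MEC); energy efficiency (EE); system profits
Classification: TN929.5 [Electronics and Telecommunications: Communication and Information Systems]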

5. Deep reinforcement learning guidance with impact time control
Authors: LI Guofei, LI Shituo, LI Bohao, WU Yunjie
Affiliations: School of Astronautics; Beijing Institute of Control and Electronic Technology; School of Automation Science and Electrical Engineering
Source: 《Journal of Systems Engineering and Electronics》, CSCD, 2024, No. 6, pp. 1594-1603 (10 pages)
Funding: Supported by the National Natural Science Foundation of China (62003021, 62373304), the Industry-University-Research Innovation Fund for Chinese Universities (2021ZYA02009) (+2 more funds), and the Fundamental Research Funds for the Central Universities (D5000210830).
Abstract: In consideration of the field-of-view (FOV) angle constraint, this study focuses on the guidance problem with impact time control. A deep reinforcement learning guidance method is given for the missile to obtain the desired impact time and meet the demand of FOV angle constraint. On basis of the framework of the proportional navigation guidance, an auxiliary control term is supplemented by the distributed deep deterministic policy gradient algorithm, in which the reward functions are developed to decrease the time-to-go error and improve the terminal guidance accuracy. The numerical simulation demonstrates that the missile governed by the presented deep reinforcement learning guidance law can hit the target successfully at appointed arrival time.
Keywords: impact time; deep reinforcement learning; guidance law; field-of-view (FOV) angle; deep deterministic policy gradient
Classification: TJ765.3 [Armament Science and Technology: Weapon Systems and Utilization Engineering]

No.	Title	Authors	Source	Year	Cited by
1	Parameter tuning of a manipulator motion tracking controller based on Policy Gradient	HAN Linxiao (韩霖骁), HU Jianbo (胡剑波), SONG Shiyuan (宋仕元), WANG Yingyang (王应洋), HE Zihou (贺子厚), ZHANG Peng (张鹏)	《系统工程与电子技术》 (Systems Engineering and Electronics), EI, CSCD, PKU Core	2021	4
2	Deep reinforcement learning and its application in autonomous fitting optimization for attack areas of UCAVs	LI Yue, QIU Xiaohui, LIU Xiaodong, XIA Qunli	《Journal of Systems Engineering and Electronics》, SCIE, EI, CSCD	2020	14
3	A UAV collaborative defense scheme driven by DDPG algorithm	ZHANG Yaozhong, WU Zhuoran, XIONG Zhenkai, CHEN Long	《Journal of Systems Engineering and Electronics》, SCIE, EI, CSCD	2023	3
4	Optimization of communication resource allocation for IRS-assisted NOMA-MEC based on deep reinforcement learning	FANG Juan (方娟), LIU Zhenzhen (刘珍珍), CHEN Siqi (陈思琪), LI Shuopeng (李硕朋)	《北京工业大学学报》 (Journal of Beijing University of Technology), CAS, CSCD, PKU Core	2024	1
5	Deep reinforcement learning guidance with impact time control	LI Guofei, LI Shituo, LI Bohao, WU Yunjie	《Journal of Systems Engineering and Electronics》, CSCD	2024	0