Journal Articles
1,754 articles found
1. UAV maneuvering decision-making algorithm based on deep reinforcement learning under the guidance of expert experience
Authors: ZHAN Guang, ZHANG Kun, LI Ke, PIAO Haiyin. Journal of Systems Engineering and Electronics (SCIE, CSCD), 2024, No. 3, pp. 644-665 (22 pages)
Autonomous unmanned aerial vehicle (UAV) manipulation is necessary for the defense department to execute tactical missions given by commanders in the future unmanned battlefield. A large amount of research has been devoted to improving the autonomous decision-making ability of UAVs in interactive environments, where finding the optimal maneuvering decision-making policy has become one of the key issues for enabling UAV intelligence. In this paper, we propose a maneuvering decision-making algorithm for autonomous air-delivery based on deep reinforcement learning under the guidance of expert experience. Specifically, we refine the guidance-towards-area and guidance-towards-specific-point tasks for the air-delivery process based on traditional air-to-surface fire control methods. Moreover, we construct the UAV maneuvering decision-making model based on Markov decision processes (MDPs) and present a reward shaping method for both tasks using a potential-based function and expert-guided advice. The proposed algorithm accelerates the convergence of the maneuvering decision-making policy and increases the stability of the policy's output during the later stage of training. The effectiveness of the proposed maneuvering decision-making policy is illustrated by training-parameter curves and extensive experimental results for testing the trained policy.
Keywords: unmanned aerial vehicle (UAV), maneuvering decision-making, autonomous air-delivery, deep reinforcement learning, reward shaping, expert experience
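The potential-based reward shaping named in this entry can be illustrated with a minimal sketch. The shaped reward adds the discounted difference of a potential function, which is known to leave the optimal policy unchanged; the one-dimensional state, goal representation, and potential below are illustrative assumptions, not the paper's actual design.

```python
def potential(state, goal):
    """Illustrative potential Phi(s): negative distance to the goal."""
    return -abs(goal - state)

def shaped_reward(base_reward, state, next_state, goal, gamma=0.99):
    """Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s)."""
    return base_reward + gamma * potential(next_state, goal) - potential(state, goal)

# Moving toward the goal yields a shaping bonus; moving away is penalized.
closer = shaped_reward(0.0, state=5.0, next_state=4.0, goal=0.0)
farther = shaped_reward(0.0, state=5.0, next_state=6.0, goal=0.0)
```

Expert-guided advice can be layered on top of the same shaping term without breaking the policy-invariance guarantee, which is presumably why the abstract pairs the two.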
2. Deep reinforcement learning guidance with impact time control
Authors: LI Guofei, LI Shituo, LI Bohao, WU Yunjie. Journal of Systems Engineering and Electronics (CSCD), 2024, No. 6, pp. 1594-1603 (10 pages)
In consideration of the field-of-view (FOV) angle constraint, this study focuses on the guidance problem with impact time control. A deep reinforcement learning guidance method is given for the missile to obtain the desired impact time and meet the demand of the FOV angle constraint. On the basis of the proportional navigation guidance framework, an auxiliary control term is supplemented by the distributed deep deterministic policy gradient algorithm, in which the reward functions are developed to decrease the time-to-go error and improve the terminal guidance accuracy. Numerical simulation demonstrates that a missile governed by the presented deep reinforcement learning guidance law can hit the target successfully at the appointed arrival time.
Keywords: impact time, deep reinforcement learning, guidance law, field-of-view (FOV) angle, deep deterministic policy gradient
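The control structure this abstract describes, a proportional-navigation baseline plus a learned auxiliary term, can be sketched as below. The navigation constant, units, and the idea of passing the RL output in as a plain number are illustrative assumptions; the paper's actual reward-driven auxiliary term is produced by a trained DDPG policy.

```python
def pn_command(closing_speed, los_rate, nav_gain=3.0):
    """Classic proportional navigation: a = N * Vc * LOS_rate."""
    return nav_gain * closing_speed * los_rate

def guidance_command(closing_speed, los_rate, aux_term):
    """Total acceleration command = PN baseline + RL auxiliary correction."""
    return pn_command(closing_speed, los_rate) + aux_term

# With zero auxiliary term the command reduces to pure PN.
baseline = guidance_command(300.0, 0.01, 0.0)
corrected = guidance_command(300.0, 0.01, 0.5)
```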
3. Tactical reward shaping for large-scale combat by multi-agent reinforcement learning
Authors: DUO Nanxun, WANG Qinzhao, LYU Qiang, WANG Wei. Journal of Systems Engineering and Electronics (CSCD), 2024, No. 6, pp. 1516-1529 (14 pages)
Future unmanned battles urgently require intelligent combat policies, and multi-agent reinforcement learning offers a promising solution. However, due to the complexity of combat operations and the large size of the combat group, this task suffers from the credit assignment problem more than other reinforcement learning tasks. This study uses reward shaping to relieve the credit assignment problem and improve policy training for the new generation of large-scale unmanned combat operations. We first prove that multiple reward shaping functions do not change the Nash equilibrium in stochastic games, providing theoretical support for their use. According to the characteristics of combat operations, we propose tactical reward shaping (TRS), which comprises maneuver shaping advice and threat-assessment-based attack shaping advice. We then investigate the effects of different types and combinations of shaping advice on combat policies through experiments. The results show that TRS improves both the efficiency and attack accuracy of combat policies, with the combination of maneuver reward shaping advice and ally-focused attack shaping advice achieving the best performance compared with the baseline strategy.
Keywords: deep reinforcement learning, multi-agent reinforcement learning, multi-agent combat, unmanned battle, reward shaping
4. Task assignment in ground-to-air confrontation based on multiagent deep reinforcement learning (cited by 4)
Authors: Jia-yi Liu, Gang Wang, Qiang Fu, Shao-hua Yue, Si-yuan Wang. Defence Technology (SCIE, EI, CAS, CSCD), 2023, No. 1, pp. 210-219 (10 pages)
The scale of ground-to-air confrontation task assignment is large, and many concurrent task assignments and random events need to be handled. When existing task assignment methods are applied to ground-to-air confrontation, efficiency in dealing with complex tasks is low and interactive conflicts arise in multiagent systems. This study proposes a multiagent architecture based on one general agent with multiple narrow agents (OGMN) to reduce task assignment conflicts. Considering the slow speed of traditional dynamic task assignment algorithms, this paper proposes the proximal policy optimization for task assignment of general and narrow agents (PPO-TAGNA) algorithm. Based on the idea of the optimal assignment strategy algorithm and combined with the training framework of deep reinforcement learning (DRL), the algorithm adds a multi-head attention mechanism and a stage reward mechanism to the bilateral band-clipping PPO algorithm to solve the problem of low training efficiency. Finally, simulation experiments are carried out in a digital battlefield. The multiagent architecture based on OGMN combined with the PPO-TAGNA algorithm obtains higher rewards faster and has a higher win ratio. By analyzing agent behavior, the efficiency, superiority and rationality of resource utilization of this method are verified.
Keywords: ground-to-air confrontation, task assignment, general and narrow agents, deep reinforcement learning, proximal policy optimization (PPO)
5. Deep reinforcement learning for UAV swarm rendezvous behavior (cited by 2)
Authors: ZHANG Yaozhong, LI Yike, WU Zhuoran, XU Jialin. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2023, No. 2, pp. 360-373 (14 pages)
Unmanned aerial vehicle (UAV) swarm technology is one of the research hotspots of recent years. With the continuous improvement of UAV autonomous intelligence, swarm technology will become one of the main trends of UAV development in the future. This paper studies the behavior decision-making process of the UAV swarm rendezvous task based on the double deep Q-network (DDQN) algorithm. We design a guided reward function to effectively solve the convergence problem caused by sparse returns in deep reinforcement learning (DRL) for long-period tasks. We also propose the concept of a temporary storage area, optimizing the experience replay unit of the traditional DDQN algorithm, improving its convergence speed and speeding up training. Unlike traditional task environments, this paper establishes a continuous state-space task environment model to improve the realism of the UAV task environment. Based on the DDQN algorithm, the collaborative tasks of the UAV swarm in different scenarios are trained. The experimental results validate that the DDQN algorithm is efficient at training the UAV swarm to complete the given collaborative tasks while meeting the swarm's requirements for centralization and autonomy, improving the intelligence of collaborative task execution. The simulation results show that, after training, the proposed UAV swarm can carry out the rendezvous task well, with a mission success rate of 90%.
Keywords: double deep Q-network (DDQN) algorithm, unmanned aerial vehicle (UAV) swarm, task decision, deep reinforcement learning (DRL), sparse returns
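The DDQN algorithm this entry builds on decouples action selection from action evaluation: the online network picks the next action, the target network values it. A minimal sketch with plain lists standing in for the two networks' Q-outputs (names and values are illustrative):

```python
def ddqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double-DQN target: y = r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    if done:
        return reward
    # Online network selects the greedy action...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ...target network evaluates it, reducing overestimation bias.
    return reward + gamma * next_q_target[best_action]

y = ddqn_target(1.0, next_q_online=[0.2, 0.8], next_q_target=[0.5, 0.3])
```

Here the online net picks action 1, but its value comes from the target net (0.3 rather than 0.8), which is exactly the overestimation fix double DQN provides.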
6. A deep reinforcement learning method for multi-stage equipment development planning in uncertain environments (cited by 1)
Authors: LIU Peng, XIA Boyuan, YANG Zhiwei, LI Jichao, TAN Yuejin. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2022, No. 6, pp. 1159-1175 (17 pages)
Equipment development planning (EDP) is usually a long-term process, often performed in an environment of high uncertainty. Traditional multi-stage dynamic programming cannot cope with this kind of uncertainty and its unpredictable situations. To deal with this problem, a multi-stage EDP model based on a deep reinforcement learning (DRL) algorithm is proposed to respond quickly to any environmental change within a reasonable range. Firstly, the basic problem of multi-stage EDP is described and a mathematical planning model is constructed. Then, for two kinds of uncertainty (future capability requirements and the amount of investment at each stage), a corresponding DRL framework is designed to define the environment, state, action, and reward function for multi-stage EDP. After that, the dueling deep Q-network (Dueling DQN) algorithm is used to solve the multi-stage EDP and generate an approximately optimal multi-stage equipment development scheme. Finally, a case with ten kinds of equipment in 100 randomly generated possible environments is used to test the feasibility and effectiveness of the proposed models. The results show that the algorithm can respond instantaneously in any state of the multi-stage EDP environment and, unlike traditional algorithms, does not need to re-optimize the problem for any change in the environment. In addition, the algorithm can flexibly adjust at subsequent planning stages in the event of a change to the equipment capability requirements.
Keywords: equipment development planning (EDP), multi-stage, reinforcement learning, uncertainty, dueling deep Q-network (Dueling DQN)
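The Dueling DQN named in this entry splits the Q-network into a state-value head and an advantage head, then recombines them with a mean-subtracted aggregation. A minimal sketch, with plain numbers standing in for the two heads' outputs (all values illustrative):

```python
def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).

    Subtracting the mean advantage makes the V/A decomposition identifiable.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

q_values = dueling_q(value=1.0, advantages=[0.0, 2.0])
```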
7. UAV Frequency-based Crowdsensing Using Grouping Multi-agent Deep Reinforcement Learning
Authors: Cui ZHANG, En WANG, Funing YANG, Yongjian YANG, Nan JIANG. Computer Science (CSCD, Chinese core journal), 2023, No. 2, pp. 57-68 (12 pages)
Mobile CrowdSensing (MCS) is a promising sensing paradigm that recruits users to cooperatively perform sensing tasks. Recently, unmanned aerial vehicles (UAVs), as powerful sensing devices, have been used to replace user participation and carry out special tasks such as epidemic monitoring and earthquake rescue. In this paper, we focus on scheduling UAVs to sense task Points-of-Interest (PoIs) with different frequency coverage requirements. To accomplish the sensing task, the scheduling strategy needs to consider coverage requirements, geographic fairness and energy charging simultaneously. We consider the complex interaction among UAVs and propose a grouping multi-agent deep reinforcement learning approach (G-MADDPG) to schedule UAVs distributively. G-MADDPG groups all UAVs into teams by a distance-based clustering algorithm (DCA), then regards each team as one agent. In this way, G-MADDPG solves the problem that the training time of traditional MADDPG is too long to converge when the number of UAVs is large, and the trade-off between training time and result accuracy can be controlled flexibly by adjusting the number of teams. Extensive simulation results show that our scheduling strategy outperforms three baselines and is flexible in balancing training time and result accuracy.
Keywords: UAV crowdsensing, frequency coverage, grouping multi-agent deep reinforcement learning
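The grouping step behind G-MADDPG, clustering UAVs so each team acts as one agent, can be sketched with a toy nearest-centroid assignment. One-dimensional positions and fixed centroids are illustrative stand-ins for the paper's distance-based clustering algorithm (DCA):

```python
def assign_teams(positions, centroids):
    """Assign each UAV to the index of its nearest team centroid."""
    return [
        min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        for p in positions
    ]

# Three UAVs, two team centroids: the first two UAVs join team 0.
teams = assign_teams([0.1, 0.9, 5.2], centroids=[0.0, 5.0])
```

Fewer centroids means fewer agents to train (faster convergence) at the cost of coarser per-UAV control, which matches the training-time versus accuracy trade-off the abstract describes.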
8. A guidance method for coplanar orbital interception based on reinforcement learning (cited by 6)
Authors: ZENG Xin, ZHU Yanwei, YANG Leping, ZHANG Chengming. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2021, No. 4, pp. 927-938 (12 pages)
This paper investigates a guidance method based on reinforcement learning (RL) for coplanar orbital interception in a continuous low-thrust scenario. The problem is formulated as a Markov decision process (MDP) model, and a well-designed RL algorithm, experience-based deep deterministic policy gradient (EBDDPG), is proposed to solve it. By taking advantage of prior information generated through the optimal control model, the proposed algorithm not only resolves the convergence problem of common RL algorithms but also successfully trains an efficient deep neural network (DNN) controller for the chaser spacecraft to generate the control sequence. Numerical simulation results show that the proposed algorithm is feasible and that the trained DNN controller improves efficiency over traditional optimization methods by roughly two orders of magnitude.
Keywords: orbital interception, reinforcement learning (RL), Markov decision process (MDP), deep neural network (DNN)
9. Cooperative multi-target hunting by unmanned surface vehicles based on multi-agent reinforcement learning (cited by 2)
Authors: Jiawei Xia, Yasong Luo, Zhikun Liu, Yalun Zhang, Haoran Shi, Zhong Liu. Defence Technology (SCIE, EI, CAS, CSCD), 2023, No. 11, pp. 80-94 (15 pages)
To solve the problem of multi-target hunting by an unmanned surface vehicle (USV) fleet, a hunting algorithm based on multi-agent reinforcement learning is proposed. Firstly, the hunting environment and a kinematic model without boundary constraints are built, and the criteria for successful target capture are given. Then, the cooperative hunting problem of a USV fleet is modeled as a decentralized partially observable Markov decision process (Dec-POMDP), and a distributed partially observable multi-target hunting proximal policy optimization (DPOMH-PPO) algorithm applicable to USVs is proposed. In addition, an observation model, a reward function and an action space applicable to multi-target hunting tasks are designed. To deal with the dynamically changing dimension of observational features in partially observable systems, a feature embedding block is proposed: by combining two feature compression methods, column-wise max pooling (CMP) and column-wise average pooling (CAP), an observational feature encoding is established. Finally, the centralized-training decentralized-execution framework is adopted to train the hunting strategy; each USV in the fleet shares the same policy and performs actions independently. Simulation experiments verify the effectiveness of the DPOMH-PPO algorithm in test scenarios with different numbers of USVs. Moreover, the advantages of the proposed model are comprehensively analyzed in terms of algorithm performance, transfer across task scenarios and self-organization capability after damage, verifying the potential deployment and application of DPOMH-PPO in real environments.
Keywords: unmanned surface vehicles, multi-agent deep reinforcement learning, cooperative hunting, feature embedding, proximal policy optimization
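The two feature-compression operators named in this abstract, column-wise max pooling (CMP) and column-wise average pooling (CAP), reduce a variable number of observed-entity rows to fixed-size vectors, which is what lets the policy accept observations whose entity count changes. A minimal sketch on plain lists (row values are illustrative):

```python
def column_max_pool(rows):
    """CMP: per-feature maximum across a variable number of entity rows."""
    return [max(col) for col in zip(*rows)]

def column_avg_pool(rows):
    """CAP: per-feature mean across a variable number of entity rows."""
    return [sum(col) / len(col) for col in zip(*rows)]

# Two observed entities, two features each -> two fixed-size summaries,
# regardless of how many entities were observed.
rows = [[1.0, 4.0], [3.0, 2.0]]
cmp_vec = column_max_pool(rows)
cap_vec = column_avg_pool(rows)
```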
10. Hierarchical reinforcement learning guidance with threat avoidance (cited by 1)
Authors: LI Bohao, WU Yunjie, LI Guofei. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2022, No. 5, pp. 1173-1185 (13 pages)
The guidance strategy is an extremely critical factor in determining the striking effect of a missile operation. A novel guidance law is presented by exploiting deep reinforcement learning (DRL) with a hierarchical deep deterministic policy gradient (DDPG) algorithm. The reward functions are constructed to minimize the line-of-sight (LOS) angle rate and avoid the threat posed by opposing obstacles. To attenuate chattering of the acceleration, a hierarchical reinforcement learning structure and an improved reward function with an action penalty are put forward. The simulation results validate that a missile under the proposed method can hit the target successfully and keep away from the threatened areas effectively.
Keywords: guidance law, deep reinforcement learning (DRL), threat avoidance, hierarchical reinforcement learning
11. A balanced traffic signal scheduling algorithm based on improved Deep Q Networks
Author: HE Daokun. Machinery Design & Manufacture (Chinese core journal), 2025, No. 4, pp. 135-140 (6 pages)
To further relieve traffic congestion at urban intersections during peak hours and achieve balanced traffic flow across the roads at an intersection, a balanced traffic signal scheduling algorithm based on improved Deep Q Networks is proposed. The features most relevant to traffic signal scheduling at intersections are extracted, traffic signal models are established for a single one-way intersection and for linearly linked two-way intersections, and a traffic signal scheduling optimization model is constructed on this basis. To address the convergence and overestimation shortcomings of the Deep Q Networks algorithm when applied to traffic signal scheduling, the algorithm is improved with a dueling network, a double network, and an improved gradient update strategy, yielding a balanced scheduling algorithm adapted to the problem. Simulation comparison with classic Deep Q Networks verifies the applicability and superiority of the proposed algorithm for traffic signal scheduling. Simulations on urban road data for two scenarios show that the algorithm can effectively reduce vehicle queue lengths at intersections, balance traffic flow through each intersection, and relieve congestion in peak travel directions, improving the efficiency of intersection traffic signal scheduling.
Keywords: traffic signal scheduling, intersection, Deep Q Networks, deep reinforcement learning, intelligent transportation
12. A spectrum access algorithm using prioritized-experience-replay deep Q-learning (cited by 7)
Authors: PAN Xiaona, CHEN Zhe, LI Jinze, QIN Tuanfa. Telecommunication Engineering (Chinese core journal), 2020, No. 5, pp. 489-495 (7 pages)
To address the low spectrum utilization, insufficient use of important experiences, and slow convergence of spectrum access algorithms in cognitive wireless sensor networks, a dynamic spectrum access algorithm using double deep Q-learning with prioritized experience replay is proposed. When secondary users sample from the experience buffer, priority-based sampling is used to break sample correlation and make full use of important experience samples, and a non-sorted batch deletion scheme removes useless samples from the buffer to reduce energy overhead. Simulation results show that, compared with a spectrum access algorithm using double deep Q-learning, the proposed algorithm converges faster; compared with a traditional random spectrum access algorithm, its blocking probability is reduced by 6% to 10% and its throughput is improved by 18% to 20%, improving system performance.
Keywords: cognitive wireless sensor networks, dynamic spectrum access, reinforcement learning, deep Q-learning
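The priority-based sampling this entry relies on is usually implemented proportionally: transitions with larger TD error get a higher sampling probability, P(i) = p_i^alpha / sum_j p_j^alpha. A minimal sketch; the alpha exponent and epsilon floor are common defaults assumed here, not values from the paper:

```python
def sampling_probs(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritized-replay probabilities from TD errors.

    eps keeps zero-error transitions sampleable; alpha interpolates
    between uniform sampling (0) and pure greedy prioritization (1).
    """
    priorities = [(abs(e) + eps) ** alpha for e in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]

# The transition with the larger TD error is sampled more often.
probs = sampling_probs([0.1, 1.0])
```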
13. Maneuvering target tracking of UAV based on MN-DDPG and transfer learning (cited by 15)
Authors: Bo Li, Zhi-peng Yang, Da-qing Chen, Shi-yang Liang, Hao Ma. Defence Technology (SCIE, EI, CAS, CSCD), 2021, No. 2, pp. 457-466 (10 pages)
Tracking a maneuvering target in real time, autonomously and accurately, in an uncertain environment is one of the challenging missions for unmanned aerial vehicles (UAVs). In this paper, aiming to address the control problem of maneuvering target tracking and obstacle avoidance, an online path planning approach for UAVs is developed based on deep reinforcement learning. Through end-to-end learning powered by neural networks, the proposed approach achieves environmental perception and continuous motion output control. The approach includes: (1) a deep deterministic policy gradient (DDPG)-based control framework that provides learning and autonomous decision-making capability for UAVs; (2) an improved method named MN-DDPG that introduces a type of mixed noise to assist the UAV in exploring stochastic strategies for online optimal planning; and (3) an algorithm of task decomposition and pre-training for efficient transfer learning to improve the generalization capability of the UAV's control model built on MN-DDPG. The simulation results verify that the proposed approach achieves good self-adaptive adjustment of the UAV's flight attitude in maneuvering target tracking tasks, with a significant improvement in the generalization capability and training efficiency of the UAV tracking controller in uncertain environments.
Keywords: UAVs, maneuvering target tracking, deep reinforcement learning, MN-DDPG, mixed noises, transfer learning
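The "mixed noises" idea behind MN-DDPG can be sketched as blending two standard exploration noises for continuous actions: temporally correlated Ornstein-Uhlenbeck noise and uncorrelated Gaussian noise. The blend weight and noise parameters below are illustrative assumptions, not the paper's actual mixture.

```python
import random

class MixedNoise:
    """Blend of Ornstein-Uhlenbeck and Gaussian exploration noise."""

    def __init__(self, mu=0.0, theta=0.15, sigma=0.2, blend=0.5, seed=0):
        self.mu, self.theta, self.sigma, self.blend = mu, theta, sigma, blend
        self.ou_state = mu
        self.rng = random.Random(seed)

    def sample(self):
        # Ornstein-Uhlenbeck step: mean-reverting and temporally correlated.
        self.ou_state += self.theta * (self.mu - self.ou_state) \
            + self.sigma * self.rng.gauss(0.0, 1.0)
        # Independent Gaussian component.
        gaussian = self.sigma * self.rng.gauss(0.0, 1.0)
        return self.blend * self.ou_state + (1.0 - self.blend) * gaussian

noise = MixedNoise(seed=42)
samples = [noise.sample() for _ in range(200)]
```

In a DDPG loop this perturbation would be added to the actor's deterministic action before clipping to the action bounds.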
14. A learning-based flexible autonomous motion control method for UAV in dynamic unknown environments (cited by 3)
Authors: WAN Kaifang, LI Bo, GAO Xiaoguang, HU Zijian, YANG Zhipeng. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2021, No. 6, pp. 1490-1508 (19 pages)
This paper presents a deep reinforcement learning (DRL)-based motion control method to provide unmanned aerial vehicles (UAVs) with additional flexibility while flying across dynamic unknown environments autonomously. The method is applicable in both military and civilian fields such as penetration and rescue. The autonomous motion control problem is addressed through motion planning, action interpretation, trajectory tracking, and vehicle movement within the DRL framework. Novel DRL algorithms are presented by combining two difference-amplifying approaches with traditional DRL methods and are used to solve the motion planning problem. An improved Lyapunov guidance vector field (LGVF) method is used to handle the trajectory-tracking problem and provide guidance control commands for the UAV. In contrast to conventional motion-control approaches, the proposed methods directly map sensor-based detections and measurements into control signals for the inner loop of the UAV, i.e., end-to-end control. The training experiments show that the novel DRL algorithms provide more than a 20% performance improvement over state-of-the-art DRL algorithms. The testing experiments demonstrate that the controller based on the novel DRL and LGVF, trained only once in a static environment, enables the UAV to fly autonomously in various dynamic unknown environments, giving the controller strong flexibility.
Keywords: autonomous motion control (AMC), deep reinforcement learning (DRL), difference amplifying, reward shaping
15. Design of a coordinated greenhouse environment control system based on deep reinforcement learning (cited by 2)
Authors: ZUO Zhiyu, MOU Jindong, MAO Hanping, HAN Lühua, HU Jianping, ZHANG Xiaodong, JIN Wenshuai. Journal of Agricultural Mechanization Research (Chinese core journal), 2025, No. 5, pp. 22-27 (6 pages)
To address the high energy consumption and low water-fertilizer use efficiency caused by uncoordinated control of greenhouse temperature, light, water and fertilizer, a coordinated greenhouse environment control method based on deep reinforcement learning is proposed. With energy consumption and photosynthetic rate as optimization objectives, a deep reinforcement learning algorithm is used to train a model that optimizes the target values for temperature and light regulation; by analyzing the influence of different nutrient solution irrigation amounts on crop growth, a method for dynamically adjusting the irrigation amount is determined; and the software and hardware of a coordinated greenhouse environment control system based on deep reinforcement learning are developed. Experimental results show that the method can coordinately control greenhouse temperature, light, and water-fertilizer environmental factors: compared with traditional control methods, environmental regulation energy consumption is reduced by 8.1%, nutrient solution irrigation is reduced by 7.9%, and the photosynthetic rate is improved by 2.7%, providing decision support for efficient greenhouse environment control.
Keywords: greenhouse, deep reinforcement learning, coordinated control, photosynthetic rate, energy consumption
16. A dynamic resource allocation algorithm for the Internet of Vehicles with multi-objective joint optimization (cited by 3)
Authors: SONG Xiaoqin, ZHANG Wenjing, LEI Lei, SONG Tiecheng, ZHAO Liping. Journal of Southeast University (Natural Science Edition) (Chinese core journal), 2025, No. 1, pp. 266-274 (9 pages)
To address the degradation of communication performance caused by highly dynamic, uncertain channels and multi-user interference in the Internet of Vehicles (IoV), a multi-objective joint-optimization resource allocation algorithm based on a multi-agent enhanced double deep Q-network (EDDQN) is proposed. First, considering vehicle motion and time-varying channel characteristics, a resource allocation decision model jointly optimizing spectrum sharing and power control under multi-user interference is established, minimizing the weighted sum of network delay and energy consumption (cost) subject to delay and reliability constraints. Then, the model is converted into a Markov decision process (MDP), and a double deep Q-network (DDQN) with prioritized experience replay and multi-step learning is used, under centralized training and distributed execution, to optimize the spectrum sharing and power allocation strategy of vehicle-to-vehicle (V2V) links. The results show that the proposed algorithm converges well; compared with baseline algorithms under different loads, it reduces cost by more than 8% and improves the load transmission success rate by more than 19%, effectively improving communication performance.
Keywords: Internet of Vehicles, multi-user interference, multi-objective joint optimization, deep reinforcement learning
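The multi-step learning this entry combines with double DQN replaces the one-step bootstrap with an n-step return: accumulate n discounted rewards, then bootstrap once. A minimal sketch; the horizon, rewards, and discount below are illustrative:

```python
def n_step_return(rewards, bootstrap_value, gamma=0.9):
    """n-step return: G = r_0 + gamma*r_1 + ... + gamma^n * V(s_n).

    Folding from the back keeps the arithmetic to one pass.
    """
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Two rewards, then bootstrap from a value estimate of 10.
g = n_step_return([1.0, 2.0], bootstrap_value=10.0, gamma=0.9)
```

Longer horizons propagate reward information faster at the cost of higher variance, which is the usual reason to pair n-step targets with a replay buffer as described here.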
17. A stochastic-event-driven fault recovery strategy based on multi-agent deep reinforcement learning (cited by 2)
Authors: WANG Chong, SHI Dahang, WAN Can, CHEN Xia, WU Feng, JU Ping. Electric Power Automation Equipment (Chinese core journal), 2025, No. 3, pp. 186-193 (8 pages)
To reduce the load loss caused by distribution network faults and improve distribution network resilience, a stochastic-event-driven fault recovery strategy based on multi-agent deep reinforcement learning is proposed. The stochastic-event-driven problem in fault recovery of coupled power-transportation networks is formulated and described as a semi-Markov decision process; considering the optimization objectives of system fault recovery, a semi-Markov stochastic-event-driven fault recovery model is constructed; and a multi-agent deep reinforcement learning algorithm is used to solve the constructed model. Case studies on the coupled system formed by the IEEE 33-bus distribution network and the Sioux Falls transportation network show that the proposed model and method work well for fault recovery in coupled power-transportation networks and can regulate, in real time, recovery changes caused by stochastic events (fault repair and traffic travel).
Keywords: stochastic event driven, fault recovery, deep reinforcement learning, coupled power-transportation network, multi-agent
18. Joint optimization of distributed intelligent computation offloading and service caching for DAG tasks (cited by 1)
Authors: LI Yun, NAN Ziyu, YAO Zhixiu, XIA Shichao, XIAN Yongju. Acta Scientiarum Naturalium Universitatis Sunyatseni (Chinese core journal), 2025, No. 1, pp. 71-82 (12 pages)
A directed acyclic graph (DAG) task offloading and resource optimization problem is formulated, aiming to minimize system energy consumption under constraints including the maximum tolerable delay. Considering that computation requests in the network are highly dynamic and that complete system state information is hard to obtain, a multi-agent deep deterministic policy gradient (MADDPG) algorithm is used to find the optimal strategy. Compared with existing task offloading algorithms, the MADDPG algorithm reduces average system energy consumption by 14.2% to 40.8% and improves the local cache hit rate by 3.7% to 4.1%.
Keywords: mobile edge computing, multi-agent deep reinforcement learning, computation offloading, resource allocation, service caching
19. Research on a secure transmission strategy for mobile edge computing based on deep reinforcement learning (cited by 2)
Authors: WANG Yijun, LI Jiaxin, YAN Zhiying, LYU Jingying, QIAN Zhihong. Journal on Communications (Chinese core journal), 2025, No. 4, pp. 272-281 (10 pages)
In mobile edge computing, task offloading faces security problems such as information leakage and eavesdropping. To improve the secure transmission efficiency of mobile edge computing systems, a UAV-assisted physical-layer secure transmission strategy is proposed. First, a UAV-borne mobile edge computing system is constructed, consisting of I user devices, M legitimate UAVs (L-UAVs) and N eavesdropping UAVs (E-UAVs). Then, while ensuring that the L-UAVs complete the offloading tasks within the prescribed period, a multi-agent deep deterministic policy gradient algorithm with an attention mechanism (A-MADDPG) is used to solve and optimize the problem, with the objective of maximizing the secure transmission efficiency of the communication system. Finally, on the premise of guaranteed offloading, users' confidential information is kept from eavesdroppers and secure computation efficiency is maximized, ensuring overall system security. Simulation results show that the proposed algorithm outperforms other baseline algorithms, with superior secure transmission efficiency.
Keywords: mobile edge computing, physical-layer security, deep reinforcement learning, UAV-assisted offloading
20. Research on an efficient task offloading strategy based on traffic prediction in Internet of Vehicles edge computing environments (cited by 1)
Authors: XU Xiaolong, YANG Wei, YANG Chenyi, CHENG Yong, QI Lianyong, XIANG Haolong, DOU Wanchun. Acta Electronica Sinica (Chinese core journal), 2025, No. 2, pp. 329-343 (15 pages)
By combining mobile edge computing with the Internet of Vehicles (IoV), IoV edge computing moves vehicle computation tasks from cloud servers down to edge servers, effectively reducing the response delay of IoV services. However, the irregular spatio-temporal distribution of traffic flow in the IoV leads to unbalanced computing loads among edge servers, affecting the real-time response of IoV services. This paper therefore proposes an efficient task offloading strategy based on traffic prediction for IoV edge computing environments. Specifically, a Chebyshev graph weighted network (ChebWN) that fully exploits inter-road-segment connectivity and distance information is first designed for traffic flow prediction. Then, a DRL-based binary task offloading algorithm (DBOA) is designed, which divides the binary offloading decision process into two stages: the offloading policy is first obtained via deep reinforcement learning, and a one-dimensional double-ended search algorithm then determines the time-slice allocation that maximizes the total computation rate, reducing the complexity of the decision process. Finally, extensive comparative experiments verify the accuracy of ChebWN in predicting traffic flow and the superiority of DBOA in improving the response speed of IoV services.
Keywords: mobile edge computing, deep reinforcement learning, Internet of Vehicles, graph neural network (GNN), task offloading
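Chebyshev graph convolutions of the kind suggested by the ChebWN name are built on the Chebyshev polynomial recurrence T_0(x) = 1, T_1(x) = x, T_k(x) = 2x T_{k-1}(x) - T_{k-2}(x), applied to a scaled graph Laplacian. The scalar sketch below shows only the recurrence itself; the graph-Laplacian application and any connection to the paper's actual network are not reproduced here.

```python
def chebyshev(k, x):
    """Chebyshev polynomial T_k(x) via the three-term recurrence."""
    if k == 0:
        return 1.0
    if k == 1:
        return x
    t_prev, t = 1.0, x
    for _ in range(2, k + 1):
        # T_k = 2x * T_{k-1} - T_{k-2}
        t_prev, t = t, 2.0 * x * t - t_prev
    return t
```

In a graph setting, x is replaced by the rescaled Laplacian and each T_k term gets its own learnable weight, giving a K-hop localized filter.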