Funding: supported by the National Natural Science Foundation of China (Grant No. 62106283), the National Natural Science Foundation of China (Grant No. 72001214), which provided funds for conducting the experiments, and the Natural Science Foundation of Shaanxi Province (Grant No. 2020JQ-484).
Abstract: The scale of ground-to-air confrontation task assignment is large, and many concurrent task assignments and random events must be handled. When existing task assignment methods are applied to ground-to-air confrontation, they are inefficient on complex tasks and suffer from interaction conflicts in multiagent systems. This study proposes a multiagent architecture based on one general agent with multiple narrow agents (OGMN) to reduce task assignment conflicts. Considering the slow speed of traditional dynamic task assignment algorithms, this paper proposes the proximal policy optimization for task assignment of general and narrow agents (PPO-TAGNA) algorithm. Building on the idea of the optimal assignment strategy and the training framework of deep reinforcement learning (DRL), the algorithm adds a multihead attention mechanism and a stage reward mechanism to the bilateral band clipping PPO algorithm to address low training efficiency. Finally, simulation experiments are carried out on a digital battlefield. The OGMN-based multiagent architecture combined with the PPO-TAGNA algorithm obtains higher rewards faster and achieves a higher win ratio. Analysis of agent behavior verifies the method's efficiency, superiority, and rational use of resources.
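The "bilateral band clipping" PPO objective appears to correspond to the dual-clip PPO loss, which bounds the surrogate from below for negative advantages. Below is a minimal PyTorch sketch under that assumption; the clipping constants, the multihead attention encoder, and the stage reward shaping that PPO-TAGNA adds are not specified in the abstract, so only the core loss is shown and `eps` and `c` are illustrative values.

```python
import torch

def dual_clip_ppo_loss(ratio, advantage, eps=0.2, c=3.0):
    """Dual-clip ("bilateral band clipping") PPO policy loss.

    ratio:     pi_new(a|s) / pi_old(a|s), shape (batch,)
    advantage: estimated advantages, shape (batch,)
    """
    surr1 = ratio * advantage
    surr2 = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantage
    clipped = torch.min(surr1, surr2)  # standard PPO clipping band
    # Second clip: when the advantage is negative, bound the objective from
    # below by c * advantage so one sample with an extreme probability ratio
    # cannot dominate the gradient.
    dual = torch.where(advantage < 0.0,
                       torch.max(clipped, c * advantage),
                       clipped)
    return -dual.mean()
```

In practice `ratio` is computed as `torch.exp(logp_new - logp_old)` from the current and behavior policies, and this loss is combined with the usual value and entropy terms.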
Abstract: After an N-1 fault occurs in an urban power grid, new operational risks are very likely to appear, which can lead to a large-scale blackout under a subsequent N-1-1 contingency. To control post-N-1 operational risk in urban grids, this paper proposes a load transfer strategy for N-1 risk control based on an improved dual-agent dueling double deep Q network (D3QN). Following risk-control principles, an N-1 scenario index is proposed that requires no additional historical data and accounts for backup automatic switching devices, single-supply substation risk, and single-supply load-bus risk, and a three-stage load transfer solution model is built that considers the order of actions and the relationships among the indices. An improved dual-agent D3QN method with a pre-action, varying-exploration-value selection policy splits load transfer into several sub-transfer stages to be learned, which clarifies the transfer logic, reduces the dimensionality of the action space, and improves training and optimization, yielding a load transfer strategy that controls N-1 risk. Case studies on multiple urban grid scenarios verify the effectiveness of the proposed model and method.
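For reference, the sketch below shows the two generic ingredients of D3QN in PyTorch: the dueling value/advantage decomposition and the double-DQN bootstrap target. The paper's dual-agent decomposition, its pre-action and varying-exploration-value selection policy, and the grid-specific state and action encodings are not reproduced; all dimensions and names here are illustrative.

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # state value V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # advantages A(s, a)

    def forward(self, s):
        h = self.feature(s)
        a = self.advantage(h)
        return self.value(h) + a - a.mean(dim=1, keepdim=True)

def double_dqn_target(online, target, reward, s_next, done, gamma=0.99):
    """Double-DQN bootstrap: the online net selects the greedy action,
    the slowly updated target net evaluates it."""
    with torch.no_grad():
        a_star = online(s_next).argmax(dim=1, keepdim=True)
        q_next = target(s_next).gather(1, a_star).squeeze(1)
        return reward + gamma * (1.0 - done) * q_next
```

In the dual-agent scheme, each agent would presumably maintain such an online/target pair for its own sub-transfer stage.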
Abstract: In existing work on multi-QoS (quality of service) scheduling, reliance on immediate-reward feedback alone leads to poor scalability and wasted resources when handling delay-sensitive data and media data with continuous-transmission requirements under resource-constrained conditions. To address this, a reward backtracking based deep Q-network (RB-DQN) algorithm is proposed. The algorithm uses interactions at future time steps to retroactively adjust the policy evaluation of the current state, identifying and resolving more effectively the packet loss caused by unreasonable scheduling policies. In addition, a latency throughput trade-off (LTT) metric is designed that jointly considers the service requirements of delay-sensitive data and media-type data and can emphasize different priorities through weight adjustment. Extensive simulation results show that, compared with other scheduling policies, the proposed algorithm effectively reduces the delay and jitter of delay-sensitive data while ensuring the smoothness and stability of media-type data.
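To make the reward-backtracking idea concrete, here is a hypothetical Python sketch: a replay buffer that retroactively adjusts the stored rewards of recent transitions once a later event (such as a detected packet drop) reveals that an earlier scheduling decision was poor. The abstract does not give RB-DQN's exact backtracking rule or the LTT formula, so the horizon, decay weighting, and the `ltt` scalarization below are all illustrative assumptions.

```python
from collections import deque

class BacktrackingReplay:
    """Replay buffer with retroactive reward adjustment (hypothetical
    sketch of the reward-backtracking idea, not RB-DQN's exact rule)."""
    def __init__(self, horizon=8, decay=0.5):
        self.buffer = []                     # [s, a, r, s_next, done] records
        self.recent = deque(maxlen=horizon)  # indices still open to revision
        self.decay = decay

    def push(self, s, a, r, s_next, done):
        self.buffer.append([s, a, r, s_next, done])
        self.recent.append(len(self.buffer) - 1)

    def backtrack(self, penalty):
        # Spread the penalty over the last `horizon` transitions,
        # weighting the most recent decisions most heavily.
        w = 1.0
        for idx in reversed(self.recent):
            self.buffer[idx][2] += w * penalty
            w *= self.decay

def ltt(latency, throughput, alpha=0.5):
    """Illustrative latency-throughput trade-off score; higher is better.
    alpha shifts the emphasis between the two service requirements."""
    return alpha * throughput - (1.0 - alpha) * latency
```

A Q-learning update that samples from such a buffer sees the corrected rewards, so the value of the state that triggered the eventual drop is revised downward.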
Abstract: For the fire station location problem, a three-objective facility location model is built that considers timeliness, the anxiety of citizens awaiting rescue, and construction cost, so as to achieve a more scientific layout of fire-fighting facilities. Given the NP-hardness of the problem, a multi-objective deep reinforcement learning (MDRL) model based on operator learning is proposed. Multiple optimization operators are designed as the reinforcement learning action space, and a policy network is trained to select the best operator to improve the current solution. For the multi-objective setting, an advantage-difference-based method (MDRL-AD) and a dominance-evaluation-based method (MDRL-DE) are designed. Numerical experiments on test instances of four scales and a real-world case compare MDRL with improved NSGA-II, MOPSO, and L2I algorithms, evaluating performance with the Hypervolume, Spacing, Ω, and IGD metrics. The results show that MDRL-AD is better suited to small-scale instances, whereas MDRL-DE has a clear advantage over the other algorithms on large-scale and very-large-scale instances. MDRL significantly outperforms the comparison algorithms in the convergence and uniformity of the non-dominated solution set, providing a competitive solution for fire-facility layout planning.
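The operator-learning loop can be sketched compactly: the trained policy picks one optimization operator per step, applies it to the incumbent solution, and is rewarded by the resulting improvement. The Python sketch below uses a toy epsilon-greedy bandit as a stand-in for the paper's policy network and a single scalarized objective; the actual operators and the multi-objective reward handling of the MDRL-AD and MDRL-DE variants are not reproduced, so every name here is illustrative.

```python
import random

class BanditPolicy:
    """Toy epsilon-greedy stand-in for the trained policy network
    (the paper trains a neural policy conditioned on the solution state)."""
    def __init__(self, n_ops, eps=0.1, lr=0.1):
        self.q = [0.0] * n_ops   # running value estimate per operator
        self.eps, self.lr = eps, lr

    def select(self, _state):
        if random.random() < self.eps:
            return random.randrange(len(self.q))            # explore
        return max(range(len(self.q)), key=self.q.__getitem__)  # exploit

    def update(self, op, reward):
        self.q[op] += self.lr * (reward - self.q[op])

def improve(solution, operators, objective, policy, steps=200):
    """Pick an operator, apply it, reward the policy by the improvement."""
    best, best_cost = solution, objective(solution)
    for _ in range(steps):
        op = policy.select(best)
        cand = operators[op](best)          # operators: list of callables
        gain = best_cost - objective(cand)  # > 0 means cand is better
        policy.update(op, gain)
        if gain > 0:
            best, best_cost = cand, best_cost - gain
    return best
```

Here `operators` would contain moves such as opening, closing, or swapping candidate fire-station sites; a multi-objective version replaces the scalar `gain` with the advantage-difference or dominance-evaluation signal.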