Journal Articles
177 articles found
1. Tactical reward shaping for large-scale combat by multi-agent reinforcement learning
Authors: DUO Nanxun, WANG Qinzhao, LYU Qiang, WANG Wei. Journal of Systems Engineering and Electronics, CSCD, 2024, No. 6, pp. 1516-1529 (14 pages)
Future unmanned battles desperately require intelligent combat policies, and multi-agent reinforcement learning offers a promising solution. However, due to the complexity of combat operations and the large size of the combat group, this task suffers from the credit assignment problem more than other reinforcement learning tasks. This study uses reward shaping to relieve the credit assignment problem and improve policy training for the new generation of large-scale unmanned combat operations. We first prove that multiple reward shaping functions would not change the Nash equilibrium in stochastic games, providing theoretical support for their use. According to the characteristics of combat operations, we propose tactical reward shaping (TRS) that comprises maneuver shaping advice and threat assessment-based attack shaping advice. Then, we investigate the effects of different types and combinations of shaping advice on combat policies through experiments. The results show that TRS improves both the efficiency and attack accuracy of combat policies, with the combination of maneuver reward shaping advice and ally-focused attack shaping advice achieving the best performance compared with the baseline strategy.
Keywords: deep reinforcement learning; multi-agent reinforcement learning; multi-agent combat; unmanned battle; reward shaping
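The policy-invariance claim in the abstract above builds on potential-based shaping; here is a minimal sketch of that form, where the distance-based potential, the field name, and the discount value are illustrative assumptions, not the paper's TRS advice functions.

```python
# Potential-based reward shaping: F(s, s') = gamma * Phi(s') - Phi(s).
# Adding F to the environment reward leaves optimal policies (and, per
# the paper's extension, Nash equilibria in stochastic games) unchanged.
# "distance_to_target" and GAMMA are illustrative assumptions.

GAMMA = 0.99  # discount factor

def potential(state):
    # Phi(s): negative distance to target, so closing distance is rewarded.
    return -state["distance_to_target"]

def shaped_reward(r, state, next_state):
    # Environment reward r plus the shaping term F(s, s').
    return r + GAMMA * potential(next_state) - potential(state)

s, s2 = {"distance_to_target": 10.0}, {"distance_to_target": 8.0}
print(shaped_reward(0.0, s, s2))  # positive: the agent moved closer
```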
2. A single-task and multi-decision evolutionary game model based on multi-agent reinforcement learning (Cited by 3)
Authors: MA Ye, CHANG Tianqing, FAN Wenhui. Journal of Systems Engineering and Electronics, SCIE/EI/CSCD, 2021, No. 3, pp. 642-657 (16 pages)
In the evolutionary game of the same task for groups, changes in game rules, personal interests, crowd size, and external supervision cause uncertain effects on individual decision-making and game results. Within the Markov decision framework, a single-task multi-decision evolutionary game model based on multi-agent reinforcement learning is proposed to explore the evolutionary rules of the game process. The model can improve the result of an evolutionary game and facilitate the completion of the task. First, based on multi-agent theory, to solve the existing problems in the original model, a negative feedback tax penalty mechanism is proposed to guide the strategy selection of individuals in the group. In addition, to evaluate the evolutionary game results of the group in the model, a calculation method for the group intelligence level is defined. Second, the Q-learning algorithm is used to improve the guiding effect of the negative feedback tax penalty mechanism. In the model, the selection strategy of the Q-learning algorithm is improved, and a bounded-rationality evolutionary game strategy is proposed based on the rules of evolutionary games and the bounded rationality of individuals. Finally, simulation results show that the proposed model can effectively guide individuals to choose cooperation strategies that are beneficial to task completion and stability under different negative feedback factor values and different group sizes, thereby improving the group intelligence level.
Keywords: multi-agent reinforcement learning; evolutionary game; Q-learning
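The tax-penalty-plus-Q-learning mechanism described above can be sketched in a few lines; the tax term, constants, and action names here are illustrative assumptions, not the paper's exact model.

```python
from collections import defaultdict

# Tabular Q-learning of the kind the model builds on. The negative-
# feedback "tax" subtracted from the reward stands in for the paper's
# tax penalty mechanism; ALPHA, GAMMA and the actions are assumptions.

ALPHA, GAMMA = 0.1, 0.9
Q = defaultdict(float)  # Q[(state, action)] -> value, default 0.0

def q_update(s, a, reward, tax, s_next, actions):
    # Tax penalty reduces the effective reward before the standard
    # Q-learning temporal-difference update.
    r = reward - tax
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

actions = ["cooperate", "defect"]
q_update("start", "defect", 5.0, 3.0, "start", actions)
print(Q[("start", "defect")])  # 0.1 * (5 - 3) = 0.2
```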
3. Knowledge transfer in multi-agent reinforcement learning with an incremental number of agents (Cited by 4)
Authors: LIU Wenzhang, DONG Lu, LIU Jian, SUN Changyin. Journal of Systems Engineering and Electronics, SCIE/EI/CSCD, 2022, No. 2, pp. 447-460 (14 pages)
In this paper, the reinforcement learning method for cooperative multi-agent systems (MAS) with an incremental number of agents is studied. Existing multi-agent reinforcement learning approaches deal with a MAS with a specific number of agents and can learn well-performing policies. However, if the number of agents increases, the previously learned policies may not perform well in the current scenario. The new agents need to learn from scratch to find optimal policies with the others, which may slow down the learning speed of the whole team. To solve this problem, we propose a new algorithm that takes full advantage of the historical knowledge learned before and transfers it from the previous agents to the new agents. Since the previous agents have been trained well in the source environment, they are treated as teacher agents in the target environment. Correspondingly, the new agents are called student agents. To enable the student agents to learn from the teacher agents, we first modify the input nodes of the teacher agents' networks to adapt to the current environment. Then, the teacher agents take the observations of the student agents as input and output advised actions and values as supervising information. Finally, the student agents combine the reward from the environment with the supervising information from the teacher agents and learn optimal policies with modified loss functions. By taking full advantage of the teacher agents' knowledge, the search space for the student agents is reduced significantly, which accelerates the learning speed of the whole system. The proposed algorithm is verified in several multi-agent simulation environments, and its efficiency is demonstrated by the experimental results.
Keywords: knowledge transfer; multi-agent reinforcement learning (MARL); new agents
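A hedged sketch of the modified student loss described above: a standard TD term plus a supervision term pulling the student toward the teacher's advised action values. The mean-squared form and the weight LAMBDA are assumptions; the paper's exact loss may differ.

```python
# Student loss combining environment TD error with teacher supervision.
# LAMBDA (supervision weight, often decayed over training) is an
# illustrative assumption, as are the toy action-value lists below.

LAMBDA = 0.5

def student_loss(q_pred, td_target, teacher_q):
    # q_pred / teacher_q: per-action value estimates for one observation.
    td_loss = (td_target - max(q_pred)) ** 2
    # Distillation term: mean squared gap to the teacher's advised values.
    distill = sum((t - s) ** 2 for t, s in zip(teacher_q, q_pred)) / len(q_pred)
    return td_loss + LAMBDA * distill

loss = student_loss([0.2, 0.8], td_target=1.0, teacher_q=[0.1, 0.9])
print(loss)  # td 0.04 + 0.5 * distill 0.01 ≈ 0.045
```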
4. Cooperative multi-target hunting by unmanned surface vehicles based on multi-agent reinforcement learning (Cited by 2)
Authors: Jiawei Xia, Yasong Luo, Zhikun Liu, Yalun Zhang, Haoran Shi, Zhong Liu. Defence Technology, SCIE/EI/CAS/CSCD, 2023, No. 11, pp. 80-94 (15 pages)
To solve the problem of multi-target hunting by an unmanned surface vehicle (USV) fleet, a hunting algorithm based on multi-agent reinforcement learning is proposed. First, the hunting environment and a kinematic model without boundary constraints are built, and the criteria for successful target capture are given. Then, the cooperative hunting problem of a USV fleet is modeled as a decentralized partially observable Markov decision process (Dec-POMDP), and a distributed partially observable multi-target hunting proximal policy optimization (DPOMH-PPO) algorithm applicable to USVs is proposed. In addition, an observation model, a reward function, and an action space applicable to multi-target hunting tasks are designed. To deal with the dynamically changing dimension of the observational features input by partially observable systems, a feature embedding block is proposed. By combining two feature compression methods, column-wise max pooling (CMP) and column-wise average pooling (CAP), an observational feature encoding is established. Finally, the centralized training and decentralized execution framework is adopted to complete the training of the hunting strategy. Each USV in the fleet shares the same policy and performs actions independently. Simulation experiments verify the effectiveness of the DPOMH-PPO algorithm in test scenarios with different numbers of USVs. Moreover, the advantages of the proposed model are comprehensively analyzed in terms of algorithm performance, migration across task scenarios, and self-organization capability after being damaged, and the potential deployment and application of DPOMH-PPO in real environments is verified.
Keywords: unmanned surface vehicles; multi-agent deep reinforcement learning; cooperative hunting; feature embedding; proximal policy optimization
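The CMP/CAP feature-embedding idea above is easy to illustrate: however many targets a USV currently observes, column-wise max and average pooling compress the variable-size observation matrix into one fixed-length vector. The feature contents below are illustrative assumptions.

```python
# Feature embedding via column-wise max pooling (CMP) and column-wise
# average pooling (CAP): an (n_targets x n_features) matrix with varying
# n_targets is reduced to a fixed vector of 2 * n_features entries.

def embed(observations):
    # observations: list of per-target feature vectors; length may vary.
    n = len(observations)
    cols = list(zip(*observations))          # transpose to feature columns
    cmp_vec = [max(c) for c in cols]         # column-wise max (CMP)
    cap_vec = [sum(c) / n for c in cols]     # column-wise average (CAP)
    return cmp_vec + cap_vec                 # fixed-size encoding

two = embed([[1.0, 4.0], [3.0, 2.0]])
three = embed([[1.0, 4.0], [3.0, 2.0], [2.0, 0.0]])
print(two, len(two) == len(three))  # [3.0, 4.0, 2.0, 3.0] True
```

The fixed output size is what lets a single policy network consume observations of two targets or ten without architectural changes.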
5. Multi-agent reinforcement learning based on policies of global objective
Authors: 张化祥, 黄上腾. Journal of Systems Engineering and Electronics, SCIE/EI/CSCD, 2005, No. 3, pp. 676-681 (6 pages)
In general-sum games, taking all agents' collective rationality into account, we define the agents' global objective and propose a novel multi-agent reinforcement learning (RL) algorithm based on a global policy. In each learning step, all agents commit to selecting the global policy to achieve the global goal. We prove that this learning algorithm converges under certain restrictions on the stage games of learned Q values, and show that it has considerably lower computational time complexity than existing multi-agent learning algorithms for general-sum games. An example is analyzed to show the algorithm's merits.
Keywords: Markov games; reinforcement learning; collective rationality; policy
6. UAV Frequency-based Crowdsensing Using Grouping Multi-agent Deep Reinforcement Learning
Authors: Cui ZHANG, En WANG, Funing YANG, Yongjian YANG, Nan JIANG. Computer Science (计算机科学), CSCD, 2023, No. 2, pp. 57-68 (12 pages)
Mobile CrowdSensing (MCS) is a promising sensing paradigm that recruits users to cooperatively perform sensing tasks. Recently, unmanned aerial vehicles (UAVs) have been used as powerful sensing devices to replace user participation and carry out special tasks such as epidemic monitoring and earthquake rescue. In this paper, we focus on scheduling UAVs to sense task Points of Interest (PoIs) with different frequency coverage requirements. To accomplish the sensing task, the scheduling strategy needs to consider the coverage requirement, geographic fairness, and energy charging simultaneously. We consider the complex interaction among UAVs and propose a grouping multi-agent deep reinforcement learning approach (G-MADDPG) to schedule UAVs distributively. G-MADDPG groups all UAVs into teams using a distance-based clustering algorithm (DCA) and then regards each team as an agent. In this way, G-MADDPG solves the problem that the training time of traditional MADDPG is too long to converge when the number of UAVs is large, and the trade-off between training time and result accuracy can be controlled flexibly by adjusting the number of teams. Extensive simulation results show that our scheduling strategy performs better than three baselines and is flexible in balancing training time and result accuracy.
Keywords: UAV crowdsensing; frequency coverage; grouping multi-agent deep reinforcement learning
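The grouping step above can be sketched as a simple distance-based clustering; the greedy seed rule and the radius below are assumptions standing in for the paper's DCA, which may differ in detail.

```python
import math

# Greedy distance-based clustering sketch: a UAV within RADIUS of an
# existing team's seed (first member) joins that team; otherwise it
# seeds a new team. Each resulting team is then treated as one agent.

RADIUS = 5.0  # illustrative grouping radius

def group(uav_positions):
    teams = []  # each team is a list of positions; teams[i][0] is the seed
    for p in uav_positions:
        for team in teams:
            if math.dist(p, team[0]) <= RADIUS:
                team.append(p)
                break
        else:
            teams.append([p])  # no nearby team: start a new one
    return teams

teams = group([(0, 0), (1, 1), (10, 10), (11, 9)])
print(len(teams))  # 2 teams: one near the origin, one near (10, 10)
```

Fewer, larger teams mean fewer agents to train (faster convergence) at the cost of coarser control, which is the training-time/accuracy trade-off the abstract mentions.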
7. Computational intelligence interception guidance law using online off-policy integral reinforcement learning
Authors: WANG Qi, LIAO Zhizhong. Journal of Systems Engineering and Electronics, SCIE/CSCD, 2024, No. 4, pp. 1042-1052 (11 pages)
The missile interception problem can be regarded as a two-person zero-sum differential game, which depends on the solution of the Hamilton-Jacobi-Isaacs (HJI) equation. It has been proved impossible to obtain a closed-form solution due to the nonlinearity of the HJI equation, and many iterative algorithms have been proposed to solve it. The simultaneous policy updating algorithm (SPUA) is an effective algorithm for solving the HJI equation, but it is an on-policy integral reinforcement learning (IRL) method. For online implementation of SPUA, the disturbance signals need to be adjustable, which is unrealistic. In this paper, an off-policy IRL algorithm based on SPUA is proposed without using any knowledge of the system dynamics. Then, a neural-network-based online adaptive critic implementation scheme of the off-policy IRL algorithm is presented. Based on the online off-policy IRL method, a computational intelligence interception guidance (CIIG) law is developed for intercepting highly maneuvering targets. As a model-free method, interception can be achieved by measuring system data online. The effectiveness of the CIIG law is verified in two missile-target engagement scenarios.
Keywords: two-person zero-sum differential games; Hamilton-Jacobi-Isaacs (HJI) equation; off-policy integral reinforcement learning (IRL); online learning; computational intelligence interception guidance (CIIG) law
8. Recognition and interference of deceptive behavior based on inverse reinforcement learning and game theory (Cited by 2)
Authors: ZENG Yunxiu, XU Kai. Journal of Systems Engineering and Electronics, SCIE/EI/CSCD, 2023, No. 2, pp. 270-288 (19 pages)
In real-time strategy (RTS) games, the ability to recognize other players' goals is important for creating artificial intelligence (AI) players. However, most current goal recognition methods do not take a player's deceptive behavior into account, which often occurs in RTS game scenarios, resulting in poor recognition results. To solve this problem, this paper proposes goal recognition for deceptive agents, an extended goal recognition method that applies deductive reasoning (from the general to the specific) to model the deceptive agent's behavioral strategy. First, a general deceptive behavior model is proposed to abstract the features of deception; these features are then used to construct the behavior strategy that best matches the deceiver's historical behavior data via the inverse reinforcement learning (IRL) method. Finally, to interfere with the implementation of deceptive behavior, we construct a game model to describe the confrontation scenario and derive the most effective interference measures.
Keywords: deceptive path planning; inverse reinforcement learning (IRL); game theory; goal recognition
9. Multi-agent system application in accordance with game theory in a bi-directional coordination network model (Cited by 3)
Authors: ZHANG Jie, WANG Gang, YUE Shaohua, SONG Yafei, LIU Jiayi, YAO Xiaoqiang. Journal of Systems Engineering and Electronics, SCIE/EI/CSCD, 2020, No. 2, pp. 279-289 (11 pages)
The multi-agent system is an optimal solution to complex intelligent problems. In accordance with game theory, the concept of loyalty is introduced to analyze the relationship between agents' individual income and global benefits and to build the logical architecture of the multi-agent system. Besides, to verify the feasibility of the method, the cyclic neural network is optimized, a bi-directional coordination network is built as the training network for deep learning, and specific training scenes are simulated as the training background. After a certain number of training iterations, the model can learn simple strategies autonomously, and as training time increases, the complexity of the learned strategies rises gradually. The model is verified to be realizable by examples of obstacle avoidance, firepower distribution, and cooperative cover. Under the same resource background, the model exhibits better convergence than other deep learning training networks and does not easily fall into local endless loops. Furthermore, the ability of the learned strategy is stronger than that of a rule-based training model, which is of great practical value.
Keywords: loyalty; game theory; bi-directional coordination network; multi-agent system; learning strategy
10. Targeted multi-agent communication algorithm based on state control
Authors: Li-yang Zhao, Tian-qing Chang, Lei Zhang, Jie Zhang, Kai-xuan Chu, De-peng Kong. Defence Technology, SCIE/EI/CAS/CSCD, 2024, No. 1, pp. 544-556 (13 pages)
As an important mechanism in multi-agent interaction, communication can make agents form complex team relationships rather than a simple collection of independent agents. However, existing communication schemes introduce considerable timing redundancy and many irrelevant messages, which seriously affects their practical application. To solve this problem, this paper proposes a targeted multi-agent communication algorithm based on state control (SCTC). The SCTC uses a state-controlled gating mechanism to reduce the timing redundancy of communication between agents, and determines the interaction relationships between agents and the importance weight of each communication message through a series connection of hard- and self-attention mechanisms, realizing targeted processing of communication messages. In addition, by minimizing the difference between the fusion message generated from each agent's real communication message and the fusion message generated from the buffered message, the correctness of the agent's final action choice is ensured. Our evaluation on a challenging set of StarCraft II benchmarks indicates that the SCTC can significantly improve learning performance and reduce the communication overhead between agents, thus ensuring better cooperation.
Keywords: multi-agent deep reinforcement learning; state control; targeted interaction; communication mechanism
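The two SCTC ingredients, a state-controlled gate that suppresses redundant messages and attention weights that rank incoming messages, can be sketched as follows; the change threshold and dot-product scoring are illustrative assumptions, not the paper's trained networks.

```python
import math

# (1) Hard gate: only send a message when the state has changed enough
#     since the last transmission, cutting timing redundancy.
# (2) Soft attention: softmax over dot-product scores decides how much
#     each neighbour's message matters to the receiver.

GATE_THRESHOLD = 0.1  # illustrative change threshold

def gate_open(state, last_sent_state):
    change = sum(abs(a - b) for a, b in zip(state, last_sent_state))
    return change > GATE_THRESHOLD

def attention_weights(query, keys):
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)                      # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

print(gate_open([1.0, 2.0], [1.0, 2.05]))  # False: change too small to send
w = attention_weights([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(w[0] > w[1])  # True: the aligned key receives more weight
```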
11. Optimizing agent behavior decisions in UT2004 by combining behavior trees with Q-learning (Cited by 6)
Authors: 刘晓伟, 高春鸣. Computer Engineering and Applications (计算机工程与应用), CSCD, 2016, No. 3, pp. 113-118 (6 pages)
To address the problem that the behavior decisions of NPCs (non-player characters) in the FPS game UT2004 are not flexible or intelligent enough, this paper combines behavior trees with the Q-learning reinforcement learning algorithm and proposes a method that optimizes NPC behavior decision-making through a combination of preprocessing and online learning. Through reinforcement learning on the behavior tree, NPC behavior decisions become more flexible and intelligent, i.e., human-like. Experimental results demonstrate the effectiveness and feasibility of the method.
Keywords: behavior decision-making; game artificial intelligence (AI); Q-learning; reinforcement learning; behavior tree
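The behavior-tree-plus-Q-learning idea can be sketched as a selector node that ranks its children by learned Q values instead of a fixed priority order; the states, actions, and values below are illustrative, not UT2004's actual behavior set.

```python
import random

# A Q-driven selector node: during training it is epsilon-greedy, and
# once learned it deterministically picks the child with the highest
# Q value for the current game state. All entries are toy assumptions.

EPSILON = 0.1
Q = {("low_health", "retreat"): 0.9,
     ("low_health", "attack"): 0.2,
     ("enemy_seen", "attack"): 0.8,
     ("enemy_seen", "retreat"): 0.1}

def select_child(state, children, explore=False):
    if explore and random.random() < EPSILON:
        return random.choice(children)  # exploration branch (training)
    # exploitation: child with the highest learned Q value wins
    return max(children, key=lambda c: Q.get((state, c), 0.0))

print(select_child("low_health", ["attack", "retreat"]))  # retreat
```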
12. An intelligent game guidance algorithm based on deep reinforcement learning (Cited by 2)
Authors: 白天, 吕璐瑶, 李储, 何加亮. Journal of Jilin University (Science Edition) (吉林大学学报(理学版)), 北大核心, 2025, No. 1, pp. 91-98 (8 pages)
To address the large model input dimension and long training time of traditional game-agent algorithms, a new deep reinforcement learning guidance algorithm for game agents is proposed that combines state-information transformation with reward-function shaping. First, the interface provided by the Unity engine is used to read game back-end information directly, effectively compressing the dimension of the state space and reducing the amount of input data. Second, a carefully designed reward mechanism accelerates model convergence. Finally, the algorithm is compared with existing methods both qualitatively and quantitatively; the experimental results show that it not only significantly improves training efficiency but also substantially improves agent performance.
Keywords: deep reinforcement learning; game agent; reward-function shaping; proximal policy optimization
13. A survey of multi-agent reinforcement learning for control and decision-making (Cited by 5)
Authors: 罗彪, 胡天萌, 周育豪, 黄廷文, 阳春华, 桂卫华. Acta Automatica Sinica (自动化学报), 北大核心, 2025, No. 3, pp. 510-539 (30 pages)
As an important class of artificial intelligence methods, reinforcement learning is widely applied to complex control and decision-making problems and has shown great potential in many fields. In recent years, reinforcement learning has gradually expanded from single-agent decision-making to multi-agent cooperation and games, making multi-agent reinforcement learning a research hotspot. A multi-agent system consists of multiple entities with autonomous perception and decision-making capabilities and promises to solve large-scale complex problems that traditional single-agent methods cannot handle. Multi-agent reinforcement learning must consider not only the dynamics of the environment but also the uncertainty of other agents' strategies, which increases the complexity of learning and decision-making. To this end, this paper reviews research on multi-agent reinforcement learning for control and decision-making, analyzes the main problems and challenges it faces, surveys existing results and progress at the two levels of control theory and autonomous decision-making, and discusses future research directions, aiming to provide valuable references and insights for future research on multi-agent reinforcement learning.
Keywords: reinforcement learning; multi-agent systems; sequential decision-making; cooperative control; game theory
14. A survey of games based on multi-agent reinforcement learning (Cited by 1)
Authors: 李艺春, 刘泽娇, 洪艺天, 王继超, 王健瑞, 李毅, 唐漾. Acta Automatica Sinica (自动化学报), 北大核心, 2025, No. 3, pp. 540-558 (19 pages)
Multi-agent reinforcement learning (MARL), at the intersection of game theory, control theory, and multi-agent learning, is a frontier direction in multi-agent systems (MASs) research, endowing agents with the ability to complete diverse tasks through interaction and decision-making in dynamic, multi-dimensional, complex environments. MARL is developing toward open application targets, embodied application problems, and increasingly complex application scenarios, and is gradually becoming the most effective tool for solving real-world game decision-making problems. This paper systematically surveys games based on multi-agent reinforcement learning. First, it introduces the basic theory of MARL and reviews the development of MARL algorithms and baseline test environments. Then, for cooperative, adversarial, and mixed MARL tasks, it presents the latest progress from the perspectives of improving agents' cooperation efficiency and enhancing their adversarial capabilities, and discusses frontier research directions for mixed games in the light of practical applications. Finally, it summarizes the application prospects and development trends of multi-agent reinforcement learning.
Keywords: multi-agent reinforcement learning; multi-agent systems; game decision-making; equilibrium solving
15. A real-time source-grid-load-storage optimal dispatch method based on multi-agent actor-double-critic deep reinforcement learning (Cited by 1)
Authors: 徐业琰, 姚良忠, 廖思阳, 程帆, 徐箭, 蒲天骄, 王新迎. Proceedings of the CSEE (中国电机工程学报), 北大核心, 2025, No. 2, pp. 513-526, I0010 (15 pages)
To ensure the safe and efficient operation of new-type power systems, and to address the difficulties of model-driven dispatch methods, namely hard-to-solve dispatch optimization models and slow real-time decision-making, this paper proposes a real-time source-grid-load-storage optimal dispatch method based on multi-agent actor-double-critic deep reinforcement learning. By constructing a real-time dispatch optimization model that considers the operating constraints of flexible resources and system security constraints, and by introducing a Vickrey-Clarke-Groves auction mechanism, a constrained Markov cooperative game model is designed that transforms the centralized dispatch model into a distributed optimization problem solved among multiple agents. A multi-agent actor-double-critic algorithm is then proposed in which self-critic and cons-critic networks evaluate each agent's action value and action cost respectively, reducing training difficulty, avoiding the effects of sparse instant rewards and security-constraint costs, accelerating the convergence of multi-agent training, and ensuring that real-time dispatch decisions satisfy the system's secure-operation constraints. Finally, simulation cases verify that the proposed method greatly shortens real-time dispatch decision time and achieves real-time source-grid-load-storage dispatch that guarantees secure, reliable, and economical system operation.
Keywords: source-grid-load-storage; real-time dispatch; constrained Markov cooperative game; multi-agent deep reinforcement learning
16. A survey on the solution and application of mixed-game problems (Cited by 1)
Authors: 董绍康, 李超, 杨光, 葛振兴, 曹宏业, 陈武兵, 杨尚东, 陈兴国, 李文斌, 高阳. Journal of Software (软件学报), 北大核心, 2025, No. 1, pp. 107-151 (45 pages)
In recent years, with the rapid progress of artificial intelligence in sequential decision-making and adversarial games, great advances have been made in Go, video games, Texas hold'em poker, and mahjong; systems such as AlphaGo, OpenAI Five, AlphaStar, DeepStack, Libratus, Pluribus, and Suphx have reached or surpassed human expert level in these domains. These applications concentrate on two-player, two-team, or multi-player zero-sum games, while research on mixed games lacks substantial progress and breakthroughs. Unlike zero-sum games, mixed games must jointly consider individual payoffs, collective payoffs, and equilibrium payoffs, and they are widely applied in real-world scenarios such as public resource allocation, task scheduling, and autonomous driving. Research on mixed games is therefore of great importance. This paper reviews the key concepts and related work in the mixed-game field and analyzes the current state of research at home and abroad as well as future directions. Specifically, it first introduces the definition and classification of mixed-game problems. It then elaborates solution concepts and objectives, including Nash equilibrium, correlated equilibrium, and Pareto optimality, and objectives such as maximizing individual payoff, maximizing collective payoff, and ensuring fairness. Next, according to these different objectives, it discusses and analyzes game-theoretic methods, reinforcement learning methods, and their combination. Finally, it introduces relevant application scenarios and experimental simulation environments, and summarizes and looks ahead to future research directions.
Keywords: mixed games; game theory; reinforcement learning
17. Game and strategy analysis of load aggregators' price-incentive frequency-regulation response based on reinforcement learning
Authors: 吴静, 程文娟, 梁肖, 王正风, 唐昊. Science Technology and Engineering (科学技术与工程), 北大核心, 2025, No. 3, pp. 1087-1092 (6 pages)
To improve the efficiency with which distributed loads respond to frequency-regulation instructions, an innovative reinforcement-learning-based strategy is proposed in which load aggregators offer price incentives. In this strategy, a game model between the aggregator and load clusters is constructed: the aggregator adjusts incentive prices according to the frequency-regulation instruction and its pricing strategy, while loads adjust their power consumption according to their own electricity costs, responding flexibly to the regulation instruction. The model is solved with the multi-agent soft actor-critic (MASAC) algorithm. The results show that the price-incentive method enables loads to respond effectively to frequency-regulation instructions, and that the MASAC algorithm not only optimizes the decision process but also effectively reduces computational complexity and achieves efficient dynamic regulation. The method thus provides an effective solution for power-system frequency regulation, with both theoretical significance and practical application value.
Keywords: load aggregator; game; frequency regulation; reinforcement learning; price incentive
18. A multi-stakeholder game cooperative optimization strategy for active distribution networks based on federated reinforcement learning
Authors: 杨文伟, 彭显刚, 全欢, 褚卓卓, 王星华, 赵卓立. Automation of Electric Power Systems (电力系统自动化), 北大核心, 2025, No. 13, pp. 70-82 (13 pages)
To address the privacy-protection and trust problems in multi-stakeholder cooperative optimal dispatch of active distribution networks, a day-ahead and intra-day cooperative dispatch strategy based on multi-stakeholder games and federated reinforcement learning is proposed. First, a cooperative optimization architecture is established that includes distributed-generation operators, the active distribution network operator, and energy-storage operators. Under this architecture, a multi-stakeholder day-ahead/intra-day optimal dispatch model is proposed that maximizes comprehensive revenue and minimizes adjustments. Then, in the day-ahead stage, a trust-evolution game method considering bounded rationality yields the intra-day dispatch plan; in the intra-day stage, a federated natural policy gradient algorithm performs rolling correction, satisfying operating constraints while avoiding the leakage of private information during dispatch. Finally, simulation analysis verifies the economy of the proposed model and the effectiveness of the algorithm.
Keywords: active distribution network; multi-stakeholder game; optimal dispatch; federated reinforcement learning; privacy protection
19. Joint optimization of task allocation, communication base-station association, and flight strategy for multi-UAV distributed sensing
Authors: 何江, 喻莞芯, 黄浩, 蒋卫恒. Journal of Electronics & Information Technology (电子与信息学报), 北大核心, 2025, No. 5, pp. 1402-1417 (16 pages)
This paper studies distributed sensing by multiple unmanned aerial vehicles (UAVs). To coordinate UAV behavior, a task-sensing and data-backhaul protocol is designed, and a mixed-integer nonlinear programming model is established for the joint optimization of UAV task allocation, backhaul base-station association, and flight strategy. Given the complexity of the problem's mathematical structure, and the high computational complexity and heavy information-exchange overhead of centralized optimization algorithms, the problem is transformed into a cooperative Markov game (MG) with a cost-utility composite payoff function. Considering the complex coupling of continuous and discrete actions in the MG, a multi-agent independent-learner-based compound-action actor-critic (MA-IL-CA2C) algorithm is designed to solve it. Simulation analysis shows that, relative to baseline algorithms, the proposed algorithm significantly increases system revenue and reduces network energy consumption.
Keywords: UAV; distributed sensing; joint optimization; reinforcement learning; Markov game
20. A multi-stage resource-allocation game model based on reinforcement learning
Authors: 张骁雄, 丁松, 彭锐, 伍国华, 刘忠. Journal of Tongji University (Natural Science) (同济大学学报(自然科学版)), 北大核心, 2025, No. 6, pp. 985-992 (8 pages)
For the attack-defense game resource-allocation problem under limited resources, a multi-stage attack-defense resource-allocation game model based on reinforcement learning is proposed. The defender considers how to allocate resources across the stages to deploy decoy targets and harden real targets, while multiple attackers consider how to cooperate in allocating resources across the stages to identify decoys and attack the real targets. With the expected payoff of the real targets as the reward criterion at each stage, a resource-allocation model based on the Q-learning reinforcement learning algorithm is designed to generate the optimal resource-allocation strategies of both sides over the whole horizon. A case study verifies the effectiveness of the proposed model and algorithm, which can support decision-making for multi-stage attack-defense game resource allocation.
Keywords: resource allocation; attack-defense game; decoy target; reinforcement learning; Q-learning
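The per-stage reward idea above, the expected value of the real target surviving given how the defender splits resources between hardening and decoys, can be illustrated with a toy payoff; every functional form and constant below is an assumption for illustration, not the paper's model.

```python
# Toy single-stage defender payoff: decoys lower the probability the
# attacker identifies the real target; hardening lowers the damage if
# it is hit, but (by assumption) cannot remove all risk (floor 0.2).

def defender_payoff(harden, decoy, budget=10.0, target_value=100.0):
    assert harden + decoy <= budget
    p_identified = 1.0 / (1.0 + decoy)                 # decoys confuse the attacker
    p_destroyed = p_identified * max(0.2, 1.0 - harden / budget)
    return target_value * (1.0 - p_destroyed)          # expected surviving value

balanced = defender_payoff(5.0, 5.0)
all_harden = defender_payoff(10.0, 0.0)
print(balanced, all_harden)  # mixing decoys with hardening beats pure hardening
```

A Q-learning defender would use payoffs like this as the stage reward and learn an allocation per stage; the 0.2 floor is what makes the mixed allocation dominate in this toy.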