Journal Literature: 182 articles found
1. Tactical reward shaping for large-scale combat by multi-agent reinforcement learning
Authors: DUO Nanxun, WANG Qinzhao, LYU Qiang, WANG Wei. Journal of Systems Engineering and Electronics, CSCD, 2024(6): 1516-1529.
Future unmanned battles urgently require intelligent combat policies, and multi-agent reinforcement learning offers a promising solution. However, due to the complexity of combat operations and the large size of the combat group, this task suffers from the credit assignment problem more than other reinforcement learning tasks. This study uses reward shaping to relieve the credit assignment problem and improve policy training for the new generation of large-scale unmanned combat operations. We first prove that multiple reward shaping functions do not change the Nash equilibrium in stochastic games, providing theoretical support for their use. According to the characteristics of combat operations, we propose tactical reward shaping (TRS), which comprises maneuver shaping advice and threat-assessment-based attack shaping advice. We then investigate the effects of different types and combinations of shaping advice on combat policies through experiments. The results show that TRS improves both the efficiency and the attack accuracy of combat policies, with the combination of maneuver reward shaping advice and ally-focused attack shaping advice achieving the best performance compared with the baseline strategy.
Keywords: deep reinforcement learning; multi-agent reinforcement learning; multi-agent combat; unmanned battle; reward shaping
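As background for the equilibrium-preservation claim, below is a minimal sketch of potential-based reward shaping in the style of Ng et al., the standard form known to leave optimal policies (and, per the paper's claim, Nash equilibria in stochastic games) unchanged. The TRS advice terms themselves are not reproduced here; the state layout and the maneuver potential are illustrative assumptions.

```python
import numpy as np

def maneuver_potential(state):
    """Hypothetical potential: negative distance from agent to its objective.
    The `state` layout (agent_pos, objective_pos) is assumed for illustration."""
    agent_pos, objective_pos = state
    return -np.linalg.norm(np.asarray(agent_pos) - np.asarray(objective_pos))

def shaped_reward(env_reward, state, next_state, gamma=0.99):
    """Potential-based shaping F(s, s') = gamma * Phi(s') - Phi(s),
    added on top of the environment reward."""
    shaping = gamma * maneuver_potential(next_state) - maneuver_potential(state)
    return env_reward + shaping
```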
2. A single-task and multi-decision evolutionary game model based on multi-agent reinforcement learning [cited by 3]
Authors: MA Ye, CHANG Tianqing, FAN Wenhui. Journal of Systems Engineering and Electronics, SCIE EI CSCD, 2021(3): 642-657.
In the evolutionary game of the same task for groups, changes in game rules, personal interests, crowd size, and external supervision have uncertain effects on individual decision-making and game results. Within the Markov decision framework, a single-task multi-decision evolutionary game model based on multi-agent reinforcement learning is proposed to explore the evolutionary rules of the game process. The model can improve the result of an evolutionary game and facilitate completion of the task. First, based on multi-agent theory and to solve the existing problems in the original model, a negative-feedback tax penalty mechanism is proposed to guide the strategy selection of individuals in the group; in addition, to evaluate the group's evolutionary game results in the model, a calculation method for the group intelligence level is defined. Second, the Q-learning algorithm is used to improve the guiding effect of the negative-feedback tax penalty mechanism: the selection strategy of the Q-learning algorithm is improved, and a bounded-rationality evolutionary game strategy is proposed based on the rules of evolutionary games and the bounded rationality of individuals. Finally, simulation results show that the proposed model can effectively guide individuals to choose cooperation strategies that are beneficial to task completion and stability under different negative-feedback factor values and different group sizes, thereby improving the group intelligence level.
Keywords: multi-agent reinforcement learning; evolutionary game; Q-learning
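A minimal sketch of how a tax-penalized Q-learning update of this kind can look. The strategy set, tax form, and epsilon-greedy selection below are assumptions for illustration, not the paper's exact mechanism.

```python
import random
from collections import defaultdict

Q = defaultdict(float)             # Q[(state, action)] table
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
ACTIONS = ["cooperate", "defect"]  # hypothetical strategy set

def tax_penalty(action, defector_ratio, tax_factor=0.5):
    """Hypothetical negative-feedback tax: defection is taxed more heavily
    as the fraction of defectors in the group rises."""
    return tax_factor * defector_ratio if action == "defect" else 0.0

def choose_action(state):
    if random.random() < EPSILON:  # bounded-rationality exploration
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, defector_ratio):
    r = reward - tax_penalty(action, defector_ratio)  # taxed payoff
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (r + GAMMA * best_next - Q[(state, action)])
```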
3. Knowledge transfer in multi-agent reinforcement learning with incremental number of agents [cited by 4]
Authors: LIU Wenzhang, DONG Lu, LIU Jian, SUN Changyin. Journal of Systems Engineering and Electronics, SCIE EI CSCD, 2022(2): 447-460.
In this paper, the reinforcement learning method for cooperative multi-agent systems (MAS) with an incremental number of agents is studied. Existing multi-agent reinforcement learning approaches deal with a MAS with a fixed number of agents and can learn well-performing policies. However, if the number of agents increases, the previously learned policies may not perform well in the new scenario, and the new agents need to learn from scratch to find optimal policies with the others, which may slow down the learning speed of the whole team. To solve this problem, we propose a new algorithm that takes full advantage of previously learned knowledge and transfers it from the original agents to the new ones. Since the original agents have been trained well in the source environment, they are treated as teacher agents in the target environment; correspondingly, the new agents are called student agents. To enable the student agents to learn from the teacher agents, we first modify the input nodes of the teacher networks to adapt to the current environment. Then, the teacher agents take the observations of the student agents as input and output advised actions and values as supervising information. Finally, the student agents combine the reward from the environment with the supervising information from the teacher agents and learn optimal policies with modified loss functions. By taking full advantage of the teachers' knowledge, the search space for the student agents is reduced significantly, which accelerates the learning speed of the whole system. The proposed algorithm is verified in several multi-agent simulation environments, and its efficiency is demonstrated by the experimental results.
Keywords: knowledge transfer; multi-agent reinforcement learning (MARL); new agents
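The abstract does not give the exact modified loss, so below is a minimal sketch of one way to combine a TD target with teacher supervision; the distillation weight `beta` and the MSE form of the supervision term are assumptions.

```python
import torch
import torch.nn.functional as F

def student_loss(q_student, q_teacher, reward, q_next_max, action,
                 gamma=0.99, beta=0.5):
    """Hypothetical modified loss: a TD term against the environment reward
    plus a distillation term pulling the student's Q-values toward the
    teacher's advice on the same observation. `beta` trades off the two."""
    td_target = reward + gamma * q_next_max.detach()
    td_loss = F.mse_loss(q_student[action], td_target)
    distill_loss = F.mse_loss(q_student, q_teacher.detach())
    return td_loss + beta * distill_loss
```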
4. Cooperative multi-target hunting by unmanned surface vehicles based on multi-agent reinforcement learning [cited by 2]
Authors: Jiawei Xia, Yasong Luo, Zhikun Liu, Yalun Zhang, Haoran Shi, Zhong Liu. Defence Technology (防务技术), SCIE EI CAS CSCD, 2023(11): 80-94.
To solve the problem of multi-target hunting by an unmanned surface vehicle (USV) fleet, a hunting algorithm based on multi-agent reinforcement learning is proposed. First, the hunting environment and a kinematic model without boundary constraints are built, and the criteria for successful target capture are given. Then, the cooperative hunting problem of a USV fleet is modeled as a decentralized partially observable Markov decision process (Dec-POMDP), and a distributed partially observable multi-target hunting proximal policy optimization (DPOMH-PPO) algorithm applicable to USVs is proposed. In addition, an observation model, a reward function, and an action space applicable to multi-target hunting tasks are designed. To deal with the dynamically changing dimension of the observational features input by partially observable systems, a feature embedding block is proposed: by combining the two feature compression methods of column-wise max pooling (CMP) and column-wise average pooling (CAP), an observational feature encoding is established. Finally, the centralized training and decentralized execution framework is adopted to train the hunting strategy; each USV in the fleet shares the same policy and performs actions independently. Simulation experiments verify the effectiveness of the DPOMH-PPO algorithm in test scenarios with different numbers of USVs. Moreover, the advantages of the proposed model are comprehensively analyzed in terms of algorithm performance, transfer across task scenarios, and self-organization capability after damage, and the potential for deployment and application of DPOMH-PPO in real environments is verified.
Keywords: unmanned surface vehicles; multi-agent deep reinforcement learning; cooperative hunting; feature embedding; proximal policy optimization
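A minimal sketch of the CMP+CAP feature embedding idea as described: a variable number of observed entities is compressed into a fixed-size vector by concatenating column-wise max and average pooling. Tensor shapes are illustrative assumptions.

```python
import torch

def embed_observations(entity_features: torch.Tensor) -> torch.Tensor:
    """Compress a variable number of observed entities (rows) into a
    fixed-size vector by concatenating column-wise max pooling (CMP)
    and column-wise average pooling (CAP).
    entity_features: (num_entities, feature_dim); num_entities may vary."""
    cmp = entity_features.max(dim=0).values  # column-wise max pooling
    cap = entity_features.mean(dim=0)        # column-wise average pooling
    return torch.cat([cmp, cap], dim=-1)     # fixed 2*feature_dim output

# Usage: 3 observed targets vs. 5 yield the same embedding size.
print(embed_observations(torch.randn(3, 8)).shape)  # torch.Size([16])
print(embed_observations(torch.randn(5, 8)).shape)  # torch.Size([16])
```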
5. Multi-agent reinforcement learning based on policies of global objective
Authors: ZHANG Huaxiang, HUANG Shangteng. Journal of Systems Engineering and Electronics, SCIE EI CSCD, 2005(3): 676-681.
In general-sum games, taking all agents' collective rationality into account, we define the agents' global objective and propose a novel multi-agent reinforcement learning (RL) algorithm based on a global policy. In each learning step, all agents commit to selecting the global policy to achieve the global goal. We prove that this learning algorithm converges under certain restrictions on the stage games of the learned Q values, and show that it has considerably lower computational time complexity than existing multi-agent learning algorithms for general-sum games. An example is analyzed to show the algorithm's merits.
Keywords: Markov games; reinforcement learning; collective rationality; policy
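A toy sketch of a "global policy" selection step consistent with the abstract's idea of collective rationality: pick the joint action that maximizes the sum of all agents' Q-values rather than each agent maximizing its own. The tabular representation is an assumption.

```python
import itertools
import numpy as np

def global_policy(q_tables, joint_actions):
    """Pick the joint action maximizing the summed Q-values of all agents
    (collective rationality), instead of per-agent greedy selection."""
    return max(joint_actions, key=lambda ja: sum(q[ja] for q in q_tables))

# Two agents, two actions each; each q table maps joint action -> value.
ja_space = list(itertools.product([0, 1], repeat=2))
q1 = {ja: np.random.rand() for ja in ja_space}
q2 = {ja: np.random.rand() for ja in ja_space}
print(global_policy([q1, q2], ja_space))
```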
6. UAV Frequency-based Crowdsensing Using Grouping Multi-agent Deep Reinforcement Learning
Authors: Cui ZHANG, En WANG, Funing YANG, Yongjian YANG, Nan JIANG. Computer Science (计算机科学), CSCD, PKU Core, 2023(2): 57-68.
Mobile CrowdSensing (MCS) is a promising sensing paradigm that recruits users to cooperatively perform sensing tasks. Recently, unmanned aerial vehicles (UAVs) have been used as powerful sensing devices to replace user participation and carry out special tasks such as epidemic monitoring and earthquake rescue. In this paper, we focus on scheduling UAVs to sense task Points of Interest (PoIs) with different frequency-coverage requirements. To accomplish the sensing task, the scheduling strategy needs to consider coverage requirements, geographic fairness, and energy charging simultaneously. We consider the complex interaction among UAVs and propose a grouping multi-agent deep reinforcement learning approach (G-MADDPG) to schedule UAVs in a distributed manner. G-MADDPG groups all UAVs into teams using a distance-based clustering algorithm (DCA) and then regards each team as an agent. In this way, G-MADDPG solves the problem that the training time of traditional MADDPG is too long to converge when the number of UAVs is large, and the trade-off between training time and result accuracy can be controlled flexibly by adjusting the number of teams. Extensive simulation results show that our scheduling strategy outperforms three baselines and is flexible in balancing training time and result accuracy.
Keywords: UAV; crowdsensing; frequency coverage; grouping multi-agent deep reinforcement learning
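A minimal sketch of the grouping step, using a k-means-style loop as a stand-in for the paper's distance-based clustering algorithm (DCA), which is not specified in the abstract; each resulting team would then be treated as one MADDPG agent.

```python
import numpy as np

def group_uavs(positions: np.ndarray, num_teams: int, iters: int = 20):
    """Cluster UAVs by position so each cluster can act as one team-agent.
    positions: (n_uavs, 2) array of planar coordinates."""
    centers = positions[np.random.choice(len(positions), num_teams,
                                         replace=False)]
    for _ in range(iters):
        labels = np.argmin(
            np.linalg.norm(positions[:, None] - centers[None], axis=-1),
            axis=1)
        for k in range(num_teams):
            if np.any(labels == k):
                centers[k] = positions[labels == k].mean(axis=0)
    return labels  # team id per UAV

print(group_uavs(np.random.rand(12, 2), num_teams=3))
```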
7. Computational intelligence interception guidance law using online off-policy integral reinforcement learning
Authors: WANG Qi, LIAO Zhizhong. Journal of Systems Engineering and Electronics, SCIE CSCD, 2024(4): 1042-1052.
The missile interception problem can be regarded as a two-person zero-sum differential game, which depends on the solution of the Hamilton-Jacobi-Isaacs (HJI) equation. It has been proved impossible to obtain a closed-form solution due to the nonlinearity of the HJI equation, and many iterative algorithms have been proposed to solve it. The simultaneous policy updating algorithm (SPUA) is an effective algorithm for solving the HJI equation, but it is an on-policy integral reinforcement learning (IRL) method: for online implementation of SPUA, the disturbance signals need to be adjustable, which is unrealistic. In this paper, an off-policy IRL algorithm based on SPUA is proposed that requires no knowledge of the system dynamics. A neural-network-based online adaptive critic implementation scheme of the off-policy IRL algorithm is then presented. Based on the online off-policy IRL method, a computational intelligence interception guidance (CIIG) law is developed for intercepting highly maneuvering targets. As a model-free method, interception can be achieved by measuring system data online. The effectiveness of the CIIG law is verified through two missile-target engagement scenarios.
Keywords: two-person zero-sum differential games; Hamilton-Jacobi-Isaacs (HJI) equation; off-policy integral reinforcement learning (IRL); online learning; computational intelligence interception guidance (CIIG) law
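For reference, here is the HJI equation in the standard affine-dynamics, quadratic-cost setting typically solved by IRL critics; the paper's exact formulation is not given in the abstract, so this form is an assumption about the setting.

```latex
% Zero-sum game for \dot{x} = f(x) + g(x)u + k(x)w with value
% V(x) = \min_u \max_w \int_t^{\infty} \big( x^\top Q x + u^\top R u
%        - \gamma^2 w^\top w \big)\, d\tau.
% HJI equation and saddle-point policies:
\[
0 = \nabla V^\top f(x) + x^\top Q x
  - \tfrac{1}{4}\,\nabla V^\top g(x) R^{-1} g(x)^\top \nabla V
  + \tfrac{1}{4\gamma^2}\,\nabla V^\top k(x) k(x)^\top \nabla V ,
\]
\[
u^{*} = -\tfrac{1}{2} R^{-1} g(x)^\top \nabla V , \qquad
w^{*} = \tfrac{1}{2\gamma^2} k(x)^\top \nabla V .
\]
```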
8. Recognition and interference of deceptive behavior based on inverse reinforcement learning and game theory [cited by 2]
Authors: ZENG Yunxiu, XU Kai. Journal of Systems Engineering and Electronics, SCIE EI CSCD, 2023(2): 270-288.
In real-time strategy (RTS) games, the ability to recognize other players' goals is important for creating artificial intelligence (AI) players. However, most current goal recognition methods do not take deceptive behavior into account, even though it often occurs in RTS game scenarios, resulting in poor recognition results. To solve this problem, this paper proposes goal recognition for deceptive agents, an extended goal recognition method that applies deductive reasoning (from the general to the specific) to model the deceptive agent's behavioral strategy. First, a general deceptive behavior model is proposed to abstract the features of deception; these features are then used to construct the behavior strategy that best matches the deceiver's historical behavior data via inverse reinforcement learning (IRL). Finally, to interfere with the implementation of deceptive behavior, a game model is constructed to describe the confrontation scenario and derive the most effective interference measures.
Keywords: deceptive path planning; inverse reinforcement learning (IRL); game theory; goal recognition
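A crude sketch of the strategy-recovery flavor of IRL: learn linear reward weights that make the deceiver's observed feature expectations score higher than sampled alternatives. Everything here (linear reward, feature vectors, update rule) is an assumption for illustration, not the paper's method.

```python
import numpy as np

def irl_feature_matching(expert_feats, sampled_feats, iters=200, lr=0.1):
    """Learn weights w of a linear reward r(s) = w . phi(s) so that the
    expert's mean features outscore sampled ones (a feature-matching sketch).
    Inputs are mean feature vectors over trajectories."""
    w = np.zeros_like(expert_feats, dtype=float)
    for _ in range(iters):
        w += lr * (expert_feats - sampled_feats)  # gradient of linear margin
        w /= max(np.linalg.norm(w), 1.0)          # keep weights bounded
    return w

print(irl_feature_matching(np.array([1.0, 0.2]), np.array([0.4, 0.5])))
```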
9. Targeted multi-agent communication algorithm based on state control
Authors: Li-yang Zhao, Tian-qing Chang, Lei Zhang, Jie Zhang, Kai-xuan Chu, De-peng Kong. Defence Technology (防务技术), SCIE EI CAS CSCD, 2024(1): 544-556.
As an important mechanism of multi-agent interaction, communication allows agents to form complex team relationships rather than a simple collection of independent agents. However, existing communication schemes introduce considerable timing redundancy and irrelevant messages, which seriously limits their practical application. To solve this problem, this paper proposes a targeted multi-agent communication algorithm based on state control (SCTC). SCTC uses a state-controlled gating mechanism to reduce the timing redundancy of inter-agent communication, and determines the interaction relationships between agents and the importance weights of communication messages through hard- and self-attention mechanisms connected in series, realizing targeted message processing. In addition, by minimizing the difference between the fused message generated from each agent's real communication messages and the fused message generated from buffered messages, the correctness of the agent's final action choice is ensured. Evaluation on a challenging set of StarCraft II benchmarks indicates that SCTC significantly improves learning performance and reduces the communication overhead between agents, ensuring better cooperation.
Keywords: multi-agent deep reinforcement learning; state control; targeted interaction; communication mechanism
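A minimal sketch of the two mechanisms the abstract names: a state-conditioned gate deciding whether an agent communicates, followed by attention weighting over the surviving messages. Layer sizes are illustrative assumptions, and the hard threshold would need a straight-through or Gumbel trick to train.

```python
import torch
import torch.nn as nn

class GatedTargetedComm(nn.Module):
    """State-controlled gate (cuts timing redundancy) followed by attention
    over the gated messages (targeted weighting). Sizes are assumptions."""
    def __init__(self, state_dim=32, msg_dim=16, n_heads=2):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(state_dim, 1), nn.Sigmoid())
        self.attn = nn.MultiheadAttention(msg_dim, n_heads, batch_first=True)

    def forward(self, states, messages):
        # states: (n_agents, state_dim); messages: (n_agents, msg_dim)
        open_gate = (self.gate(states) > 0.5).float()  # hard gating
        masked = messages * open_gate                  # silence closed agents
        m = masked.unsqueeze(0)
        fused, weights = self.attn(m, m, m)            # importance weighting
        return fused.squeeze(0), weights

fused, w = GatedTargetedComm()(torch.randn(5, 32), torch.randn(5, 16))
print(fused.shape, w.shape)  # torch.Size([5, 16]) torch.Size([1, 5, 5])
```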
10. Variable reward function-driven strategies for impulsive orbital attack-defense games under multiple constraints and victory conditions
Authors: Liran Zhao, Sihan Xu, Qinbo Sun, Zhaohui Dang. Defence Technology (防务技术), 2025(9): 159-183.
This paper investigates impulsive orbital attack-defense (AD) games under multiple constraints and victory conditions, involving three spacecraft: an attacker, a target, and a defender. In the AD scenario, the attacker aims to breach the defender's interception to rendezvous with the target, while the defender seeks to protect the target by blocking or actively pursuing the attacker. Four maneuvering constraints and five potential game outcomes are incorporated to model AD game problems more accurately and increase their complexity, thereby reducing the effectiveness of traditional methods such as differential games and game-tree search. To address these challenges, this study proposes a multi-agent deep reinforcement learning solution with variable reward functions. Two attack strategies, direct attack (DA) and bypass attack (BA), are developed for the attacker, each focusing on different mission priorities. Similarly, two defense strategies, direct interdiction (DI) and collinear interdiction (CI), are designed for the defender, each optimizing specific defensive actions through tailored reward functions. Each reward function incorporates both process rewards (e.g., distance and angle) and outcome rewards, derived from physical principles and validated via geometric analysis. Extensive simulations of the four strategy confrontations show average defensive success rates of 75% for DI vs. DA, 40% for DI vs. BA, 80% for CI vs. DA, and 70% for CI vs. BA. The results indicate that CI outperforms DI for defenders, while BA outperforms DA for attackers; moreover, defenders achieve their objectives more effectively under identical maneuvering capabilities. Trajectory evolution analyses further illustrate the effectiveness of the proposed variable reward function-driven strategies, which offer guidance for practical orbital defense scenarios and lay a foundation for future multi-agent game research.
Keywords: orbital attack-defense game; impulsive maneuver; multi-agent deep reinforcement learning; reward function design
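A toy sketch of a defender reward in the spirit described: geometric process terms (distances, alignment angle) plus a sparse outcome term. The weights, signs, and outcome payoffs are illustrative assumptions, not the paper's tuned values.

```python
def defender_reward(d_att_tgt, d_def_att, align_angle, outcome=None,
                    w_dist=0.01, w_ang=0.1):
    """Process reward: keep the attacker far from the target (d_att_tgt),
    the defender near the attacker (d_def_att), and the defender aligned
    on the attacker-target line (align_angle, radians). Outcome reward:
    sparse bonus/penalty at episode end."""
    r = w_dist * (d_att_tgt - d_def_att) - w_ang * align_angle
    if outcome == "intercepted":      # defender blocked the attacker
        r += 100.0
    elif outcome == "target_reached":  # attacker rendezvoused with target
        r -= 100.0
    return r

print(defender_reward(500.0, 80.0, 0.2))
```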
11. Self-play training and analysis for GEO inspection game with modular actions
Authors: ZHOU Rui, ZHONG Weichao, LI Wenlong, ZHANG Hao. Journal of Systems Engineering and Electronics, 2025(5): 1353-1373.
This paper comprehensively explores the impulsive on-orbit inspection game problem using reinforcement learning and game-training methods. The spacecraft's purpose is to inspect the entire surface of a non-cooperative target with active maneuverability under front lighting. First, the impulsive orbital game problem is formulated as a turn-based sequential game. Second, several typical relative orbit transfers are encapsulated into modules to construct a parameterized action space containing discrete modules and continuous parameters, and the multi-pass deep Q-network (MPDQN) algorithm is used to implement autonomous decision-making. A curriculum learning method then gradually increases the difficulty of the training scenario, and a backtracking proportional self-play training framework builds a pool of opponents to enhance the agent's ability to defeat inconsistent strategies. The behavioral changes of the agents during training indicate that the intelligent game system gradually evolves toward an equilibrium, and the restraint relations between the agents show that they steadily improve their strategies. The influence of various factors on game results is tested.
Keywords: impulsive orbital game; inspection mission; turn-based reinforcement learning; modular action; self-play
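A minimal sketch of what a modular, parameterized action space can look like: a discrete transfer module paired with that module's continuous parameters, which is the structure MPDQN operates on. The module names and parameter counts below are illustrative assumptions, not the paper's actual transfer modules.

```python
import numpy as np

# Discrete module -> number of continuous parameters (assumed for illustration).
MODULES = {
    "hold":        0,  # coast, no parameters
    "phase_shift": 1,  # (delta_phase,)
    "transfer":    2,  # (delta_a, transfer_time)
}

def sample_action(rng=np.random.default_rng()):
    """Draw a (module, parameters) pair from the parameterized action space."""
    name = rng.choice(list(MODULES))
    params = rng.uniform(-1.0, 1.0, size=MODULES[name])  # normalized params
    return name, params

print(sample_action())
```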
12. Multi-agent system application in accordance with game theory in bi-directional coordination network model [cited by 3]
Authors: ZHANG Jie, WANG Gang, YUE Shaohua, SONG Yafei, LIU Jiayi, YAO Xiaoqiang. Journal of Systems Engineering and Electronics, SCIE EI CSCD, 2020(2): 279-289.
The multi-agent system is a powerful approach to complex intelligent problems. In accordance with game theory, the concept of loyalty is introduced to analyze the relationship between agents' individual income and global benefits and to build the logical architecture of the multi-agent system. To verify the feasibility of the method, the recurrent neural network is optimized, a bi-directional coordination network is built as the training network for deep learning, and specific training scenes are simulated as the training background. After a certain number of training iterations, the model learns simple strategies autonomously, and as training time increases, the complexity of the learned strategies rises gradually. The realizability of the model is demonstrated by examples of obstacle avoidance, firepower distribution, and cooperative cover. Under the same resource background, the model exhibits better convergence than other deep learning training networks and does not easily fall into local endless loops. Furthermore, the learned strategies are stronger than those of rule-based training models, which is of great practical value.
Keywords: loyalty; game theory; bi-directional coordination network; multi-agent system; learning strategy
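A minimal sketch of a bi-directional coordination network in the BiCNet style: a bidirectional recurrent layer laid out across agents so each agent's action depends on teammates in both directions. All sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BiCoordinationNet(nn.Module):
    """Agents exchange latent information through a bidirectional GRU whose
    'sequence' dimension is the agent index, coupling actions both ways."""
    def __init__(self, obs_dim=10, hidden=32, n_actions=4):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)
        self.comm = nn.GRU(hidden, hidden, bidirectional=True,
                           batch_first=True)
        self.head = nn.Linear(2 * hidden, n_actions)

    def forward(self, obs):            # obs: (batch, n_agents, obs_dim)
        h, _ = self.comm(torch.relu(self.encoder(obs)))
        return self.head(h)            # per-agent action logits

logits = BiCoordinationNet()(torch.randn(2, 5, 10))
print(logits.shape)                    # torch.Size([2, 5, 4])
```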
13. Optimizing agent behavior decisions in UT2004 by combining behavior trees with Q-learning [cited by 6]
Authors: LIU Xiaowei, GAO Chunming. Computer Engineering and Applications (计算机工程与应用), CSCD, PKU Core, 2016(3): 113-118.
To address the problem that the behavior decisions of NPCs (non-player characters) in the FPS game UT2004 are insufficiently flexible and intelligent, this paper combines behavior trees with the Q-learning reinforcement learning algorithm and proposes a method that optimizes NPC behavior decisions through a combination of preprocessing and online learning. Through reinforcement learning on the behavior tree, NPC behavior decisions become more flexible, intelligent, and human-like. Experimental results demonstrate the effectiveness and feasibility of the method.
Keywords: behavior decision-making; game artificial intelligence (AI); Q-learning; reinforcement learning; behavior tree
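A minimal sketch of one way to combine the two techniques: replace a behavior tree's fixed selector node with a Q-learning selector that learns which child behavior to run in each game state. The child behaviors and states below are illustrative assumptions.

```python
import random
from collections import defaultdict

class QSelector:
    """A behavior-tree selector whose child choice is learned by Q-learning
    instead of being fixed by priority order."""
    def __init__(self, children, alpha=0.1, gamma=0.9, eps=0.2):
        self.children, self.q = children, defaultdict(float)
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def tick(self, state):
        if random.random() < self.eps:  # exploration
            return random.choice(self.children)
        return max(self.children, key=lambda c: self.q[(state, c)])

    def learn(self, state, child, reward, next_state):
        best = max(self.q[(next_state, c)] for c in self.children)
        key = (state, child)
        self.q[key] += self.alpha * (reward + self.gamma * best - self.q[key])

selector = QSelector(["attack", "retreat", "collect_item"])
print(selector.tick("low_health"))
```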
14. An intelligent game guidance algorithm based on deep reinforcement learning [cited by 2]
Authors: BAI Tian, LYU Luyao, LI Chu, HE Jialiang. Journal of Jilin University (Science Edition) (吉林大学学报(理学版)), PKU Core, 2025(1): 91-98.
To address the large model input dimensionality and long training times of traditional game-agent algorithms, a new deep reinforcement learning game guidance algorithm combining state-information transformation with reward-function shaping is proposed. First, the interface provided by the Unity engine is used to read backend game information directly, effectively compressing the dimensionality of the state space and reducing the amount of input data. Second, a carefully designed reward mechanism accelerates model convergence. Finally, the algorithm is compared with existing methods in both subjective qualitative and objective quantitative experiments; the results show that it not only significantly improves training efficiency but also substantially improves agent performance.
Keywords: deep reinforcement learning; game agent; reward function shaping; proximal policy optimization algorithm
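A toy sketch of the state-compression idea: build a compact state vector from backend values instead of image observations. The field names below are assumptions about what the engine might expose, not the paper's actual interface.

```python
import numpy as np

def build_state(player, target):
    """Compact state from backend values (positions, health, goal direction)
    in place of a high-dimensional frame observation."""
    rel = np.array(target["pos"], dtype=float) - np.array(player["pos"],
                                                          dtype=float)
    dist = np.linalg.norm(rel)
    return np.array([
        player["hp"] / 100.0,        # normalized health
        dist / 50.0,                 # normalized distance to goal
        *(rel / (dist + 1e-8)),      # unit direction to goal
    ], dtype=np.float32)

s = build_state({"pos": [0, 0], "hp": 80}, {"pos": [3, 4]})
print(s)  # a 4-dimensional state instead of an image
```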
15. Many-to-one UAV pursuit-evasion games based on the ME-DDPG algorithm
Authors: ZHANG Yaozhong, WU Zhuoran, ZHANG Jiandong, YANG Qiming, SHI Guoqing, XU Zixiang. Systems Engineering and Electronics (系统工程与电子技术), PKU Core, 2025(10): 3288-3299.
For the many-to-one pursuit-evasion game of unmanned aerial vehicles (UAVs), a mixed-experience deep deterministic policy gradient (ME-DDPG) algorithm is proposed, building on the reinforcement learning DDPG algorithm and incorporating numerical solutions of the differential-game formulation of the pursuit-evasion problem. By adding game-theoretic numerical solutions to the exploration policy set, directed policies are computed, which improves the training efficiency of UAV pursuit policies and alleviates the slow and often locally convergent training caused by long episodes, sparse rewards, and insufficient exploration in UAV pursuit-evasion problems, thereby improving the learning efficiency of the reinforcement learning algorithm. Simulation results show that the ME-DDPG algorithm converges quickly on the many-to-one UAV pursuit-evasion task, with a task success rate of 83%. Comparative experiments verify the advantages of the proposed algorithm over DDPG in convergence, stability, and task success rate.
Keywords: game theory; deep reinforcement learning; pursuit-evasion game; unmanned aerial vehicle; multi-agent
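One plausible reading of the "mixed experience" mechanism, sketched as a replay buffer that blends expert transitions from the differential-game numerical solution with the agent's own exploration; the expert fraction and buffer layout are assumptions.

```python
import random

class MixedReplayBuffer:
    """Blend expert transitions (from a differential-game numerical solution)
    with self-collected transitions when sampling training batches."""
    def __init__(self, capacity=100_000, expert_frac=0.25):
        self.expert, self.own = [], []
        self.capacity, self.expert_frac = capacity, expert_frac

    def add(self, transition, from_expert=False):
        buf = self.expert if from_expert else self.own
        buf.append(transition)
        del buf[:-self.capacity]  # keep only the most recent transitions

    def sample(self, batch_size):
        n_exp = min(int(batch_size * self.expert_frac), len(self.expert))
        batch = random.sample(self.expert, n_exp)
        batch += random.sample(self.own,
                               min(batch_size - n_exp, len(self.own)))
        return batch
```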
16. A survey of multi-agent decision-making under risk: theory and methods
Authors: LI Peng, CHEN Shaofei, YI Chushu, LI Shun, XING Junliang, CHEN Jing. Chinese Journal of Computers (计算机学报), PKU Core, 2025(10): 2338-2370.
Research on multi-agent decision-making has produced a range of theories and methods that effectively express the interactions of multiple decision-makers in cooperative and competitive environments, solve for reasonable behavioral strategies, and have been successfully applied in strategy games, traffic control, and many other areas. In the real world, however, agents face risk factors when making decisions, such as changes in the environment state and errors in their own methods, so the payoffs they obtain often deviate from expected values; furthermore, the non-stationarity introduced by other agents' strategies and agents' differing attitudes toward risk pose additional decision-making challenges. Many researchers have therefore studied the theory and methods of multi-agent decision-making under risk, to provide reasonable strategy-selection schemes for multi-agent systems facing risky decisions. This paper systematically surveys this body of work. It first formalizes three sources of risk: environmental risk, the agent's own risk, and risk from other agents. It then reviews decision-making theories and methods based on optimal control theory, reinforcement learning theory, and game equilibrium theory. Finally, it summarizes applications in human-machine collaboration, autonomous driving, traffic control, and smart grids, and outlines five open problems that may deserve attention in future research on multi-agent decision-making under risk.
Keywords: multi-agent; decision-making under risk; optimal control; reinforcement learning; game equilibrium
17. A survey of computer game research on Tibetan Jiu chess
Authors: LI Xiali, GU Jingshi, GAO Qiao, ZHANG Haoyang, HE Feifan. Journal of Chongqing University of Technology (Natural Science) (重庆理工大学学报(自然科学)), PKU Core, 2025(8): 90-96.
Tibetan Jiu chess is a national-level intangible cultural heritage; research on its computer play not only advances artificial intelligence but also helps protect and pass on Tibetan chess culture. Jiu chess is more complex than Go: a game proceeds in three stages, each with different rules, and its action and state spaces are extremely large, posing challenges for low-resource, high-efficiency game-playing algorithms. This paper reviews the main algorithms in current Jiu chess research and analyzes the strength of existing Jiu chess AI. Although algorithms based on expert knowledge perform well in actual play, they are limited by the scarcity of expert knowledge; deep reinforcement learning algorithms that combine knowledge and data, while methodologically more advanced, are constrained by hardware resources, limiting improvements in AI strength. The paper also reviews existing online Jiu chess platforms, discusses open problems in current research, and suggests possible directions for future work.
Keywords: computer games; Tibetan Jiu chess; expert knowledge; deep reinforcement learning
18. Research on an autonomous game decision-making method for close-range multi-aircraft air combat
Authors: HUO Lin, WANG Chudi, LI Zeduo. Journal of Ordnance Equipment Engineering (兵器装备工程学报), PKU Core, 2025(S1): 193-199.
To address the construction and selection of opponents during close-range multi-aircraft autonomous game training, this paper proposes a new autonomous game decision-making method that improves decision efficiency and performance. The method combines prioritized fictitious self-play (PFSP) with multi-agent proximal policy optimization (MAPPO) and uses an F-16 model to build a high-fidelity 2v2 aerial confrontation scenario. A series of comparative simulation experiments in this virtual confrontation scenario shows that the PFSP-MAPPO algorithm delivers excellent policy performance on multi-agent cooperative decision-making tasks, verifying the effectiveness and superiority of the proposed method.
Keywords: multi-agent; cooperative decision-making; game confrontation; autonomous decision-making; reinforcement learning; MAPPO algorithm
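For reference, below is a sketch of the standard PFSP opponent-sampling rule: opponents the learner beats less often are sampled more, weighting each by (1 - win rate)^p. This is the commonly published form of PFSP, consistent with, but not taken from, the paper.

```python
import numpy as np

def pfsp_sample(win_rates, p=2.0, rng=np.random.default_rng()):
    """Sample an opponent index from the pool, prioritizing hard opponents.
    win_rates: learner's historical win rate against each pooled opponent;
    p controls how sharply hard opponents are prioritized."""
    win_rates = np.asarray(win_rates, dtype=float)
    weights = (1.0 - win_rates) ** p
    return rng.choice(len(win_rates), p=weights / weights.sum())

# Opponent pool with historical win rates against the current learner.
print(pfsp_sample([0.9, 0.5, 0.2]))  # index 2 is sampled most often
```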
19. A survey of multi-agent reinforcement learning for control and decision-making [cited by 8]
Authors: LUO Biao, HU Tianmeng, ZHOU Yuhao, HUANG Tingwen, YANG Chunhua, GUI Weihua. Acta Automatica Sinica (自动化学报), PKU Core, 2025(3): 510-539.
Reinforcement learning, an important class of artificial intelligence methods, is widely used to solve complex control and decision-making problems and has shown great potential in many fields. In recent years, reinforcement learning has expanded from single-agent decision-making to multi-agent cooperation and games, making multi-agent reinforcement learning a major research focus. A multi-agent system consists of multiple entities with autonomous perception and decision-making capabilities and promises to solve large-scale complex problems that traditional single-agent methods cannot handle. Multi-agent reinforcement learning must account not only for environmental dynamics but also for the uncertainty of other agents' strategies, which increases the complexity of learning and decision-making. This paper reviews research on multi-agent reinforcement learning for control and decision-making, analyzes the main problems and challenges, surveys existing results and progress at the levels of control theory and autonomous decision-making, and discusses future research directions, aiming to provide a valuable reference for future work on multi-agent reinforcement learning.
Keywords: reinforcement learning; multi-agent systems; sequential decision-making; cooperative control; game theory
20. A survey of games based on multi-agent reinforcement learning [cited by 4]
Authors: LI Yichun, LIU Zejiao, HONG Yitian, WANG Jichao, WANG Jianrui, LI Yi, TANG Yang. Acta Automatica Sinica (自动化学报), PKU Core, 2025(3): 540-558.
Multi-agent reinforcement learning (MARL), at the intersection of game theory, control theory, and multi-agent learning, is a frontier direction in multi-agent systems (MASs) research: it enables agents to complete diverse tasks through interaction and decision-making in dynamic, high-dimensional, complex environments. MARL is developing toward open application targets, embodied application problems, and increasingly complex application scenarios, and it is gradually becoming one of the most effective tools for solving game decision-making problems in the real world. This paper systematically surveys games based on multi-agent reinforcement learning. It first introduces the basic theory of MARL and traces the development of MARL algorithms and baseline test environments. It then reviews recent progress on cooperative, adversarial, and mixed MARL tasks from the perspectives of improving agents' cooperation efficiency and adversarial capability, and discusses frontier research directions for mixed games in light of practical applications. Finally, it summarizes and looks ahead to the application prospects and development trends of multi-agent reinforcement learning.
Keywords: multi-agent reinforcement learning; multi-agent systems; game decision-making; equilibrium solving