Journal articles
76 articles found
Tactical reward shaping for large-scale combat by multi-agent reinforcement learning
1
Authors: DUO Nanxun, WANG Qinzhao, LYU Qiang, WANG Wei 《Journal of Systems Engineering and Electronics》 CSCD 2024, No. 6, pp. 1516-1529 (14 pages)
Future unmanned battles urgently require intelligent combat policies, and multi-agent reinforcement learning offers a promising solution. However, due to the complexity of combat operations and the large size of the combat group, this task suffers from the credit assignment problem more than other reinforcement learning tasks. This study uses reward shaping to relieve the credit assignment problem and improve policy training for the new generation of large-scale unmanned combat operations. We first prove that multiple reward shaping functions do not change the Nash equilibrium in stochastic games, providing theoretical support for their use. Based on the characteristics of combat operations, we propose tactical reward shaping (TRS), which comprises maneuver shaping advice and threat-assessment-based attack shaping advice. We then investigate, through experiments, how different types and combinations of shaping advice affect combat policies. The results show that TRS improves both the efficiency and attack accuracy of combat policies, with the combination of maneuver reward shaping advice and ally-focused attack shaping advice achieving the best performance relative to the baseline strategy.
Keywords: deep reinforcement learning; multi-agent reinforcement learning; multi-agent combat; unmanned battle; reward shaping
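A minimal sketch of the potential-based reward shaping idea underlying entries like this one: a shaping term F(s, s') = gamma*phi(s') - phi(s) is added to the environment reward, which is the standard way to guarantee that shaping leaves the optimal policy (and, in the stochastic-game setting discussed above, the Nash equilibrium) unchanged. The maneuver/threat potential and its weights below are illustrative assumptions, not the paper's actual TRS terms.

```python
import numpy as np

def potential(state, goal_pos, threat_level):
    """Illustrative potential: closer to the goal and lower threat => higher potential.
    The specific terms and weights are assumptions, not the TRS functions from the paper."""
    dist_to_goal = np.linalg.norm(state - goal_pos)
    return -1.0 * dist_to_goal - 0.5 * threat_level

def shaped_reward(r_env, s, s_next, goal_pos, threat, threat_next, gamma=0.99):
    """Potential-based shaping: F(s, s') = gamma * phi(s') - phi(s).
    Adding F to the environment reward does not change the optimal policy."""
    phi_s = potential(s, goal_pos, threat)
    phi_s_next = potential(s_next, goal_pos, threat_next)
    return r_env + gamma * phi_s_next - phi_s

# Toy usage: an agent moving toward the goal under decreasing threat receives a positive bonus.
s, s_next = np.array([0.0, 0.0]), np.array([1.0, 1.0])
goal = np.array([5.0, 5.0])
print(shaped_reward(r_env=0.0, s=s, s_next=s_next, goal_pos=goal, threat=0.8, threat_next=0.4))
```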
Knowledge transfer in multi-agent reinforcement learning with incremental number of agents (cited: 4)
2
Authors: LIU Wenzhang, DONG Lu, LIU Jian, SUN Changyin 《Journal of Systems Engineering and Electronics》 SCIE EI CSCD 2022, No. 2, pp. 447-460 (14 pages)
This paper studies reinforcement learning for cooperative multi-agent systems (MAS) with an incremental number of agents. Existing multi-agent reinforcement learning approaches deal with a MAS with a specific number of agents and can learn well-performing policies. However, when the number of agents increases, the previously learned policies may not perform well in the new scenario. The new agents have to learn from scratch to find optimal policies together with the others, which may slow down the learning speed of the whole team. To solve this problem, we propose a new algorithm that takes full advantage of the historical knowledge learned before and transfers it from the previous agents to the new agents. Since the previous agents have been trained well in the source environment, they are treated as teacher agents in the target environment; correspondingly, the new agents are called student agents. To enable the student agents to learn from the teacher agents, we first modify the input nodes of the teacher networks to adapt to the current environment. The teacher agents then take the observations of the student agents as input and output advised actions and values as supervising information. Finally, the student agents combine the reward from the environment with the supervising information from the teacher agents and learn optimal policies with modified loss functions. By taking full advantage of the teacher agents' knowledge, the search space for the student agents is reduced significantly, which accelerates the learning speed of the whole system. The proposed algorithm is verified in several multi-agent simulation environments, and the experimental results demonstrate its efficiency.
Keywords: knowledge transfer; multi-agent reinforcement learning (MARL); new agents
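A rough sketch, under assumed network shapes and a hypothetical mixing weight `beta`, of the teacher-student mechanism described above: the student's loss combines ordinary value regression against environment returns with an imitation term toward the Q-values the teacher outputs for the student's observations. In the paper the teacher's input layer is first modified to fit the new observation size; here both networks simply share the student's observation dimension for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim_student, n_actions = 12, 5
teacher_q = nn.Sequential(nn.Linear(obs_dim_student, 64), nn.ReLU(), nn.Linear(64, n_actions))
student_q = nn.Sequential(nn.Linear(obs_dim_student, 64), nn.ReLU(), nn.Linear(64, n_actions))

def student_loss(batch_obs, batch_act, batch_ret, beta=0.5):
    """Regression toward environment returns plus an imitation term toward the
    teacher's Q-values (hypothetical mixing scheme, beta is a placeholder weight)."""
    q = student_q(batch_obs)
    q_taken = q.gather(1, batch_act.unsqueeze(1)).squeeze(1)
    rl_loss = F.mse_loss(q_taken, batch_ret)        # learn from the environment reward
    with torch.no_grad():
        advice = teacher_q(batch_obs)               # teacher scores the student's observation
    distill_loss = F.mse_loss(q, advice)            # follow the teacher's advised values
    return rl_loss + beta * distill_loss

# Toy usage on random data.
obs = torch.randn(32, obs_dim_student)
act = torch.randint(0, n_actions, (32,))
ret = torch.randn(32)
loss = student_loss(obs, act, ret)
loss.backward()
print(float(loss))
```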
A single-task and multi-decision evolutionary game model based on multi-agent reinforcement learning (cited: 3)
3
Authors: MA Ye, CHANG Tianqing, FAN Wenhui 《Journal of Systems Engineering and Electronics》 SCIE EI CSCD 2021, No. 3, pp. 642-657 (16 pages)
In the evolutionary game of the same task for groups, changes in game rules, personal interests, crowd size, and external supervision have uncertain effects on individual decision-making and game results. Within the Markov decision framework, a single-task multi-decision evolutionary game model based on multi-agent reinforcement learning is proposed to explore the evolutionary rules of the game process. The model can improve the result of an evolutionary game and facilitate the completion of the task. First, based on multi-agent theory and to solve the problems in the original model, a negative-feedback tax penalty mechanism is proposed to guide the strategy selection of individuals in the group; in addition, a method for calculating the group intelligence level is defined to evaluate the group's evolutionary game results. Second, the Q-learning algorithm is used to improve the guiding effect of the negative-feedback tax penalty mechanism. The selection strategy of the Q-learning algorithm is improved, and a bounded-rationality evolutionary game strategy is proposed based on the rules of evolutionary games and the bounded rationality of individuals. Finally, simulation results show that the proposed model can effectively guide individuals to choose cooperation strategies that benefit task completion and stability under different negative-feedback factor values and different group sizes, thereby improving the group intelligence level.
Keywords: multi-agent reinforcement learning; evolutionary game; Q-learning
Cooperative multi-target hunting by unmanned surface vehicles based on multi-agent reinforcement learning (cited: 2)
4
Authors: Jiawei Xia, Yasong Luo, Zhikun Liu, Yalun Zhang, Haoran Shi, Zhong Liu 《Defence Technology(防务技术)》 SCIE EI CAS CSCD 2023, No. 11, pp. 80-94 (15 pages)
To solve the problem of multi-target hunting by an unmanned surface vehicle (USV) fleet, a hunting algorithm based on multi-agent reinforcement learning is proposed. First, the hunting environment and a kinematic model without boundary constraints are built, and the criteria for successful target capture are given. Then, the cooperative hunting problem of a USV fleet is modeled as a decentralized partially observable Markov decision process (Dec-POMDP), and a distributed partially observable multi-target hunting proximal policy optimization (DPOMH-PPO) algorithm applicable to USVs is proposed. In addition, an observation model, a reward function, and an action space suited to multi-target hunting tasks are designed. To deal with the dynamic change of the observational feature dimension in partially observable systems, a feature embedding block is proposed: by combining the two feature compression methods of column-wise max pooling (CMP) and column-wise average pooling (CAP), an observational feature encoding is established. Finally, the centralized training and decentralized execution framework is adopted to train the hunting strategy; each USV in the fleet shares the same policy and performs actions independently. Simulation experiments verify the effectiveness of the DPOMH-PPO algorithm in test scenarios with different numbers of USVs. Moreover, the advantages of the proposed model are comprehensively analyzed in terms of algorithm performance, migration effect across task scenarios, and self-organization capability after damage; the potential deployment and application of DPOMH-PPO in real environments is also verified.
Keywords: unmanned surface vehicles; multi-agent deep reinforcement learning; cooperative hunting; feature embedding; proximal policy optimization
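The feature embedding block described above turns a variable number of observed entities into a fixed-size encoding by pooling column-wise; the snippet below sketches that idea with numpy. The per-entity feature layout and the choice to simply concatenate the max-pooled and average-pooled rows are assumptions for illustration.

```python
import numpy as np

def embed_observations(entity_feats):
    """entity_feats: (n_entities, feat_dim) array whose first dimension may change
    from step to step (targets appear or disappear). Column-wise max pooling (CMP)
    and column-wise average pooling (CAP) each give a fixed feat_dim vector;
    concatenating them yields a fixed-size encoding regardless of n_entities."""
    cmp_vec = entity_feats.max(axis=0)
    cap_vec = entity_feats.mean(axis=0)
    return np.concatenate([cmp_vec, cap_vec])

# Two observations with different numbers of visible targets map to the same-sized vector.
print(embed_observations(np.random.rand(3, 6)).shape)  # (12,)
print(embed_observations(np.random.rand(7, 6)).shape)  # (12,)
```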
UAV Frequency-based Crowdsensing Using Grouping Multi-agent Deep Reinforcement Learning
5
Authors: Cui ZHANG, En WANG, Funing YANG, Yongjian YANG, Nan JIANG 《计算机科学》 CSCD 北大核心 2023, No. 2, pp. 57-68 (12 pages)
Mobile CrowdSensing (MCS) is a promising sensing paradigm that recruits users to cooperatively perform sensing tasks. Recently, unmanned aerial vehicles (UAVs), as powerful sensing devices, have been used to replace user participation and carry out special tasks such as epidemic monitoring and earthquake rescue. In this paper, we focus on scheduling UAVs to sense task Points of Interest (PoIs) with different frequency coverage requirements. To accomplish the sensing task, the scheduling strategy needs to consider the coverage requirement, geographic fairness, and energy charging simultaneously. We consider the complex interaction among UAVs and propose a grouping multi-agent deep reinforcement learning approach (G-MADDPG) to schedule UAVs in a distributed manner. G-MADDPG groups all UAVs into teams by a distance-based clustering algorithm (DCA) and then regards each team as an agent. In this way, G-MADDPG solves the problem that traditional MADDPG takes too long to converge when the number of UAVs is large, and the trade-off between training time and result accuracy can be controlled flexibly by adjusting the number of teams. Extensive simulation results show that our scheduling strategy outperforms three baselines and is flexible in balancing training time and result accuracy.
Keywords: UAV; crowdsensing; frequency coverage; grouping multi-agent deep reinforcement learning
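G-MADDPG first partitions the UAVs into teams with a distance-based clustering algorithm (DCA) and then treats each team as a single agent. The snippet below sketches one plausible distance-based grouping rule; the greedy centroid rule and the radius are assumptions and not necessarily the paper's DCA.

```python
import numpy as np

def distance_based_grouping(positions, radius=30.0):
    """Greedy distance-based clustering: each UAV joins the nearest existing
    group whose centroid lies within `radius`, otherwise it starts a new group.
    (Illustrative stand-in for the paper's DCA; the rule and radius are assumptions.)"""
    groups, centroids = [], []
    for i, p in enumerate(positions):
        if centroids:
            d = [np.linalg.norm(p - c) for c in centroids]
            j = int(np.argmin(d))
            if d[j] <= radius:
                groups[j].append(i)
                centroids[j] = np.mean(positions[groups[j]], axis=0)
                continue
        groups.append([i])
        centroids.append(p.astype(float))
    return groups  # each group is then controlled as a single MADDPG agent

uav_positions = np.array([[0, 0], [5, 4], [100, 98], [102, 95], [55, 50]])
print(distance_based_grouping(uav_positions))  # e.g. [[0, 1], [2, 3], [4]]
```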
Maneuvering target tracking of UAV based on MN-DDPG and transfer learning (cited: 15)
6
Authors: Bo Li, Zhi-peng Yang, Da-qing Chen, Shi-yang Liang, Hao Ma 《Defence Technology(防务技术)》 SCIE EI CAS CSCD 2021, No. 2, pp. 457-466 (10 pages)
Tracking a maneuvering target in real time, autonomously and accurately, in an uncertain environment is one of the challenging missions for unmanned aerial vehicles (UAVs). In this paper, to address the control problem of maneuvering target tracking and obstacle avoidance, an online path planning approach for UAVs is developed based on deep reinforcement learning. Through end-to-end learning powered by neural networks, the proposed approach achieves perception of the environment and continuous motion output control. The approach includes: (1) a deep deterministic policy gradient (DDPG)-based control framework that provides learning and autonomous decision-making capability for UAVs; (2) an improved method named MN-DDPG that introduces mixed noises to help the UAV explore stochastic strategies for online optimal planning; and (3) a task-decomposition and pre-training algorithm for efficient transfer learning, which improves the generalization capability of the UAV control model built on MN-DDPG. Simulation results verify that the proposed approach achieves good self-adaptive adjustment of the UAV's flight attitude in maneuvering target tracking tasks, with a significant improvement in the generalization capability and training efficiency of the UAV tracking controller in uncertain environments.
Keywords: UAVs; maneuvering target tracking; deep reinforcement learning; MN-DDPG; mixed noises; transfer learning
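MN-DDPG adds a mixture of noise processes to the actor output during exploration. A common reading of "mixed noises" is a blend of temporally correlated Ornstein-Uhlenbeck noise and white Gaussian noise, sketched below; the mixing weight and noise parameters are assumptions rather than the paper's values.

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated exploration noise."""
    def __init__(self, dim, theta=0.15, sigma=0.2):
        self.theta, self.sigma = theta, sigma
        self.x = np.zeros(dim)

    def sample(self):
        self.x += self.theta * (-self.x) + self.sigma * np.random.randn(*self.x.shape)
        return self.x

def mixed_noise_action(actor_action, ou, gauss_sigma=0.1, w=0.5):
    """Blend OU noise (smooth, correlated) with Gaussian noise (uncorrelated);
    the weight w trades off the two exploration behaviours (placeholder value)."""
    noise = w * ou.sample() + (1 - w) * gauss_sigma * np.random.randn(*actor_action.shape)
    return np.clip(actor_action + noise, -1.0, 1.0)

ou = OUNoise(dim=2)
for _ in range(3):
    print(mixed_noise_action(np.array([0.3, -0.1]), ou))
```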
Targeted multi-agent communication algorithm based on state control
7
Authors: Li-yang Zhao, Tian-qing Chang, Lei Zhang, Jie Zhang, Kai-xuan Chu, De-peng Kong 《Defence Technology(防务技术)》 SCIE EI CAS CSCD 2024, No. 1, pp. 544-556 (13 pages)
As an important mechanism in multi-agent interaction, communication can make agents form complex team relationships rather than a simple set of independent agents. However, existing communication schemes can introduce considerable timing redundancy and irrelevant messages, which seriously limits their practical application. To solve this problem, this paper proposes a targeted multi-agent communication algorithm based on state control (SCTC). SCTC uses a state-control-based gating mechanism to reduce the timing redundancy of communication between agents, and determines the interaction relationships between agents and the importance weight of each communication message through a series connection of hard- and self-attention mechanisms, realizing targeted processing of communication messages. In addition, by minimizing the difference between the fusion message generated from each agent's real communication messages and the fusion message generated from buffered messages, the correctness of the agent's final action choice is ensured. Our evaluation on a challenging set of StarCraft II benchmarks indicates that SCTC significantly improves learning performance and reduces the communication overhead between agents, thus ensuring better cooperation.
Keywords: multi-agent deep reinforcement learning; state control; targeted interaction; communication mechanism
Prevention and control strategy for newly added N-1 risks in power grids based on an improved dual-agent D3QN (cited: 1)
8
Authors: 安军, 黎梓聪, 周毅博, 石岩, 毕建航 《中国电机工程学报》 北大核心 2025, No. 3, pp. 858-869, I0005 (13 pages)
After an N-1 fault, an urban power grid is very likely to incur new operational risks, which can lead to large-scale blackouts under a subsequent N-1-1 contingency. To control the post-N-1 operational risk of urban power grids, this paper proposes a load-transfer strategy for N-1 risk control based on an improved dual-agent dueling double deep Q network (D3QN). Following risk-control principles, an N-1 scenario index is proposed that requires no additional historical data and accounts for automatic backup switching devices, single-supply substation risk, and single-supply load-bus risk; a three-stage load-transfer solution model is then established that considers the action sequence and the relationships among the indices. An improved dual-agent D3QN with a pre-action and varying-exploration-value selection strategy decomposes load transfer into several sub-transfer stages for learning, which clarifies the transfer logic, reduces the dimensionality of the action space, and improves training and optimization, yielding a load-transfer strategy that controls the N-1 risk. Case studies on multiple urban power grid scenarios verify the effectiveness of the proposed model and method.
Keywords: urban power grid; load transfer; deep reinforcement learning; newly added N-1 risk; dual agent
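The entry above builds on a dueling double deep Q network (D3QN). A minimal PyTorch sketch of the two ingredients follows: a dueling head that decomposes Q(s, a) into a state value and advantages, and the double-DQN target in which the online network selects the next action while the target network evaluates it. Layer sizes and the discount factor are placeholder assumptions.

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)          # state value V(s)
        self.adv = nn.Linear(hidden, n_actions)    # advantages A(s, a)

    def forward(self, obs):
        h = self.trunk(obs)
        v, a = self.value(h), self.adv(h)
        return v + a - a.mean(dim=1, keepdim=True)  # Q(s, a) = V + A - mean(A)

def double_dqn_target(online, target, r, s_next, done, gamma=0.99):
    """Double DQN: action selection with the online net, evaluation with the target net."""
    with torch.no_grad():
        best_a = online(s_next).argmax(dim=1, keepdim=True)
        q_next = target(s_next).gather(1, best_a).squeeze(1)
        return r + gamma * (1.0 - done) * q_next

online, target = DuelingQNet(10, 4), DuelingQNet(10, 4)
y = double_dqn_target(online, target, torch.zeros(8), torch.randn(8, 10), torch.zeros(8))
print(y.shape)  # torch.Size([8])
```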
Optimal dispatch strategy for newly built microgrids under a small-sample data-driven mode
9
Authors: 陈实, 杨林森, 刘艺洪, 罗欢, 臧天磊, 周步祥 《上海交通大学学报》 北大核心 2025, No. 6, pp. 732-745, I0003 (15 pages)
Newly built microgrids lack historical operating data, so conventional data-driven methods struggle to accurately forecast renewable generation, which in turn degrades the accuracy of dispatch planning. To address this, an optimal microgrid dispatch method is proposed for the small-sample data scenarios of newly built microgrids. First, an improved network structure combining a domain-adversarial neural network and a long short-term memory network is designed; the domain-adversarial idea and a gradient reversal mechanism are introduced into transfer learning to improve generalization and reduce the domain-distribution gap between datasets, so that abundant operating data from plants with similar output characteristics can be used to forecast the output of the target plant, overcoming the low forecasting accuracy under small-sample conditions. The optimal dispatch model is then cast as a Markov decision process and solved with the twin delayed deep deterministic policy gradient algorithm. Finally, a modified CIGRE 14-bus microgrid example verifies the effectiveness of the proposed method.
Keywords: small sample; renewable energy output; adversarial transfer learning; deep reinforcement learning; microgrid optimal dispatch
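The forecasting model above couples an LSTM feature extractor with a domain-adversarial branch through a gradient reversal mechanism, so that features learned from a data-rich similar plant become indistinguishable from the target plant's features. A minimal PyTorch gradient-reversal sketch follows; the layer sizes, sequence shape, and reversal coefficient are assumptions.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambda in the
    backward pass, so the feature extractor learns to fool the domain classifier."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

feature_extractor = nn.LSTM(input_size=4, hidden_size=32, batch_first=True)
domain_classifier = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 2))

x = torch.randn(8, 24, 4)                 # 8 sequences of 24 time steps, 4 features each
_, (h, _) = feature_extractor(x)          # final hidden state as the shared feature
domain_logits = domain_classifier(GradReverse.apply(h[-1], 1.0))
print(domain_logits.shape)                # torch.Size([8, 2]): source plant vs. target plant
```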
Power dispatch optimization based on cross-dimensional transfer of knowledge matrices with flexible resources integrated
10
Authors: 唐昊, 杨晨芳, 程文娟, 王正风, 史明光 《控制工程》 北大核心 2025, No. 6, pp. 995-1007 (13 pages)
As various flexible source and load resources are gradually integrated into the grid, the dynamic behavior of the power system becomes more complex. To improve the learning efficiency of power-system dispatch optimization, a cross-dimensional transfer learning and optimization method is proposed based on the dispatch knowledge matrix of the source power system before the flexible resources were integrated. First, a similarity criterion between the related features of the source and target tasks is given using the Euclidean-dynamic time warping distance. Then, principal component analysis feature dimensionality reduction is introduced to build a mapping between similar states/actions of the source and target tasks, and a reinforcement learning method based on cross-dimensional transfer of the dispatch knowledge matrix is proposed, solving the problem that historical dispatch knowledge cannot be reused directly when the state or action dimensions of the source and target tasks differ. Finally, simulations on the IEEE 300-bus system show that the proposed method effectively reuses the historical dispatch knowledge of the source task and achieves fast dispatch optimization of complex power systems with flexible resources.
Keywords: cross-dimensional knowledge transfer; power dispatch; Euclidean-dynamic time warping distance; flexible resources; reinforcement learning
A database index recommendation method adaptive to query workload changes
11
Authors: 吴康, 牛祥虞, 游进国, 虞文波, 李晓武, 丁家满 《计算机应用研究》 北大核心 2025, No. 9, pp. 2758-2764 (7 pages)
In current deep-reinforcement-learning-based database index recommendation, when the workload changes, the gap between the actual workload and the training workload causes a marked drop in recommendation quality. To address the limited adaptability and generalization of existing DRL-based index recommendation algorithms under incremental workload changes, this paper proposes MARLIA (multi-agent reinforcement learning index advisor), an index recommendation algorithm based on multi-agent transfer reinforcement learning. The algorithm incorporates the idea of transfer learning and trains the model with multiple agents. When a workload update degrades recommendation quality, policy distillation transfers the old index recommendation policy to a new index recommendation agent, improving generalization and support for dynamic workloads. Experiments on the TPC-H dataset show that the algorithm's workload cost improvement rate remains within 7% of the baseline algorithms, and the cache hit rate reaches 76.3% with a workload of 120 queries. The study shows that MARLIA has strong adaptability and generalization when the workload changes.
Keywords: database optimization; workload adaptation; index recommendation; reinforcement learning; transfer learning
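MARLIA hands the old index-recommendation policy to a new agent via policy distillation. The usual formulation, sketched below, minimizes the KL divergence between the temperature-softened action distributions of the old and new policy networks; the network shapes, temperature, and the way this term would be mixed with the new agent's RL loss are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_candidate_indexes, state_dim = 20, 64
old_policy = nn.Linear(state_dim, n_candidate_indexes)   # trained on the previous workload
new_policy = nn.Linear(state_dim, n_candidate_indexes)   # to be trained on the changed workload

def distillation_loss(states, temperature=2.0):
    """KL divergence over index-selection distributions: the new agent keeps the
    old policy's preferences while it adapts to the new workload."""
    with torch.no_grad():
        teacher = F.softmax(old_policy(states) / temperature, dim=1)
    student_log = F.log_softmax(new_policy(states) / temperature, dim=1)
    return F.kl_div(student_log, teacher, reduction="batchmean")

states = torch.randn(16, state_dim)
loss = distillation_loss(states)   # in MARLIA this would be combined with the RL loss
loss.backward()
print(float(loss))
```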
Obstacle-avoidance motion planning of manipulators for verification and evaluation of autonomous on-orbit assembly
12
Authors: 谢芳霖, 汪凌昕, 张亚航, 王耀兵, 王捷 《航天器工程》 北大核心 2025, No. 2, pp. 82-89 (8 pages)
For the verification and evaluation of autonomous assembly with space manipulators, and targeting the high-risk collision conditions that may arise when humans and robots, or multiple robots, enter each other's workspaces during collaborative operations, the obstacle-avoidance grasping motion planning of the manipulator is formulated as a reinforcement learning problem, and an improved experience replay method with combined sampling from dual experience pools is proposed. Simulation results show that, after training with this method, the manipulator effectively avoids dynamic obstacles, and the end-effector positioning accuracy improves from 0.2717 m to 0.0413 m. Experiments on a physical prototype further demonstrate that the obstacle-avoidance grasping policy enables the manipulator to autonomously avoid arbitrary obstacles and grasp the target accurately, without predicting the motion of the obstacles. The proposed obstacle-avoidance motion planning can be used for ground-based verification and evaluation of various fixed-base space manipulators.
Keywords: space manipulator; obstacle-avoidance motion planning; deep reinforcement learning; prioritized experience replay; transfer learning
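The improved experience replay described above samples each mini-batch jointly from two experience pools. The sketch below assumes one pool for ordinary transitions and one for risky (near-collision) transitions, drawn in a fixed ratio; those semantics and the 50/50 ratio are assumptions, not the paper's exact design.

```python
import random
from collections import deque

class DualPoolReplay:
    """Two bounded experience pools with combined mini-batch sampling."""
    def __init__(self, capacity=10000):
        self.pool_a = deque(maxlen=capacity)   # e.g. ordinary, obstacle-free transitions
        self.pool_b = deque(maxlen=capacity)   # e.g. transitions near obstacles / failures

    def add(self, transition, risky):
        (self.pool_b if risky else self.pool_a).append(transition)

    def sample(self, batch_size, ratio=0.5):
        """Draw ratio*batch from pool_a and the rest from pool_b (bounded by pool
        sizes), so rare risky experience is not drowned out by ordinary experience."""
        n_a = min(int(batch_size * ratio), len(self.pool_a))
        n_b = min(batch_size - n_a, len(self.pool_b))
        return random.sample(self.pool_a, n_a) + random.sample(self.pool_b, n_b)

buf = DualPoolReplay()
for i in range(100):
    buf.add(("s", "a", -0.1, "s_next"), risky=(i % 10 == 0))
print(len(buf.sample(16)))
```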
An on-ramp flow control method based on deep reinforcement learning
13
Authors: 韩雨, 陈志轩, 王翊萱, 李春杰, 雷伟, 焦彦利, 刘攀 《汽车安全与节能学报》 北大核心 2025, No. 4, pp. 587-597 (11 pages)
Existing reinforcement-learning-based ramp metering methods give insufficient attention to the learning cost of policy training and to policy transferability, which makes the resulting control policies difficult to apply in practice. This paper proposes a reinforcement learning method for optimizing ramp metering policies and studies its transferability in depth through extensive simulation experiments. A ramp metering model is built and a training method based on deep reinforcement learning is proposed. A merging-area bottleneck on the Rongwu Expressway, part of the arterial road network serving the Xiong'an New Area, is selected as the experimental scenario; the model is trained with a deep reinforcement learning algorithm, and the performance of the control policy during training is compared with classical ramp metering methods to quantify the learning cost. Different simulation models and multiple sets of model parameters are then used as test environments to analyze how differences between the training and test environments affect the control policy. The results show that when the difference between the training and test environments is within 20%, the reinforcement learning controller significantly outperforms classical ramp metering methods in improving throughput; when the difference exceeds 20%, the two methods perform similarly.
Keywords: ramp metering; reinforcement learning; transferability; learning cost
Transfer reinforcement learning based on semantic selection
14
Authors: 吕彦佑, 韩飞 《江苏大学学报(自然科学版)》 北大核心 2025, No. 5, pp. 548-555 (8 pages)
To address the low transfer efficiency of existing fine-tuning methods under small visual changes, a transfer reinforcement learning model based on semantic selection is proposed. Inspired by inattentional blindness (humans ignore irrelevant background stimuli when attention shifts), the method separates the visual transfer task from the policy control task through unsupervised semantic segmentation and semantic weight selection, retaining the visual features most relevant to the policy task. Variants of the Flappy Bird game are used as the experimental environment, transfer experiments are conducted under different visual disturbances, and the transfer performance of several attention-based methods is compared. The results show that, compared with attention-based transfer methods, the proposed method performs better in transfer efficiency, interpretability, and adaptability to complex backgrounds, and is especially more robust in environments with strong visual disturbances.
Keywords: image layering; reinforcement learning; transfer learning; attention mechanism; inattentional blindness; semantic segmentation; semantic selection
Denial constraint rule transfer based on DDQN
15
Authors: 秦建斌, 杜玉琪, 林毅斌 《深圳大学学报(理工版)》 北大核心 2025, No. 2, pp. 242-248 (7 pages)
Traditional data-cleaning methods require experts to define data-quality rules manually, which is complex and time-consuming, and the cleaned data may not be reusable, reducing the quality and efficiency of data cleaning. To address this, the double deep Q-network for denial constraints transfer (DDQN-DCT) algorithm is proposed. The algorithm designs a similarity measure for denial constraints (DCs) and, combining similarity with DC succinctness and coverage, uses a double deep Q-network (DDQN) to modify the predicates of DC rules and thereby transfer them, with the goal of maximizing the similarity between the transferred rules and the original rules so as to preserve the information in the original rules. Building on DDQN-DCT, a DDQN-DCT+ algorithm is further designed that splits the DDQN action-selection policy into an addition phase and a deletion phase; comparative experiments show that DDQN-DCT+ performs better on DC succinctness. Compared with brute-force dependency constraint transfer (BFDC), DDQN-DCT+, and structure expansion/reduction (SER), DDQN-DCT improves rule similarity by about 10% on average over BFDC, about 10.6% over DDQN-DCT+, and about 16.4% over SER. DDQN-DCT can effectively transfer rules from the source domain to similar target-domain data.
Keywords: computer technology; rule transfer; denial constraints; similarity measure; reinforcement learning; data cleaning
Variable reward function-driven strategies for impulsive orbital attack-defense games under multiple constraints and victory conditions
16
Authors: Liran Zhao, Sihan Xu, Qinbo Sun, Zhaohui Dang 《Defence Technology(防务技术)》 2025, No. 9, pp. 159-183 (25 pages)
This paper investigates impulsive orbital attack-defense (AD) games under multiple constraints and victory conditions, involving three spacecraft: attacker, target, and defender. In the AD scenario, the attacker aims to breach the defender's interception to rendezvous with the target, while the defender seeks to protect the target by blocking or actively pursuing the attacker. Four different maneuvering constraints and five potential game outcomes are incorporated to more accurately model AD game problems and increase complexity, thereby reducing the effectiveness of traditional methods such as differential games and game-tree searches. To address these challenges, this study proposes a multi-agent deep reinforcement learning solution with variable reward functions. Two attack strategies, direct attack (DA) and bypass attack (BA), are developed for the attacker, each focusing on different mission priorities. Similarly, two defense strategies, direct interdiction (DI) and collinear interdiction (CI), are designed for the defender, each optimizing specific defensive actions through tailored reward functions. Each reward function incorporates both process rewards (e.g., distance and angle) and outcome rewards, derived from physical principles and validated via geometric analysis. Extensive simulations of the four strategy confrontations demonstrate average defensive success rates of 75% for DI vs. DA, 40% for DI vs. BA, 80% for CI vs. DA, and 70% for CI vs. BA. The results indicate that CI outperforms DI for defenders, while BA outperforms DA for attackers; moreover, defenders achieve their objectives more effectively under identical maneuvering capabilities. Trajectory evolution analyses further illustrate the effectiveness of the proposed variable reward function-driven strategies. These strategies and analyses offer valuable guidance for practical orbital defense scenarios and lay a foundation for future multi-agent game research.
Keywords: orbital attack-defense game; impulsive maneuver; multi-agent deep reinforcement learning; reward function design
A survey of sim-to-real transfer reinforcement learning for robotic systems (cited: 2)
17
Authors: 林谦, 余超, 伍夏威, 董银昭, 徐昕, 张强, 郭宪 《软件学报》 EI CSCD 北大核心 2024, No. 2, pp. 711-738 (28 pages)
In recent years, reinforcement learning based on environment interaction has achieved great success in robotics, providing a practical solution for optimizing robot control policies. However, collecting interaction samples in the real world is costly and inefficient, so simulation environments are widely used in robot reinforcement learning. By collecting large numbers of training samples at low cost in a virtual simulation environment and transferring the learned policy to the real environment, the safety, reliability, and real-time issues of training real robots can be effectively alleviated. Because the simulated and real environments differ, however, policies trained in simulation and deployed directly on real robots often fail to reach the desired performance. Sim-to-real transfer reinforcement learning methods have therefore been proposed to narrow the environment gap and enable effective policy transfer. According to the direction of information flow during transfer and the objects on which the intelligent methods act, this survey proposes a pipeline framework for sim-to-real transfer reinforcement learning systems and, within it, divides existing work into three categories: model optimization methods based on the real environment, knowledge transfer methods based on the simulation environment, and iterative policy improvement methods spanning both virtual and real environments; representative techniques and related work in each category are reviewed. Finally, the opportunities and challenges facing sim-to-real transfer reinforcement learning research are discussed.
Keywords: reinforcement learning; transfer learning; sim-to-real transfer; reality gap; robot control
A hyperparameter-adaptive optimization method for spacecraft rendezvous maneuver strategies (cited: 2)
18
Authors: 孙雷翔, 郭延宁, 邓武东, 吕跃勇, 马广富 《宇航学报》 EI CAS CSCD 北大核心 2024, No. 1, pp. 52-62 (11 pages)
Using reinforcement learning, this paper proposes a hyperparameter-adaptive, fuel-optimal maneuver strategy optimization method for geostationary orbit (GEO) spacecraft rendezvous. First, a Lambert maneuver model for GEO spacecraft rendezvous is established. With the maneuver times as decision variables and fuel consumption as the fitness function, improved comprehensive learning particle swarm optimization (ICLPSO) is used as the basic optimizer for the maneuver strategy. Second, considering both optimality and speed of solution, the reward function is redesigned with the particle swarm optimization (PSO) result as a reference baseline, and a deep deterministic policy gradient (DDPG) network is trained on a family of typical GEO rendezvous scenarios. DDPG and ICLPSO are then combined into reinforcement learning particle swarm optimization (RLPSO), so that the algorithm's hyperparameters are adjusted adaptively and dynamically according to the real-time convergence of the iteration. Finally, simulation results show that, compared with PSO and comprehensive learning particle swarm optimization (CLPSO), RLPSO reaches higher-fitness plans after fewer iterations, reducing the computational cost of the iterative process.
Keywords: geostationary orbit; Lambert maneuver; reinforcement learning; particle swarm optimization; deep deterministic policy gradient
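In RLPSO, a trained DDPG policy retunes the PSO hyperparameters (inertia weight and learning factors) at every iteration according to the current convergence behaviour. Below is a hedged sketch on a toy objective in which the controller is a hand-written placeholder function; the actual DDPG network, its state features, and the Lambert fuel-cost objective are not reproduced here.

```python
import numpy as np

def objective(x):
    """Stand-in fitness (would be the Lambert-transfer fuel cost in the paper)."""
    return np.sum(x ** 2, axis=1)

def policy(progress, stagnation):
    """Placeholder for the DDPG hyperparameter controller: shrink inertia as the
    search progresses, boost the social factor if the best fitness stagnates."""
    w = 0.9 - 0.5 * progress
    c1, c2 = 1.5, 1.5 + 0.5 * min(stagnation / 5.0, 1.0)
    return w, c1, c2

rng = np.random.default_rng(0)
pos = rng.uniform(-5, 5, size=(30, 2))
vel = np.zeros_like(pos)
pbest, pbest_f = pos.copy(), objective(pos)
gbest = pbest[np.argmin(pbest_f)]
stagnation, last_best = 0, np.min(pbest_f)

for it in range(50):
    w, c1, c2 = policy(progress=it / 50, stagnation=stagnation)
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    f = objective(pos)
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    gbest = pbest[np.argmin(pbest_f)]
    stagnation = 0 if np.min(pbest_f) < last_best - 1e-12 else stagnation + 1
    last_best = min(last_best, np.min(pbest_f))

print("best fitness:", np.min(pbest_f))
```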
A mobile robot path planning method based on policy fusion and spiking DRL (cited: 1)
19
Authors: 安阳, 王秀青, 赵明华 《计算机科学》 CSCD 北大核心 2024, No. S02, pp. 59-69 (11 pages)
Deep reinforcement learning (DRL) has been successfully applied to mobile robot path planning; DRL-based planners suit high-dimensional environments and are an important route to autonomous learning for mobile robots. However, training DRL models requires large amounts of interaction experience, which means high computational cost, and the limited capacity of the experience pool cannot guarantee that the experience is used effectively. Spiking neural networks (SNNs), an important tool of brain-inspired computing, are biologically plausible, can jointly encode spatio-temporal information, and are well suited to robot perception and control. Combining SNNs, convolutional neural networks (CNNs), and policy fusion, this paper studies DRL-based mobile robot path planning and makes three contributions. (1) The SCDDPG algorithm is proposed, in which CNNs perform multi-channel feature extraction on the input state and SNNs perform spatio-temporal learning on the extracted features. (2) On top of SCDDPG, the SC2DDPG algorithm is proposed, which constrains the robot's motion through a state-constraint strategy, avoiding unnecessary exploration and speeding up DRL convergence. (3) Also based on SCDDPG, the PFTDDPG (policy fusion and transfer SCDDPG) algorithm is proposed, which fuses a staged control mode with DRL, applies a wall-following strategy to wedge-shaped obstacles, and introduces transfer learning to transfer prior knowledge. PFTDDPG completes path-planning tasks that RL alone cannot solve and yields optimal collision-free paths, while also improving convergence speed and planning performance. Experimental results confirm the effectiveness of the three proposed algorithms; among SpikeDDPG, SCDDPG, SC2DDPG, and PFTDDPG, the PFTDDPG algorithm performs best in planning success rate, training convergence speed, and planned path length. This work offers new ideas for mobile robot path planning and enriches DRL-based solutions.
Keywords: deep reinforcement learning; spiking neural networks; convolutional neural networks; transfer learning; mobile robot path planning
A semi-supervised text style transfer method based on multi-reward reinforcement learning (cited: 1)
20
Authors: 李静文, 叶琪, 阮彤, 林宇翩, 薛万东 《计算机科学》 CSCD 北大核心 2024, No. 8, pp. 263-271 (9 pages)
Text style transfer is an important task in natural language processing; its main goal is to change the style attributes of text while preserving the necessary semantics. However, for the many tasks that lack large parallel corpora, existing unsupervised methods suffer from limited text diversity and poor semantic consistency. To address these problems, a semi-supervised multi-stage training framework is proposed. The framework first builds a pseudo-parallel corpus with a style-labeling model and a masked language model, guiding the model to learn diverse transfer patterns in a supervised manner. It then designs an adversarial similarity reward, a Mis reward, and a style reward, and applies reinforcement learning on unlabeled data to strengthen semantic consistency, logical consistency, and style accuracy. On the sentiment-polarity transfer task based on the YELP dataset, the method improves the BLEURT score by 3.1%, the Mis score by 2.5%, and the BLEU score by 9.5%; in formality transfer experiments on the GYAFC dataset, it improves BLEURT by 6.2% and BLEU by 3%.
Keywords: text generation; text style transfer; multi-stage training; style labeling model; reinforcement learning