Journal Articles
156 articles found
1. Multi-QoS routing algorithm based on reinforcement learning for LEO satellite networks (cited by 1)
Authors: ZHANG Yifan, DONG Tao, LIU Zhihui, JIN Shichao. Journal of Systems Engineering and Electronics, 2025, No. 1, pp. 37-47.
Low Earth orbit (LEO) satellite networks exhibit distinct characteristics, e.g., limited resources on individual satellite nodes and a dynamic network topology, which pose many challenges for routing algorithms. To satisfy the quality of service (QoS) requirements of various users, it is critical to develop efficient routing strategies that fully utilize satellite resources. This paper proposes a multi-QoS optimized routing algorithm based on reinforcement learning for LEO satellite networks. It prioritizes services with high assurance demands under limited satellite resources while accounting for the load-balancing performance of the network for services with low assurance demands, ensuring full and effective utilization of satellite resources. An auxiliary path search algorithm is proposed to accelerate the convergence of the routing algorithm. Simulation results show that the generated routing strategy promptly processes and fully meets the QoS demands of high-assurance services while effectively improving the load balancing of the links.
Keywords: low Earth orbit (LEO) satellite network; reinforcement learning; multi-quality of service (QoS); routing algorithm
2. Ground reaction curves for strain-softening rock masses with ground reinforcement based on unified strength criterion
Authors: CHEN Xuan-hao, ZHANG Ding-li, SUN Zhen-yu, CHEN Wen-bo. Journal of Central South University, 2025, No. 9, pp. 3383-3404.
Ground reinforcement is crucial for tunnel construction, especially in soft rock tunnels. Existing analytical models are inadequate for predicting the ground reaction curves (GRCs) of reinforced tunnels in strain-softening (SS) rock masses. This study proposes a novel analytical model to determine the GRCs of SS rock masses, incorporating ground reinforcement and the intermediate principal stress (IPS). The SS constitutive model captures progressive post-peak failure, while an elastic-brittle model simulates the reinforced rock masses. Nine combined states are investigated to analyze plastic zone development in the natural and reinforced regions. Each region is analyzed separately and coupled through boundary conditions at the interface. Comparison with three types of existing models indicates that those models overestimate reinforcement effects. The deformation prediction errors of single-geological-material models may exceed 75%. Furthermore, neglecting the softening and residual zones in natural regions can lead to errors over 50%. Considering the IPS can effectively utilize the rock strength and reduce tunnel deformation by at least 30%, thereby saving reinforcement and support costs. The computational results agree satisfactorily with monitoring data from a model test and two tunnel projects. The proposed model may offer valuable insights into the design and construction of reinforced tunnels.
Keywords: ground reinforcement; strain-softening; unified strength criterion; tunnel responses; analytical model
3. Interfacial reinforcement of core-shell HMX@energetic polymer composites featuring enhanced thermal and safety performance (cited by 2)
Authors: Binghui Duan, Hongchang Mo, Bojun Tan, Xianming Lu, Bozhou Wang, Ning Liu. Defence Technology, 2024, No. 1, pp. 387-399.
The weak interface interaction and solid-solid phase transition have long been a conundrum for 1,3,5,7-tetranitro-1,3,5,7-tetraazacyclooctane (HMX)-based polymer-bonded explosives (PBX). A two-step strategy was proposed to address the problem: pretreating HMX via polyalcohol bonding agent modification to endow the surface with hydroxyl (-OH) groups, followed by in situ coating with a nitrate ester-containing polymer. Two types of energetic polyether, glycidyl azide polymer (GAP) and nitrate-modified GAP (GNP), were grafted onto HMX crystals through an isocyanate addition reaction bridged by a neutral polymeric bonding agent (NPBA) layer. The morphology and structure of the HMX-based composites were characterized in detail, and the core-shell structure was validated. The grafted polymers markedly enhanced the adhesion force between HMX crystals and the fluoropolymer (F2314) binder. Owing to the interfacial reinforcement among the components, the two HMX-based composites exhibited a remarkable increase in phase transition peak temperature of 10.2 °C and 19.6 °C, respectively, with no more than 1.5% shell content. Furthermore, the impact and friction sensitivity of the composites decreased significantly as a result of the barrier produced by the grafted polymers. These findings improve the prospects for interface design of energetic composites aimed at resolving weak-interface and safety concerns.
Keywords: HMX crystals; polyalcohol bonding agent; energetic polymer; core-shell structure; interfacial reinforcement
4. Cognitive interference decision method for air defense missile fuze based on reinforcement learning (cited by 1)
Authors: Dingkun Huang, Xiaopeng Yan, Jian Dai, Xinwei Wang, Yangtian Liu. Defence Technology, 2024, No. 2, pp. 393-404.
To solve the problem of the low interference success rate of air defense missile radio fuzes, caused by the unified interference form of traditional fuze interference systems, an interference decision method based on the Q-learning algorithm is proposed. First, the distance between the missile and the target is divided into multiple states to enlarge the state space. Second, a multidimensional action space, whose search range changes with the missile-target distance, is used to select parameters and minimize the number of ineffective interference parameters. The interference effect is determined by detecting whether the fuze signal disappears. Finally, a weighted reward function determines the reward value based on the range state, output power, and parameter quantity of the interference form. The effectiveness of the proposed method in selecting the range of action space parameters and designing the discrimination degree of the reward function has been verified through offline experiments involving full-range missile rendezvous, and the optimal interference form for each distance state has been obtained. Compared with a single-interference decision method, the proposed method effectively improves the success rate of interference.
Keywords: cognitive radio; interference decision; radio fuze; reinforcement learning; interference strategy optimization
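The decision loop in this abstract is standard tabular Q-learning over discretized distance states and a finite set of interference parameter choices. A minimal sketch follows; the state/action counts and the reward function are invented stand-ins for the paper's weighted reward (range state, output power, parameter quantity), not its actual design:

```python
import numpy as np

rng = np.random.default_rng(0)

n_states = 10    # discretized missile-target distance bins (hypothetical)
n_actions = 6    # candidate interference parameter sets (hypothetical)
alpha, gamma, eps = 0.1, 0.9, 0.2

Q = np.zeros((n_states, n_actions))

def reward(state, action):
    # Toy stand-in for the paper's weighted reward: +1 when the chosen
    # interference form suits the range state, a small penalty otherwise.
    return 1.0 if action == state % n_actions else -0.1

for episode in range(2000):
    s = int(rng.integers(n_states))
    for _ in range(20):
        # Epsilon-greedy selection over interference forms.
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
        r = reward(s, a)
        s_next = int(rng.integers(n_states))   # toy transition model
        # Standard Q-learning update.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

# Greedy policy: the preferred interference form for each distance state.
policy = Q.argmax(axis=1)
```

In the paper's setting the reward additionally trades off output power and parameter count, and the action search range shrinks with range-to-target; both would enter through `reward` and the action sampling above.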
5. Recorded recurrent deep reinforcement learning guidance laws for intercepting endoatmospheric maneuvering missiles (cited by 1)
Authors: Xiaoqi Qiu, Peng Lai, Changsheng Gao, Wuxing Jing. Defence Technology, 2024, No. 1, pp. 457-470.
This work proposes a recorded recurrent twin delayed deep deterministic (RRTD3) policy gradient algorithm to solve the challenge of constructing guidance laws for intercepting endoatmospheric maneuvering missiles under uncertainties and observation noise. The attack-defense engagement scenario is modeled as a partially observable Markov decision process (POMDP). Given the benefits of recurrent neural networks (RNNs) in processing sequence information, an RNN layer is incorporated into the agent's policy network to alleviate the bottleneck of traditional deep reinforcement learning methods when dealing with POMDPs. The measurements from the interceptor's seeker during each guidance cycle are combined into one sequence as the input to the policy network, since the detection frequency of an interceptor is usually higher than its guidance frequency. During training, the hidden states of the RNN layer in the policy network are recorded to overcome the partial observability that this RNN layer itself introduces inside the agent. The training curves show that the proposed RRTD3 successfully enhances data efficiency, training speed, and training stability. The test results confirm the advantages of the RRTD3-based guidance laws over several conventional guidance laws.
Keywords: endoatmospheric interception; missile guidance; reinforcement learning; Markov decision process; recurrent neural networks
6. Reinforcement learning based adaptive control for uncertain mechanical systems with asymptotic tracking (cited by 1)
Authors: Xiang-long Liang, Zhi-kai Yao, Yao-wen Ge, Jian-yong Yao. Defence Technology, 2024, No. 4, pp. 19-28.
This paper focuses on the development of a learning-based controller for a class of uncertain mechanical systems modeled by the Euler-Lagrange formulation. The considered formulation can depict the behavior of a large class of engineering systems, such as vehicular systems, robot manipulators, and satellites. All these systems are often characterized by highly nonlinear dynamics, heavy modeling uncertainties, and unknown perturbations, so accurate-model-based nonlinear control approaches become unavailable. Motivated by this challenge, a reinforcement learning (RL) adaptive control methodology based on the actor-critic framework is investigated to compensate for the uncertain mechanical dynamics. The approximation inaccuracies caused by RL and the exogenous unknown disturbances are circumvented via a continuous robust integral of the sign of the error (RISE) control approach. Unlike a classical RISE control law, a tanh(·) function is utilized instead of a sign(·) function to obtain a smoother control signal. The developed controller requires very little prior knowledge of the dynamic model, is robust to unknown dynamics and exogenous disturbances, and achieves asymptotic output tracking. Finally, co-simulations through ADAMS and MATLAB/Simulink on a three-degrees-of-freedom (3-DOF) manipulator and experiments on a real-time electromechanical servo system are performed to verify the performance of the proposed approach.
Keywords: adaptive control; reinforcement learning; uncertain mechanical systems; asymptotic tracking
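The tanh-for-sign substitution mentioned in this abstract is easy to see numerically: tanh(k·e) approaches sign(e) for large gain k but passes continuously through e = 0, which is what smooths the control signal. A sketch (the gain value is hypothetical, not taken from the paper):

```python
import numpy as np

e = np.arange(-100, 101) / 100.0   # tracking error samples on [-1, 1]
k = 20.0                           # hypothetical smoothing gain

u_sign = np.sign(e)                # classical RISE-style robust term
u_tanh = np.tanh(k * e)            # smoothed variant described in the abstract

# tanh matches sign away from zero but has no jump at e = 0.
mask = np.abs(e) > 0.5
max_gap_outside = float(np.max(np.abs(u_sign[mask] - u_tanh[mask])))
```

The larger the gain k, the closer the smoothed term tracks the discontinuous one, at the cost of a steeper slope near the origin.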
7. UAV maneuvering decision-making algorithm based on deep reinforcement learning under the guidance of expert experience (cited by 1)
Authors: ZHAN Guang, ZHANG Kun, LI Ke, PIAO Haiyin. Journal of Systems Engineering and Electronics, 2024, No. 3, pp. 644-665.
Autonomous unmanned aerial vehicle (UAV) manipulation is necessary for the defense department to execute tactical missions given by commanders in the future unmanned battlefield. A large amount of research has been devoted to improving the autonomous decision-making ability of UAVs in interactive environments, where finding the optimal maneuvering decision-making policy has become one of the key issues for enabling UAV intelligence. In this paper, we propose a maneuvering decision-making algorithm for autonomous air-delivery based on deep reinforcement learning under the guidance of expert experience. Specifically, we refine the guidance-towards-area and guidance-towards-specific-point tasks for the air-delivery process based on traditional air-to-surface fire control methods. We then construct the UAV maneuvering decision-making model based on Markov decision processes (MDPs) and present a reward shaping method for the two tasks using potential-based functions and expert-guided advice. The proposed algorithm accelerates the convergence of the maneuvering decision-making policy and increases the stability of the policy output during the later stage of training. The effectiveness of the learned policy is illustrated by the training curves and extensive experimental results for testing the trained policy.
Keywords: unmanned aerial vehicle (UAV); maneuvering decision-making; autonomous air-delivery; deep reinforcement learning; reward shaping; expert experience
8. Computational intelligence interception guidance law using online off-policy integral reinforcement learning (cited by 1)
Authors: WANG Qi, LIAO Zhizhong. Journal of Systems Engineering and Electronics, 2024, No. 4, pp. 1042-1052.
The missile interception problem can be regarded as a two-person zero-sum differential game, which depends on the solution of the Hamilton-Jacobi-Isaacs (HJI) equation. It has been proved impossible to obtain a closed-form solution due to the nonlinearity of the HJI equation, and many iterative algorithms have been proposed to solve it. The simultaneous policy updating algorithm (SPUA) is effective for solving the HJI equation, but it is an on-policy integral reinforcement learning (IRL) method: for online implementation of SPUA, the disturbance signals need to be adjustable, which is unrealistic. In this paper, an off-policy IRL algorithm based on SPUA is proposed that requires no knowledge of the system dynamics. A neural-network-based online adaptive critic implementation of the off-policy IRL algorithm is then presented. Based on the online off-policy IRL method, a computational intelligence interception guidance (CIIG) law is developed for intercepting high-maneuvering targets. As a model-free method, it achieves interception by measuring system data online. The effectiveness of the CIIG is verified through two missile-target engagement scenarios.
Keywords: two-person zero-sum differential games; Hamilton-Jacobi-Isaacs (HJI) equation; off-policy integral reinforcement learning (IRL); online learning; computational intelligence interception guidance (CIIG) law
9. Tactical reward shaping for large-scale combat by multi-agent reinforcement learning
Authors: DUO Nanxun, WANG Qinzhao, LYU Qiang, WANG Wei. Journal of Systems Engineering and Electronics, 2024, No. 6, pp. 1516-1529.
Future unmanned battles urgently require intelligent combat policies, and multi-agent reinforcement learning offers a promising solution. However, due to the complexity of combat operations and the large size of the combat group, this task suffers from the credit assignment problem more than other reinforcement learning tasks. This study uses reward shaping to relieve the credit assignment problem and improve policy training for the new generation of large-scale unmanned combat operations. We first prove that multiple reward shaping functions do not change the Nash equilibrium in stochastic games, providing theoretical support for their use. According to the characteristics of combat operations, we propose tactical reward shaping (TRS), which comprises maneuver shaping advice and threat assessment-based attack shaping advice. We then investigate the effects of different types and combinations of shaping advice on combat policies through experiments. The results show that TRS improves both the efficiency and attack accuracy of combat policies, with the combination of maneuver reward shaping advice and ally-focused attack shaping advice achieving the best performance compared with the baseline strategy.
Keywords: deep reinforcement learning; multi-agent reinforcement learning; multi-agent combat; unmanned battle; reward shaping
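The theoretical claim here, that shaping rewards leave the equilibrium unchanged, holds in particular for potential-based shaping, where the extra reward is F(s, s') = gamma * phi(s') - phi(s). Along any trajectory these terms telescope, so the shaped return differs from the original only by boundary potential terms. A minimal numeric check (the potential and the trajectory are invented for illustration):

```python
gamma = 0.99

def phi(state):
    # Hypothetical potential, e.g. negative distance to a tactical objective.
    return -abs(state)

def shaping(s, s_next):
    # Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s).
    return gamma * phi(s_next) - phi(s)

# A toy trajectory of states and per-step environment rewards.
states = [5, 3, 2, 0]
rewards = [0.0, 0.0, 1.0]

ret_orig = sum(gamma**t * r for t, r in enumerate(rewards))
ret_shaped = sum(gamma**t * (r + shaping(states[t], states[t + 1]))
                 for t, r in enumerate(rewards))

# Telescoping: the difference is gamma^T * phi(s_T) - phi(s_0), independent
# of the rewards, so the ranking of policies is preserved.
diff = ret_shaped - ret_orig
```

Because the difference depends only on the start and terminal potentials, every policy's return shifts by the same amount, which is why shaping of this form cannot move the equilibrium.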
10. Deep reinforcement learning guidance with impact time control
Authors: LI Guofei, LI Shituo, LI Bohao, WU Yunjie. Journal of Systems Engineering and Electronics, 2024, No. 6, pp. 1594-1603.
In consideration of the field-of-view (FOV) angle constraint, this study focuses on the guidance problem with impact time control. A deep reinforcement learning guidance method is given for the missile to achieve the desired impact time while meeting the FOV angle constraint. On the basis of the proportional navigation guidance framework, an auxiliary control term is supplemented by the distributed deep deterministic policy gradient algorithm, in which the reward functions are developed to decrease the time-to-go error and improve the terminal guidance accuracy. The numerical simulation demonstrates that the missile governed by the presented deep reinforcement learning guidance law can hit the target successfully at the appointed arrival time.
Keywords: impact time; deep reinforcement learning; guidance law; field-of-view (FOV) angle; deep deterministic policy gradient
11. Reinforcement learning fuzzy adaptive control of an inverted pendulum (cited by 1)
Authors: LIAN Zisheng, MENG Qiaorong. Journal of Taiyuan University of Technology, 2005, No. 4, pp. 405-408.
A mathematical model of a single inverted pendulum system was established from the Lagrange equations, and a controller for the system was designed using a fuzzy adaptive control algorithm. The model and the controller were combined in Matlab simulation modules, and the inverted pendulum control system was studied by simulation. The results show that, for nonlinear unstable systems with high real-time requirements, the fuzzy adaptive control algorithm can adjust the control parameters online according to the control requirements and achieve good control performance within the shortest settling time.
Keywords: single inverted pendulum; reinforcement learning; fuzzy adaptive control
12. Hierarchical reinforcement learning guidance with threat avoidance (cited by 1)
Authors: LI Bohao, WU Yunjie, LI Guofei. Journal of Systems Engineering and Electronics, 2022, No. 5, pp. 1173-1185.
The guidance strategy is an extremely critical factor in determining the striking effect of a missile operation. A novel guidance law is presented by exploiting deep reinforcement learning (DRL) with a hierarchical deep deterministic policy gradient (DDPG) algorithm. The reward functions are constructed to minimize the line-of-sight (LOS) angle rate and avoid the threat caused by opposing obstacles. To attenuate the chattering of the acceleration, a hierarchical reinforcement learning structure and an improved reward function with an action penalty are put forward. The simulation results validate that the missile under the proposed method can hit the target successfully and keep away from the threatened areas effectively.
Keywords: guidance law; deep reinforcement learning (DRL); threat avoidance; hierarchical reinforcement learning
13. Application of a combined supporting technology with U-shaped steel support and anchor-grouting to surrounding soft rock reinforcement in roadway (cited by 19)
Authors: WANG Hui, ZHENG Pengqiang, ZHAO Wenjuan, TIAN Hongming. Journal of Central South University, 2018, No. 5, pp. 1240-1250.
Soft rock surrounding a deep roadway has poor stability and a long-term rheological effect. Large deformations of the surrounding rock occur due to inadequate supporting measures for such roadways, which critically affects engineering safety and raises maintenance costs. This paper takes the severely deformed main rail roadway in China's Zaoquan coal mine as an example to study the long-term deformation tendency and damage zone by means of in-situ deformation monitoring and acoustic wave testing. A three-dimensional finite element model reflecting the engineering geological conditions and the initial design scheme is established in ABAQUS. Then, based on field-monitored deformation data, the geotechnical and rheological parameters of the surrounding rock are obtained by back analysis. A combined supporting technology with U-shaped steel supports and anchor-grouting is proposed for the surrounding soft rock. Numerical simulation of the combined supporting technology and in-situ deformation monitoring show that the soft rock surrounding the roadway is held effectively.
Keywords: soft rock roadway; rheological effect; supporting technology; numerical simulation; reinforcement
14. UAV cooperative air combat maneuver decision based on multi-agent reinforcement learning (cited by 25)
Authors: ZHANG Jiandong, YANG Qiming, SHI Guoqing, LU Yi, WU Yong. Journal of Systems Engineering and Electronics, 2021, No. 6, pp. 1421-1438.
In order to improve the autonomous ability of unmanned aerial vehicles (UAVs) to implement air combat missions, many artificial intelligence-based autonomous air combat maneuver decision-making studies have been carried out, but these studies are often aimed at individual decision-making in 1v1 scenarios, which rarely happen in actual air combat. Building on research into 1v1 autonomous air combat maneuver decisions, this paper builds a multi-UAV cooperative air combat maneuver decision model based on multi-agent reinforcement learning. First, a bidirectional recurrent neural network (BRNN) is used to achieve communication between UAV individuals, and the multi-UAV cooperative air combat maneuver decision model under the actor-critic architecture is established. Second, by combining target allocation and air combat situation assessment, the tactical goal of the formation is merged with the reinforcement learning goal of every UAV, and a cooperative tactical maneuver policy is generated. The simulation results prove that the model established in this paper can obtain the cooperative maneuver policy through reinforcement learning, and that this policy guides the UAVs to gain an overall situational advantage and defeat their opponents through tactical cooperation.
Keywords: decision-making; air combat maneuver; cooperative air combat; reinforcement learning; recurrent neural network
15. A review of mobile robot motion planning methods: from classical motion planning workflows to reinforcement learning-based architectures (cited by 9)
Authors: DONG Lu, HE Zichen, SONG Chunwei, SUN Changyin. Journal of Systems Engineering and Electronics, 2023, No. 2, pp. 439-459.
Motion planning is critical to realize the autonomous operation of mobile robots. As the complexity and randomness of robot application scenarios increase, the planning capability of classical hierarchical motion planners is challenged. With the development of machine learning, deep reinforcement learning (DRL)-based motion planners have gradually become a research hotspot due to several advantageous features: a DRL-based motion planner is model-free, does not rely on a prior structured map, and, most importantly, unifies the global planner and the local planner. In this paper, we provide a systematic review of various motion planning methods. First, we summarize representative and state-of-the-art works for each submodule of the classical motion planning architecture and analyze their performance. Then, we concentrate on reinforcement learning (RL)-based motion planning approaches, including motion planners combined with RL improvements, map-free RL-based motion planners, and multi-robot cooperative planning methods. Finally, we analyze in detail the urgent challenges faced by these mainstream RL-based motion planners, review state-of-the-art works addressing these issues, and propose suggestions for future research.
Keywords: mobile robot; reinforcement learning (RL); motion planning; multi-robot cooperative planning
16. Analytical approach and field monitoring for mechanical behaviors of pipe roof reinforcement (cited by 4)
Authors: WANG Haitao, JIA Jinqing, KANG Haigui. Journal of Central South University, 2009, No. 5, pp. 827-834.
Considering the delay effect of the initial lining and revising the Winkler elastic foundation model, an analytical approach based on Pasternak elastic foundation beam theory for pipe roof reinforcement was put forward. Using a tunnel excavation as an example, the longitudinal strain of the reinforcing pipe obtained by the analytical approach was compared with field monitoring. The results indicate that the Pasternak model, which adopts a more realistic hypothesis for the elastic soil than the Winkler model, gives a more accurate calculation and agrees better with the field monitoring results. The difference between the two models' calculation results is about 7%, and the Pasternak model proves to be a better way to study the reinforcement mechanism and improve design practice. The calculation results also reveal that the reinforcing pipes act as levers, increasing longitudinal load transfer to the unexcavated area and consequently decreasing deformation and increasing face stability.
Keywords: tunnel heading; pipe roof reinforcement; Pasternak elastic foundation beam; field monitoring
17. A guidance method for coplanar orbital interception based on reinforcement learning (cited by 6)
Authors: ZENG Xin, ZHU Yanwei, YANG Leping, ZHANG Chengming. Journal of Systems Engineering and Electronics, 2021, No. 4, pp. 927-938.
This paper investigates a guidance method based on reinforcement learning (RL) for coplanar orbital interception in a continuous low-thrust scenario. The problem is formulated as a Markov decision process (MDP) model, and a well-designed RL algorithm, experience-based deep deterministic policy gradient (EBDDPG), is proposed to solve it. By taking advantage of prior information generated through an optimal control model, the proposed algorithm not only resolves the convergence problem of common RL algorithms but also successfully trains an efficient deep neural network (DNN) controller for the chaser spacecraft to generate the control sequence. Numerical simulation results show that the proposed algorithm is feasible and that the trained DNN controller improves efficiency over traditional optimization methods by roughly two orders of magnitude.
Keywords: orbital interception; reinforcement learning (RL); Markov decision process (MDP); deep neural network (DNN)
18. Task assignment in ground-to-air confrontation based on multiagent deep reinforcement learning (cited by 4)
Authors: Jia-yi Liu, Gang Wang, Qiang Fu, Shao-hua Yue, Si-yuan Wang. Defence Technology, 2023, No. 1, pp. 210-219.
The scale of ground-to-air confrontation task assignment is large, with many concurrent assignments and random events to handle. When existing task assignment methods are applied to ground-to-air confrontation, they deal with complex tasks inefficiently and suffer interaction conflicts in multiagent systems. This study proposes a multiagent architecture based on one general agent with multiple narrow agents (OGMN) to reduce task assignment conflicts. Considering the slow speed of traditional dynamic task assignment algorithms, this paper proposes the proximal policy optimization for task assignment of general and narrow agents (PPO-TAGNA) algorithm. Based on the idea of the optimal assignment strategy and combined with the training framework of deep reinforcement learning (DRL), the algorithm adds a multi-head attention mechanism and a stage reward mechanism to the bilateral band-clipping PPO algorithm to address low training efficiency. Finally, simulation experiments are carried out on a digital battlefield. The multiagent architecture based on OGMN combined with the PPO-TAGNA algorithm obtains higher rewards faster and has a higher win ratio. Analysis of agent behavior verifies the efficiency, superiority, and rationality of resource utilization of this method.
Keywords: ground-to-air confrontation; task assignment; general and narrow agents; deep reinforcement learning; proximal policy optimization (PPO)
19. A single-task and multi-decision evolutionary game model based on multi-agent reinforcement learning (cited by 4)
Authors: MA Ye, CHANG Tianqing, FAN Wenhui. Journal of Systems Engineering and Electronics, 2021, No. 3, pp. 642-657.
In the evolutionary game of the same task for groups, changes in game rules, personal interests, crowd size, and external supervision have uncertain effects on individual decision-making and game results. Within the Markov decision framework, a single-task multi-decision evolutionary game model based on multi-agent reinforcement learning is proposed to explore the evolutionary rules of the game process. The model can improve the result of the evolutionary game and facilitate completion of the task. First, based on multi-agent theory, a negative feedback tax penalty mechanism is proposed to guide the strategy selection of individuals in the group and to resolve problems in the original model; in addition, a method for calculating the group intelligence level is defined to evaluate the group's evolutionary game results. Second, the Q-learning algorithm is used to improve the guiding effect of the negative feedback tax penalty mechanism: the selection strategy of the Q-learning algorithm is improved, and a bounded-rationality evolutionary game strategy is proposed based on the rules of evolutionary games and the bounded rationality of individuals. Finally, simulation results show that the proposed model can effectively guide individuals to choose cooperation strategies that benefit task completion and stability under different negative feedback factor values and different group sizes, thereby improving the group intelligence level.
Keywords: multi-agent reinforcement learning; evolutionary game; Q-learning
20. Knowledge transfer in multi-agent reinforcement learning with incremental number of agents (cited by 4)
Authors: LIU Wenzhang, DONG Lu, LIU Jian, SUN Changyin. Journal of Systems Engineering and Electronics, 2022, No. 2, pp. 447-460.
In this paper, the reinforcement learning method for cooperative multi-agent systems (MAS) with an incremental number of agents is studied. Existing multi-agent reinforcement learning approaches deal with a MAS with a specific number of agents and can learn well-performing policies. However, if the number of agents increases, the previously learned policies may not perform well in the new scenario, and the new agents need to learn from scratch to find optimal policies with the others, which may slow down the learning speed of the whole team. To solve this problem, we propose a new algorithm that takes full advantage of the historical knowledge learned before and transfers it from the previous agents to the new agents. Since the previous agents have been trained well in the source environment, they are treated as teacher agents in the target environment; correspondingly, the new agents are called student agents. To enable the student agents to learn from the teacher agents, we first modify the input nodes of the teacher agents' networks to adapt to the current environment. Then, the teacher agents take the observations of the student agents as input and output advised actions and values as supervising information. Finally, the student agents combine the reward from the environment and the supervising information from the teacher agents, and learn optimal policies with modified loss functions. By taking full advantage of the teacher agents' knowledge, the search space for the student agents is reduced significantly, accelerating the learning speed of the holistic system. The proposed algorithm is verified in several multi-agent simulation environments, and its efficiency is demonstrated by the experimental results.
Keywords: knowledge transfer; multi-agent reinforcement learning (MARL); new agents
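The teacher-student transfer this abstract describes can be caricatured in tabular form: the student's update mixes its own TD error with a supervision term that pulls its value toward the teacher's. All sizes, the toy reward, and the mixing weight beta below are invented for illustration; the paper itself uses neural network agents with modified loss functions rather than tables:

```python
import numpy as np

rng = np.random.default_rng(1)

n_states, n_actions = 8, 4
alpha, gamma, beta = 0.2, 0.9, 0.5   # beta weights the teacher's advice

# A previously trained "teacher", here just a fixed value table.
Q_teacher = rng.normal(size=(n_states, n_actions))
Q_student = np.zeros((n_states, n_actions))

for step in range(5000):
    s = int(rng.integers(n_states))
    a = int(rng.integers(n_actions))
    r = float(a == 0)                    # toy environment reward
    s_next = int(rng.integers(n_states))
    # Environment TD error, as in ordinary Q-learning.
    td_err = r + gamma * Q_student[s_next].max() - Q_student[s, a]
    # Supervision term: move toward the teacher's advised value.
    sup_err = Q_teacher[s, a] - Q_student[s, a]
    # Combined update mixing both signals, in the spirit of the paper's
    # modified loss functions.
    Q_student[s, a] += alpha * ((1 - beta) * td_err + beta * sup_err)
```

As beta decays toward zero the student reverts to pure environment learning, which is one common way to phase out the teacher once the student is competent.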