Low Earth orbit (LEO) satellite networks exhibit distinct characteristics, e.g., limited resources of individual satellite nodes and dynamic network topology, which pose many challenges for routing algorithms. To satisfy the quality of service (QoS) requirements of various users, it is critical to research efficient routing strategies that fully utilize satellite resources. This paper proposes a multi-QoS information optimized routing algorithm based on reinforcement learning for LEO satellite networks. It guarantees that services with high-level assurance demands are prioritized under limited satellite resources, while considering the load balancing performance of the satellite network for services with low-level assurance demands, to ensure the full and effective utilization of satellite resources. An auxiliary path search algorithm is proposed to accelerate the convergence of the satellite routing algorithm. Simulation results show that the generated routing strategy can process high-assurance services in a timely manner and fully meet their QoS demands while effectively improving the load balancing performance of the links.
The weak interface interaction and solid-solid phase transition have long been a conundrum for 1,3,5,7-tetranitro-1,3,5,7-tetraazacyclooctane (HMX)-based polymer-bonded explosives (PBX). A two-step strategy was proposed to address the problem: pretreatment of HMX to endow -OH groups on the surface via polyalcohol bonding agent modification, followed by in situ coating with a nitrate ester-containing polymer. Two types of energetic polyether, glycidyl azide polymer (GAP) and nitrate-modified GAP (GNP), were grafted onto HMX crystals through an isocyanate addition reaction bridged by a neutral polymeric bonding agent (NPBA) layer. The morphology and structure of the HMX-based composites were characterized in detail and the core-shell structure was validated. The grafted polymers clearly enhanced the adhesion force between HMX crystals and the fluoropolymer (F2314) binder. Owing to the interfacial reinforcement among the components, the two HMX-based composites exhibited a remarkable increase in phase transition peak temperature of 10.2 °C and 19.6 °C, respectively, with no more than 1.5% shell content. Furthermore, the impact and friction sensitivity of the composites decreased significantly as a result of the barrier produced by the grafted polymers. These findings will improve the prospects for the interface design of energetic composites aimed at solving weak-interface and safety concerns.
This paper focuses on the development of a learning-based controller for a class of uncertain mechanical systems modeled by the Euler-Lagrange formulation. The considered system can depict the behavior of a large class of engineering systems, such as vehicular systems, robot manipulators and satellites. These systems are often characterized by highly nonlinear behavior, heavy modeling uncertainties and unknown perturbations, so control approaches that rely on an accurate model become unavailable. Motivated by this challenge, a reinforcement learning (RL) adaptive control methodology based on the actor-critic framework is investigated to compensate for the uncertain mechanical dynamics. The approximation inaccuracies caused by RL and the exogenous unknown disturbances are circumvented via a continuous robust integral of the sign of the error (RISE) control approach. Unlike a classical RISE control law, a tanh(·) function is used instead of a sign(·) function to obtain a smoother control signal. The developed controller requires very little prior knowledge of the dynamic model, is robust to unknown dynamics and exogenous disturbances, and can achieve asymptotic output tracking. Finally, co-simulations through ADAMS and MATLAB/Simulink on a three degrees-of-freedom (3-DOF) manipulator and experiments on a real-time electromechanical servo system are performed to verify the performance of the proposed approach.
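The substitution of tanh(·) for sign(·) mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not the paper's controller: the gain value and the function names are assumptions, chosen only to show how a smooth surrogate behaves near zero tracking error.

```python
import math

def sign(x: float) -> float:
    """Discontinuous signum, as used in a classical RISE-style term."""
    return float((x > 0) - (x < 0))

def smooth_sign(x: float, gain: float = 10.0) -> float:
    """tanh-based surrogate for sign; `gain` is a hypothetical tuning parameter."""
    return math.tanh(gain * x)

# Compare both functions for a few tracking-error values around zero.
for e in (-0.5, -0.01, 0.0, 0.01, 0.5):
    print(f"e={e:+.2f}  sign={sign(e):+.0f}  tanh={smooth_sign(e):+.3f}")
```

Near e = 0 the signum jumps between -1 and +1, whereas the tanh surrogate passes through zero continuously. A larger gain tracks sign(·) more closely at the cost of a steeper slope around zero.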
To solve the problem of the low interference success rate against air defense missile radio fuzes caused by the single, uniform interference form of traditional fuze interference systems, an interference decision method based on the Q-learning algorithm is proposed. First, the distance between the missile and the target is divided into multiple states to increase the size of the state space. Second, a multidimensional motion space is used, whose search range changes with the distance of the projectile, to select parameters and minimize the number of ineffective interference parameters. The interference effect is determined by detecting whether the fuze signal disappears. Finally, a weighted reward function is used to determine the reward value based on the range state, output power, and parameter quantity information of the interference form. The effectiveness of the proposed method in selecting the range of motion space parameters and designing the discrimination degree of the reward function has been verified through offline experiments involving full-range missile rendezvous. The optimal interference form for each distance state has been obtained. Compared with the single-interference decision method, the proposed decision method can effectively improve the success rate of interference.
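The idea of treating distance bins as states and interference forms as actions can be sketched with plain tabular Q-learning. Everything below is a toy assumption rather than the paper's setup: the number of bins and forms, the rule for which form is "effective" in each bin, the 0/1 reward (the weighted reward on power and parameter count is not modeled), and all hyperparameters.

```python
import random

N_STATES = 4   # hypothetical distance bins, from far (0) to near (3)
N_ACTIONS = 3  # hypothetical interference forms

def step(state, action):
    """Toy environment: exactly one form is 'effective' per bin (assumption)."""
    reward = 1.0 if action == state % N_ACTIONS else 0.0
    next_state = state + 1          # the missile closes in by one bin per step
    done = next_state == N_STATES   # the engagement ends after the nearest bin
    return min(next_state, N_STATES - 1), reward, done

def q_learning(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.3, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(N_ACTIONS)                           # explore
            else:
                action = max(range(N_ACTIONS), key=lambda a: q[state][a])   # exploit
            next_state, reward, done = step(state, action)
            # Bootstrapped target; no future value once the episode has ended.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = q_learning()
policy = [max(range(N_ACTIONS), key=lambda a: q_table[s][a]) for s in range(N_STATES)]
print(policy)  # greedy interference form for each distance bin
```

The learned greedy policy assigns one interference form to each distance bin, which mirrors the abstract's "optimal interference form for each distance state" in a much simplified setting.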
This work proposes a recorded recurrent twin delayed deep deterministic (RRTD3) policy gradient algorithm to solve the challenge of constructing guidance laws for intercepting endoatmospheric maneuvering missiles with uncertainties and observation noise. The attack-defense engagement scenario is modeled as a partially observable Markov decision process (POMDP). Given the benefits of recurrent neural networks (RNNs) in processing sequence information, an RNN layer is incorporated into the agent's policy network to alleviate the bottleneck of traditional deep reinforcement learning methods when dealing with POMDPs. Because the detection frequency of an interceptor is usually higher than its guidance frequency, the measurements from the interceptor's seeker during each guidance cycle are combined into one sequence as the input to the policy network. During training, the hidden states of the RNN layer in the policy network are recorded to overcome the partially observable problem that this RNN layer causes inside the agent. The training curves show that the proposed RRTD3 successfully enhances data efficiency, training speed, and training stability. The test results confirm the advantages of the RRTD3-based guidance laws over some conventional guidance laws.
Autonomous unmanned aerial vehicle (UAV) manipulation is necessary for the defense department to execute tactical missions given by commanders on the future unmanned battlefield. A large amount of research has been devoted to improving the autonomous decision-making ability of UAVs in an interactive environment, where finding the optimal maneuvering decision-making policy has become one of the key issues for enabling UAV intelligence. In this paper, we propose a maneuvering decision-making algorithm for autonomous air-delivery based on deep reinforcement learning under the guidance of expert experience. Specifically, we refine the guidance-towards-area and guidance-towards-specific-point tasks for the air-delivery process based on traditional air-to-surface fire control methods. Moreover, we construct the UAV maneuvering decision-making model based on Markov decision processes (MDPs). In particular, we present a reward shaping method for the two guidance tasks using a potential-based function and expert-guided advice. The proposed algorithm can accelerate the convergence of the maneuvering decision-making policy and increase the stability of the policy output during the later stage of the training process. The effectiveness of the proposed maneuvering decision-making policy is illustrated by the curves of training parameters and extensive experimental results from testing the trained policy.
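Potential-based reward shaping, which the abstract above relies on, adds a term of the standard form gamma * phi(s') - phi(s) to the environment reward. The sketch below assumes a potential equal to the negative distance to a target point; that choice, the 2-D positions, and the discount factor are illustrative assumptions, not the paper's expert-guided design.

```python
import math

def potential(position, target):
    """Hypothetical potential: negative Euclidean distance to the target."""
    return -math.dist(position, target)

def shaped_reward(reward, position, next_position, target, gamma=0.9):
    """Environment reward plus the potential-based term gamma*phi(s') - phi(s)."""
    return reward + gamma * potential(next_position, target) - potential(position, target)

target = (3.0, 4.0)
path = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 4.0)]  # made-up trajectory
gamma = 0.9

# Discounted sum of the shaping terms alone (environment reward set to zero).
bonus = sum(
    gamma ** t * shaped_reward(0.0, path[t], path[t + 1], target, gamma)
    for t in range(len(path) - 1)
)
# The sum telescopes to gamma^T * phi(s_T) - phi(s_0), independent of the route taken.
telescoped = gamma ** (len(path) - 1) * potential(path[-1], target) - potential(path[0], target)
print(bonus, telescoped)
```

Because the discounted shaping terms telescope, the total bonus depends only on the start and end states. This is why shaping of this form can speed up learning without changing which policy is optimal.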
Future unmanned battles urgently require intelligent combat policies, and multi-agent reinforcement learning offers a promising solution. However, due to the complexity of combat operations and the large size of the combat group, this task suffers from the credit assignment problem more than other reinforcement learning tasks. This study uses reward shaping to relieve the credit assignment problem and improve policy training for the new generation of large-scale unmanned combat operations. We first prove that multiple reward shaping functions do not change the Nash equilibrium in stochastic games, providing theoretical support for their use. According to the characteristics of combat operations, we propose tactical reward shaping (TRS), which comprises maneuver shaping advice and threat assessment-based attack shaping advice. Then, we investigate the effects of different types and combinations of shaping advice on combat policies through experiments. The results show that TRS improves both the efficiency and attack accuracy of combat policies, with the combination of maneuver reward shaping advice and ally-focused attack shaping advice achieving the best performance compared with the baseline strategy.
This study focuses on the guidance problem with impact time control under a field-of-view (FOV) angle constraint. A deep reinforcement learning guidance method is given for the missile to achieve the desired impact time while meeting the FOV angle constraint. On the basis of the proportional navigation guidance framework, an auxiliary control term is supplied by the distributed deep deterministic policy gradient algorithm, in which the reward functions are designed to decrease the time-to-go error and improve the terminal guidance accuracy. Numerical simulation demonstrates that a missile governed by the presented deep reinforcement learning guidance law can hit the target successfully at the appointed arrival time.
The missile interception problem can be regarded as a two-person zero-sum differential game, which depends on the solution of the Hamilton-Jacobi-Isaacs (HJI) equation. It has been proved impossible to obtain a closed-form solution due to the nonlinearity of the HJI equation, and many iterative algorithms have been proposed to solve it. The simultaneous policy updating algorithm (SPUA) is an effective algorithm for solving the HJI equation, but it is an on-policy integral reinforcement learning (IRL) method. For online implementation of SPUA, the disturbance signals need to be adjustable, which is unrealistic. In this paper, an off-policy IRL algorithm based on SPUA is proposed that does not use any knowledge of the system dynamics. Then, a neural-network-based online adaptive critic implementation scheme of the off-policy IRL algorithm is presented. Based on the online off-policy IRL method, a computational intelligence interception guidance (CIIG) law is developed for intercepting highly maneuvering targets. As a model-free method, it achieves interception by measuring system data online. The effectiveness of the CIIG law is verified through two missile-target engagement scenarios.
Metal-organic framework (MOF) nanostructures have emerged as a prominent class of materials in the advancement of electrochemical sensors. The rational design of bimetallic MOF-functionalized microelectrodes is important for improving electrochemical performance but remains a great challenge. In this work, bimetallic FeCo-MOF nanostructures were assembled onto a gold disk ultramicroelectrode (Au UME, 5.2 μm in diameter) via an in-situ electrodeposition method, which enhanced the sensitive detection of epinephrine (EP). The in-situ electrodeposited FeCo-MOF exhibited a characteristic nanoflower-like morphology and was uniformly dispersed on the Au UME. The FeCo-MOF/Au UME demonstrated excellent electrochemical performance in the detection of EP, with a high sensitivity of 36.93 μA·μmol^(-1)·L·cm^(-2) and a low detection limit of 1.28 μmol·L^(-1). This can be attributed to the nonlinear diffusion of EP onto the ultra-micro working substrate, coupled with the synergistic catalytic activity of the bimetallic Fe and Co within the MOF structure. Furthermore, the FeCo-MOF/Au UME has been successfully applied to the analysis of EP in human serum samples, yielding high recovery rates. These results not only expand the research area of electrochemical sensors, but also provide novel insights and directions for the development of high-performance MOF-based electrochemical sensors.
High-performance graphite materials play important roles in aerospace and nuclear reactor technologies because of their outstanding chemical stability and high-temperature performance. Their traditional production method relies on repeated impregnation-carbonization and graphitization, and is plagued by lengthy preparation cycles and high energy consumption. Phase transition-assisted self-pressurized self-sintering technology can rapidly produce high-strength graphite materials, but the fracture strain of the resulting materials is poor. To solve this problem, this study used a two-step sintering method to uniformly introduce micro-nano pores into natural graphite-based bulk graphite, improving the fracture strain of the samples without reducing their density or mechanical properties. Using natural graphite powder, micron-diamond, and nano-diamond as raw materials, and precisely controlling the staged pressure release process, the degree of diamond phase transition expansion was effectively regulated. The strain-to-failure of the graphite samples reached 1.2%, a 35% increase compared to samples produced by full-pressure sintering. Meanwhile, their flexural strength exceeded 110 MPa, and their density was over 1.9 g/cm^(3). The process therefore produced both high strength and high fracture strain. The interface evolution and toughening mechanism during the two-step sintering process were investigated. The micro-nano pores formed are believed to play two roles: as stress concentrators they induce yielding by shear, and as multi-crack propagation paths they significantly lengthen the crack propagation path. The two-step sintering phase transition strategy introduces pores and provides a new approach for increasing the fracture strain of brittle materials.
This paper presents a type of reinforcing structure for composite shells with a single hole or a through hole. Experimental tests under hydrostatic pressure were carried out, using a purpose-designed test system, on composite shells without a hole, with a single hole and the reinforcing structure, and with a through hole and the reinforcing structure. The mechanical responses of the composite shells under hydrostatic pressure were obtained with a high-speed camera and strain measurement. The results show that the entire deformation process of the shell can be divided into three stages: uniform compression, "buckling mode formation" and buckling. The "buckling mode formation" process is captured and reported for the first time. For the composite shell with a single hole, the proposed reinforcing structure has a significant reinforcement effect, and the buckling capacity of the shell is not weaker than that of the intact composite shell. For the composite shell with a through hole, sealing can be achieved by the proposed reinforcing structure, but the buckling capacity of the reinforced shell reaches only 77% of the original buckling capacity.
This research addresses the growing demand for high-performance protective materials against high-velocity projectile impacts. The performance of multi-layered steel fiber-reinforced mortar (SFRM) panels with varying thicknesses and air gaps was experimentally investigated under single and repeated impacts of 7.62×51 mm bullets fired from a distance of 50 m. The impact events were recorded using a high-speed camera at 40000 fps. Panel performance was assessed in terms of failure modes, kinetic energy absorption, spalling diameter, percentage of back-face damage area, and weight loss. Results showed that panel configuration significantly influenced performance. Panel P10, with 70 mm SFRM thickness and 20 mm air gaps, provided the highest resistance, dissipating 5223 J of kinetic energy and preventing back-face damage. In contrast, P7, which absorbed 4476 J, showed a back-face damage area of 8.93% after three impacts. Weight loss analysis further confirmed durability improvements, with P10 showing only 1.53% cumulative loss compared to 3.26% in P7. The inclusion of wider air gaps enhanced energy dissipation and reduced damage. Comparison between single and repeated impacts demonstrated the sustained resistance of high-performance panels, with P10 maintaining minimal degradation across three consecutive impacts. These findings highlight the potential of multi-layer SFRM panels to enhance ballistic resistance, making them suitable for military, security, and civilian protective applications requiring long-term durability.
The development of guidance technology has made it possible for earth penetration weapons (EPWs) to impact a target repeatedly at close range. To investigate the damage caused by single and sequential EPW strikes, experimental and numerical investigations are carried out in this paper. First, a series of sequential explosion tests are conducted to provide basic data on crater size. Then, a numerical model is established to simulate the damage effects of sequential explosions using the meshfree smoothed particle Galerkin method. The effectiveness of the numerical model is verified by comparison with the experimental results. Finally, based on dimensional analysis, several empirical formulas describing the crater size are presented, covering the conical crater diameter and conical crater depth of the single explosion, and the conical crater area and joint depth of the secondary explosion. The formula for the single explosion expresses the relationship between the crater size and the aspect ratio of the charge (ranging from 3 to 7) and the dimensionless buried depth (ranging from 2 to 14). The formula for the secondary explosion expresses the relationship between the relative position of the two explosions and the crater size. These data can provide a reference for the design of protective structures.
As the main influencing factors, such as structural configuration and impact conditions, change, reinforced concrete slabs exhibit different mechanical behaviors and failure patterns, and their failure modes are transformed. To reveal the failure modes and transformation rules of reinforced concrete slabs under impact loads, a dynamic impact response test was carried out using a drop hammer test device. Dynamic data on the impact force, support reaction force, structural displacement, and reinforcement strain were obtained through digital image correlation (DIC) technology, impact force measurement, and strain measurement. Analysis of the ultimate damage state of the reinforced concrete slabs identified four distinct impact failure modes: local failure by stamping, overall failure by stamping, local-overall coupling failure, and local failure by punching. Additionally, the influence of hammerhead shape, hammer height, and reinforcement ratio on the dynamic response and failure mode transformation of the slab was revealed. The results indicate that: (1) The local damage to the slab caused by the plane hammer is readily apparent, while the overall damage caused by the spherical hammer is more pronounced. (2) Compared with slabs with a high reinforcement ratio, the overall bending resistance of slabs with a low reinforcement ratio is significantly inferior, and the back of the slab exhibits more cracks. (3) As the hammer height increases, the slab failure mode shifts from local failure by stamping and overall failure by stamping to local-overall coupling failure and local failure by punching. (4) Three failure mode thresholds have been established, and by comparing the peak impact force with these thresholds, the failure mode of the slab can be effectively determined.
Soft rock surrounding deep roadways has poor stability and a long-term rheological effect. Inadequate supporting measures for such roadways lead to more frequent and larger deformation of the surrounding rock, which not only critically affects engineering safety but also increases maintenance costs. This paper takes the severely deformed main rail roadway in China's Zaoquan coal mine as an example to study the long-term deformation tendency and damage zone by means of in-situ deformation monitoring and acoustic wave testing. A three-dimensional finite element model reflecting the engineering geological conditions and the initial design scheme is established in ABAQUS. Then, on the basis of field deformation monitoring data, the geotechnical and rheological parameters of the surrounding rock are obtained by back analysis. A combined supporting technology with U-shaped steel support and anchor-grouting is proposed for the surrounding soft rock. The numerical simulation of the combined supporting technology and the in-situ deformation monitoring results show that the soft rock surrounding the roadway has been held effectively.
To improve the autonomous ability of unmanned aerial vehicles (UAVs) to carry out air combat missions, many artificial intelligence-based studies of autonomous air combat maneuver decision-making have been carried out, but these studies often target individual decision-making in 1v1 scenarios, which rarely occur in actual air combat. Building on research into 1v1 autonomous air combat maneuver decisions, this paper builds a multi-UAV cooperative air combat maneuver decision model based on multi-agent reinforcement learning. First, a bidirectional recurrent neural network (BRNN) is used to achieve communication between individual UAVs, and the multi-UAV cooperative air combat maneuver decision model is established under the actor-critic architecture. Second, by combining target allocation and air combat situation assessment, the tactical goal of the formation is merged with the reinforcement learning goal of each UAV, and a cooperative tactical maneuver policy is generated. The simulation results show that the multi-UAV cooperative air combat maneuver decision model established in this paper can obtain the cooperative maneuver policy through reinforcement learning, and that this policy can guide UAVs to gain the overall situational advantage and defeat the opponents through tactical cooperation.
The full-range behavior of partially bonded, partially prestressed concrete beams containing fiber reinforced polymer (FRP) tendons and stainless steel reinforcing bars was simulated using a simplified theoretical model. The model assumes that a section in the beam has a trilinear moment-curvature relationship characterized by three particular points: initial cracking of concrete, yielding of non-prestressed steel, and crushing of concrete or rupturing of prestressing tendons. Predictions from the model were compared with the limited available test data, and reasonable agreement was obtained. A detailed parametric study of the behavior of the prestressed concrete beams with hybrid FRP and stainless steel reinforcement was conducted. It can be concluded that the deformability of the beam can be enhanced by increasing the ultimate compressive strain of concrete, the unbonded length of the tendon, the percentage of compressive reinforcement and the partial prestress ratio, and by decreasing the effective prestress in the tendons; increasing the ultimate compressive strain of concrete is the most efficient of these measures. The deformability of the beam is almost directly proportional to the concrete ultimate strain provided the failure mode is concrete crushing, even though the concrete ultimate strain has less influence on the load-carrying capacity.
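A trilinear moment-curvature relationship of the kind described above is a piecewise-linear curve through the origin and three characteristic points (cracking, yielding, ultimate). The sketch below only illustrates that shape: the numerical values and their units are hypothetical, and the function is not the paper's model.

```python
# Hypothetical characteristic points as (curvature, moment) pairs,
# e.g. curvature in 1/mm and moment in kN*m; the values are made up.
POINTS = [
    (1.0e-6, 20.0),   # initial cracking of concrete
    (1.0e-5, 60.0),   # yielding of non-prestressed steel
    (5.0e-5, 80.0),   # concrete crushing or tendon rupture
]

def moment_at(curvature, points=POINTS):
    """Interpolate the moment on the trilinear curve for a given curvature."""
    knots = [(0.0, 0.0), *points]
    if curvature < 0.0 or curvature > knots[-1][0]:
        raise ValueError("curvature outside the range covered by the model")
    # Walk the three segments and interpolate linearly within the matching one.
    for (phi0, m0), (phi1, m1) in zip(knots, knots[1:]):
        if curvature <= phi1:
            return m0 + (m1 - m0) * (curvature - phi0) / (phi1 - phi0)

for phi in (0.0, 1.0e-6, 5.5e-6, 5.0e-5):
    print(f"curvature={phi:.1e}  moment={moment_at(phi):.1f}")
```

Each segment has its own slope, so the section stiffness drops after cracking and again after yielding, which is the behavior the three characteristic points are meant to capture.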
In an evolutionary game in which a group works on the same task, changes in game rules, personal interests, crowd size, and external supervision have uncertain effects on individual decision-making and game results. Within the Markov decision framework, a single-task multi-decision evolutionary game model based on multi-agent reinforcement learning is proposed to explore the evolutionary rules in the course of a game. The model can improve the result of an evolutionary game and facilitate the completion of the task. First, based on multi-agent theory, a negative feedback tax penalty mechanism is proposed to solve the existing problems in the original model by guiding the strategy selection of individuals in the group. In addition, to evaluate the evolutionary game results of the group in the model, a method for calculating the group intelligence level is defined. Second, the Q-learning algorithm is used to improve the guiding effect of the negative feedback tax penalty mechanism. In the model, the selection strategy of the Q-learning algorithm is improved, and a bounded rationality evolutionary game strategy is proposed based on the rules of evolutionary games and the bounded rationality of individuals. Finally, simulation results show that the proposed model can effectively guide individuals to choose cooperation strategies that benefit task completion and stability under different negative feedback factor values and different group sizes, thereby improving the group intelligence level.
Funding (LEO satellite routing paper): National Key Research and Development Program (2021YFB2900604).
Funding (HMX-based composites paper): National Natural Science Foundation of China (Grant Nos. 22175139 and 22105156).
Funding: Supported in part by the National Key R&D Program of China under Grant 2021YFB2011300 and the National Natural Science Foundation of China under Grant 52075262.
Abstract: This paper develops a learning-based controller for a class of uncertain mechanical systems modeled by the Euler-Lagrange formulation. The considered system can describe the behavior of a large class of engineering systems, such as vehicular systems, robot manipulators, and satellites. These systems are often characterized by strong nonlinearities, heavy modeling uncertainties, and unknown perturbations, so nonlinear control approaches that rely on an accurate model become unavailable. Motivated by this challenge, a reinforcement learning (RL) adaptive control methodology based on the actor-critic framework is investigated to compensate for the uncertain mechanical dynamics. The approximation inaccuracies caused by RL and the exogenous unknown disturbances are handled by a continuous robust integral of the sign of the error (RISE) control approach. Unlike a classical RISE control law, a tanh(·) function is used instead of a sign(·) function to obtain a smoother control signal. The developed controller requires very little prior knowledge of the dynamic model, is robust to unknown dynamics and exogenous disturbances, and can achieve asymptotic output tracking. Finally, co-simulations in ADAMS and MATLAB/Simulink on a three-degrees-of-freedom (3-DOF) manipulator and experiments on a real-time electromechanical servo system are performed to verify the performance of the proposed approach.
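The effect of replacing sign(·) with tanh(·) can be illustrated with a minimal sketch. It only compares the two robust terms pointwise and is not the paper's controller: the gain `beta` and the sharpness factor `k` are hypothetical parameters introduced here for illustration.

```python
import math


def sign_term(error: float, beta: float) -> float:
    """Discontinuous robust term beta * sign(e), as used in a classical RISE law."""
    if error > 0:
        return beta
    if error < 0:
        return -beta
    return 0.0


def tanh_term(error: float, beta: float, k: float = 10.0) -> float:
    """Smooth replacement beta * tanh(k * e).

    k is a hypothetical sharpness factor (an assumption, not from the paper):
    larger k approaches the sign function, smaller k gives a gentler transition.
    """
    return beta * math.tanh(k * error)


if __name__ == "__main__":
    # Near e = 0 the sign term jumps between -beta and +beta,
    # while the tanh term passes through zero continuously.
    for e in (-0.5, -0.01, 0.0, 0.01, 0.5):
        print(f"e={e:+.2f}  sign={sign_term(e, 2.0):+.3f}  tanh={tanh_term(e, 2.0):+.3f}")
```

For small errors the tanh term stays well below the full gain, which is what removes the chattering-like jumps of the sign term; for large errors both terms saturate at the same magnitude.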
Funding: National Natural Science Foundation of China (61973037); National 173 Program Project (2019-JCJQ-ZD-324).
Abstract: To address the low interference success rate against air defense missile radio fuzes caused by the single, fixed interference form of traditional fuze interference systems, an interference decision method based on the Q-learning algorithm is proposed. First, the distance between the missile and the target is divided into multiple states to enlarge the state space. Second, a multidimensional action space, whose search range changes with the missile-target distance, is used to select parameters and to reduce the number of ineffective interference parameters. The interference effect is determined by detecting whether the fuze signal disappears. Finally, a weighted reward function determines the reward value from the range state, the output power, and the parameter quantity information of the interference form. Offline experiments involving full-range missile rendezvous verify the effectiveness of the proposed method in selecting the parameter range of the action space and in designing the discrimination degree of the reward function, and the optimal interference form for each distance state is obtained. Compared with the single-interference decision method, the proposed decision method effectively improves the interference success rate.
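A tabular Q-learning update with a weighted reward can be sketched as follows. This is a generic illustration under stated assumptions, not the paper's implementation: the reward form, the weights, the score and cost inputs, and all function names are hypothetical.

```python
from collections import defaultdict


def weighted_reward(success: bool, range_score: float, power_cost: float,
                    param_cost: float, weights=(0.5, 0.3, 0.2)) -> float:
    """Hypothetical weighted reward (assumed form, not taken from the paper).

    Returns zero when the interference fails (fuze signal still present);
    otherwise rewards the range state and penalizes output power and the
    number of parameters used.
    """
    if not success:
        return 0.0
    w_range, w_power, w_param = weights
    return w_range * range_score - w_power * power_cost - w_param * param_cost


def q_update(q, state, action, reward, next_state, actions,
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Standard Q-learning update:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    """
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


if __name__ == "__main__":
    q_table = defaultdict(float)            # unseen (state, action) pairs start at 0
    forms = ["form_a", "form_b"]            # placeholder interference forms
    r = weighted_reward(True, range_score=1.0, power_cost=0.5, param_cost=0.5)
    print(q_update(q_table, state=0, action="form_a", reward=r,
                   next_state=1, actions=forms))
```

Here each distance bin plays the role of a state and each interference form the role of an action; how the paper actually encodes them is not stated in the abstract.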
Funding: Supported by the National Natural Science Foundation of China (Grant No. 12072090).
Abstract: This work proposes a recorded recurrent twin delayed deep deterministic (RRTD3) policy gradient algorithm to address the challenge of constructing guidance laws for intercepting endoatmospheric maneuvering missiles under uncertainties and observation noise. The attack-defense engagement scenario is modeled as a partially observable Markov decision process (POMDP). Given the benefits of recurrent neural networks (RNNs) in processing sequence information, an RNN layer is incorporated into the agent's policy network to alleviate the bottleneck that traditional deep reinforcement learning methods face when dealing with POMDPs. Because the detection frequency of an interceptor is usually higher than its guidance frequency, the measurements from the interceptor's seeker during each guidance cycle are combined into one sequence as the input to the policy network. During training, the hidden states of the RNN layer in the policy network are recorded to overcome the partial observability that this RNN layer introduces inside the agent. The training curves show that the proposed RRTD3 improves data efficiency, training speed, and training stability. The test results confirm the advantages of the RRTD3-based guidance laws over some conventional guidance laws.
Funding: Supported by the Key Research and Development Program of Shaanxi (2022GXLH-02-09), the Aeronautical Science Foundation of China (20200051053001), and the Natural Science Basic Research Program of Shaanxi (2020JM-147).
Abstract: Autonomous unmanned aerial vehicle (UAV) manipulation is necessary for the defense department to execute tactical missions given by commanders in the future unmanned battlefield. A large amount of research has been devoted to improving the autonomous decision-making ability of UAVs in an interactive environment, where finding the optimal maneuvering decision-making policy has become one of the key issues for enabling UAV intelligence. In this paper, we propose a maneuvering decision-making algorithm for autonomous air-delivery based on deep reinforcement learning under the guidance of expert experience. Specifically, we refine the guidance-towards-area and guidance-towards-specific-point tasks of the air-delivery process based on traditional air-to-surface fire control methods. Moreover, we construct the UAV maneuvering decision-making model based on Markov decision processes (MDPs), and we present a reward shaping method for the two tasks using a potential-based function and expert-guided advice. The proposed algorithm can accelerate the convergence of the maneuvering decision-making policy and increase the stability of the policy output during the later stage of training. The effectiveness of the proposed maneuvering decision-making policy is illustrated by the curves of training parameters and by extensive experimental results from testing the trained policy.
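Potential-based reward shaping adds the term F(s, s') = γΦ(s') − Φ(s) to the environment reward. The sketch below shows that general technique only; the potential function (negative distance to a target point), the 2-D positions, and the function names are assumptions for illustration and are not taken from the paper.

```python
import math


def potential(position, target) -> float:
    """Hypothetical potential: negative Euclidean distance to the target point.

    States closer to the target have a higher potential.
    """
    return -math.dist(position, target)


def shaped_reward(base_reward: float, position, next_position, target,
                  gamma: float = 0.99) -> float:
    """Return base_reward + F, with F = gamma * phi(s') - phi(s)."""
    shaping = gamma * potential(next_position, target) - potential(position, target)
    return base_reward + shaping


if __name__ == "__main__":
    target = (3.0, 4.0)
    # Moving from the origin onto the target earns a positive shaping bonus.
    print(shaped_reward(0.0, (0.0, 0.0), target, target, gamma=1.0))
    # Moving away from the target is penalized by the same amount.
    print(shaped_reward(0.0, target, (0.0, 0.0), target, gamma=1.0))
```

Because the shaping term is a difference of potentials, it telescopes along any trajectory, which is why this form of shaping is known to leave the optimal policy of the underlying MDP unchanged.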
Abstract: Future unmanned battles urgently require intelligent combat policies, and multi-agent reinforcement learning offers a promising solution. However, because of the complexity of combat operations and the large size of the combat group, this task suffers from the credit assignment problem more than other reinforcement learning tasks. This study uses reward shaping to relieve the credit assignment problem and improve policy training for the new generation of large-scale unmanned combat operations. We first prove that multiple reward shaping functions do not change the Nash equilibrium in stochastic games, providing theoretical support for their use. According to the characteristics of combat operations, we propose tactical reward shaping (TRS), which comprises maneuver shaping advice and threat assessment-based attack shaping advice. Then, we investigate the effects of different types and combinations of shaping advice on combat policies through experiments. The results show that TRS improves both the efficiency and the attack accuracy of combat policies, with the combination of maneuver reward shaping advice and ally-focused attack shaping advice achieving the best performance relative to the baseline strategy.
Funding: Supported by the National Natural Science Foundation of China (62003021, 62373304), the Industry-University-Research Innovation Fund for Chinese Universities (2021ZYA02009), the Shaanxi Qinchuangyuan High-level Innovation and Entrepreneurship Talent Project (OCYRCXM-2022-136), the Shaanxi Association for Science and Technology Youth Talent Support Program (XXJS202218), and the Fundamental Research Funds for the Central Universities (D5000210830).
Abstract: This study addresses the guidance problem with impact time control under a field-of-view (FOV) angle constraint. A deep reinforcement learning guidance method is given that allows the missile to achieve the desired impact time while satisfying the FOV angle constraint. On the basis of the proportional navigation guidance framework, an auxiliary control term is supplied by the distributed deep deterministic policy gradient algorithm, in which the reward functions are designed to decrease the time-to-go error and improve the terminal guidance accuracy. Numerical simulation demonstrates that a missile governed by the presented deep reinforcement learning guidance law can hit the target successfully at the appointed arrival time.
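The structure "proportional navigation plus an auxiliary term" can be sketched in a few lines. The code uses one common textbook form of proportional navigation (navigation gain times closing speed times line-of-sight rate); the abstract does not state which form the paper uses, and the navigation gain, the function names, and the way the auxiliary term is supplied are assumptions.

```python
def pn_command(closing_speed: float, los_rate: float, nav_gain: float = 3.0) -> float:
    """Baseline proportional navigation acceleration command.

    closing_speed in m/s, los_rate (line-of-sight angular rate) in rad/s;
    nav_gain is a hypothetical navigation constant.
    """
    return nav_gain * closing_speed * los_rate


def guidance_command(closing_speed: float, los_rate: float,
                     auxiliary_term: float, nav_gain: float = 3.0) -> float:
    """Total command: baseline PN plus an auxiliary correction.

    In a learning-based scheme the auxiliary term would come from a trained
    policy; here it is simply passed in as a number.
    """
    return pn_command(closing_speed, los_rate, nav_gain) + auxiliary_term


if __name__ == "__main__":
    # Placeholder values: 300 m/s closing speed, 0.01 rad/s LOS rate,
    # and a 0.5 m/s^2 correction from a stand-in policy.
    print(guidance_command(300.0, 0.01, auxiliary_term=0.5))
```

Keeping the baseline law separate from the learned correction means the correction only has to shape the impact time, while the baseline term still drives the line-of-sight rate toward zero.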
Abstract: The missile interception problem can be regarded as a two-person zero-sum differential game, whose solution depends on the Hamilton-Jacobi-Isaacs (HJI) equation. It has been proved impossible to obtain a closed-form solution because of the nonlinearity of the HJI equation, and many iterative algorithms have been proposed to solve it. The simultaneous policy updating algorithm (SPUA) is an effective algorithm for solving the HJI equation, but it is an on-policy integral reinforcement learning (IRL) method. For online implementation of SPUA, the disturbance signals need to be adjustable, which is unrealistic. In this paper, an off-policy IRL algorithm based on SPUA is proposed that uses no knowledge of the system dynamics. Then, a neural-network-based online adaptive critic implementation scheme for the off-policy IRL algorithm is presented. Based on the online off-policy IRL method, a computational intelligence interception guidance (CIIG) law is developed for intercepting highly maneuvering targets. As a model-free method, it achieves interception by measuring system data online. The effectiveness of the CIIG law is verified through two missile-target engagement scenarios.
Funding: Supported by the National Key Research and Development Program of China (2021YFB3201400, 2021YFB3201401, 2020YFC1908602), the National Natural Science Foundation of China (21904001 and 61774159), the Anhui Provincial Natural Science Foundation (2008085QF288), and the Scientific Research Foundation for the Returned Overseas Chinese Scholars, Anhui Province (2020LCX032).
Abstract: Metal-organic framework (MOF) nanostructures have emerged as a prominent class of materials in the advancement of electrochemical sensors. The rational design of bimetallic MOF-functionalized microelectrodes is important for improving electrochemical performance but remains a great challenge. In this work, bimetallic FeCo-MOF nanostructures were assembled onto a gold disk ultramicroelectrode (Au UME, 5.2 μm in diameter) via an in-situ electrodeposition method, which enabled the sensitive detection of epinephrine (EP). The in-situ electrodeposited FeCo-MOF exhibited a characteristic nanoflower-like morphology and was uniformly dispersed on the Au UME. The FeCo-MOF/Au UME demonstrated excellent electrochemical performance in the detection of EP, with a high sensitivity of 36.93 μA·μmol^(-1)·L·cm^(-2) and a low detection limit of 1.28 μmol·L^(-1). This performance can be attributed to the nonlinear diffusion of EP onto the ultra-micro working substrate, coupled with the synergistic catalytic activity of the bimetallic Fe and Co within the MOF structure. Furthermore, the FeCo-MOF/Au UME was successfully applied to the analysis of EP in human serum samples, yielding high recovery rates. These results not only expand the research area of electrochemical sensors but also provide new insights and directions for the development of high-performance MOF-based electrochemical sensors.
Funding: Natural Science Foundation of Shanghai (24ZR1400800); the Natural Science Foundation of China (U23A20685, 52073058, 91963204); the National Key R&D Program of China (2021YFB3701400); Shanghai Sailing Program (23YF1400200).
Abstract: High-performance graphite materials play important roles in aerospace and nuclear reactor technologies because of their outstanding chemical stability and high-temperature performance. Their traditional production method relies on repeated impregnation-carbonization and graphitization, and is plagued by lengthy preparation cycles and high energy consumption. Phase transition-assisted self-pressurized self-sintering technology can rapidly produce high-strength graphite materials, but the fracture strain of the resulting materials is poor. To solve this problem, this study used a two-step sintering method to uniformly introduce micro-nano pores into natural graphite-based bulk graphite, improving the fracture strain of the samples without reducing their density or mechanical properties. Using natural graphite powder, micron-diamond, and nano-diamond as raw materials, and by precisely controlling the staged pressure release process, the degree of diamond phase transition expansion was effectively regulated. The strain-to-failure of the graphite samples reached 1.2%, a 35% increase compared to samples produced by full-pressure sintering. Meanwhile, their flexural strength exceeded 110 MPa, and their density was over 1.9 g/cm^(3). The process therefore produced both high strength and high fracture strain. The interface evolution and toughening mechanism during the two-step sintering process were investigated. The micro-nano pores formed are believed to have two roles: as stress concentrators they induce yielding by shear, and as multi-crack propagation paths they significantly lengthen the crack propagation path. The two-step sintering phase transition strategy introduces pores and provides a new approach for increasing the fracture strain of brittle materials.
Funding: Supported by the Ningbo Major Research and Development Plan Project (Grant No. 2024Z135), the Natural Science Basic Research Program of Shaanxi Province (Grant No. 2024JC-YBMS-322), the China Postdoctoral Science Foundation (Grant No. 2020M673492), and the National Natural Science Foundation of China (Grant No. 51909219).
Abstract: In this paper, a type of reinforcing structure for composite shells with a single hole or a through hole is presented. Hydrostatic pressure tests were carried out with a purpose-designed experimental test system on composite shells without a hole, with a single hole and the reinforcing structure, and with a through hole and the reinforcing structure. The mechanical responses of the composite shells under hydrostatic pressure were obtained by high-speed camera and strain measurement. The results show that the entire deformation process of the shell can be divided into three stages: uniform compression, "buckling mode formation", and buckling. The "buckling mode formation" stage is captured and reported for the first time. For the composite shell with a single hole, the proposed reinforcing structure has a significant reinforcement effect, and the buckling capacity of the reinforced shell is not weaker than that of the intact composite shell. For the composite shell with a through hole, sealing can be achieved by the proposed reinforcing structure, but the buckling capacity of the shell after reinforcement reaches only 77% of the original buckling capacity.
Funding: Funded by the Thailand Research Fund under Research and Researchers for Industries (contract no. MSD62I0063).
Abstract: This research addresses the growing demand for high-performance protective materials against high-velocity projectile impacts. The performance of multi-layered steel fiber-reinforced mortar (SFRM) panels with varying thicknesses and air gaps was experimentally investigated under single and repeated impacts of 7.62×51 mm bullets fired from a distance of 50 m. The impact events were recorded using a high-speed camera at 40,000 fps. Panel performance was assessed in terms of failure modes, kinetic energy absorption, spalling diameter, percentage of back-face damage area, and weight loss. Results showed that panel configuration significantly influenced performance. Panel P10, with 70 mm SFRM thickness and 20 mm air gaps, provided the highest resistance, dissipating 5223 J of kinetic energy and preventing back-face damage. In contrast, P7, which absorbed 4476 J, presented a back-face damage area of 8.93% after three impacts. Weight loss analysis further confirmed the durability improvement, with P10 showing only 1.53% cumulative loss compared to 3.26% for P7. The inclusion of wider air gaps enhanced energy dissipation and reduced damage. Comparison between single and repeated impacts demonstrated the sustained resistance of high-performance panels, with P10 maintaining minimal degradation across three consecutive impacts. These findings highlight the potential of multi-layer SFRM panels to enhance ballistic resistance, making them suitable for military, security, and civilian protective applications requiring long-term durability.
Abstract: The development of guidance technology has made it possible for earth penetration weapons (EPWs) to strike a target repeatedly at close range. To investigate the damage caused by single and sequential EPW strikes, experimental and numerical investigations are carried out in this paper. First, a series of sequential explosion tests is conducted to provide basic data on crater size. Then, a numerical model is established to simulate the damage effects of sequential explosions using the smoothed particle Galerkin meshfree method. The effectiveness of the numerical model is verified by comparison with the experimental results. Finally, based on dimensional analysis, several empirical formulas describing the crater size are presented, covering the conical crater diameter and depth for the single explosion and the conical crater area and joint depth for the secondary explosion. The formula for the single explosion relates the crater size to the aspect ratio of the charge, ranging from 3 to 7, and the dimensionless buried depth, ranging from 2 to 14. The formula for the secondary explosion relates the crater size to the relative position of the two explosions. These data can provide a reference for the design of protective structures.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 52078283) and the Shandong Provincial Natural Science Foundation (Project No. ZR2024MA094).
Abstract: As the main influencing factors, such as structural configuration and impact conditions, change, reinforced concrete slabs exhibit different mechanical behaviors and failure patterns, and their failure modes are transformed. To reveal the failure modes of reinforced concrete slabs under impact loads and the rules governing their transformation, a dynamic impact response test was carried out using a drop hammer test device. Dynamic data on the impact force, support reaction force, structural displacement, and reinforcement strain were obtained through digital image correlation (DIC) technology, impact force measurement, and strain measurement. Analysis of the ultimate damage state of the reinforced concrete slabs identified four distinct impact failure modes: local failure by stamping, overall failure by stamping, local-overall coupling failure, and local failure by punching. In addition, the influence of hammerhead shape, hammer height, and reinforcement ratio on the dynamic response and failure mode transformation of the slab was revealed. The results indicate that: (1) Local damage to the slab is readily apparent under the plane hammer, while overall damage is more pronounced under the spherical hammer. (2) Compared with slabs with a high reinforcement ratio, slabs with a low reinforcement ratio have significantly lower overall bending resistance, and more cracks appear on the back of the slab. (3) As the hammer height increases, the slab failure mode shifts from local failure by stamping and overall failure by stamping to local-overall coupling failure and local failure by punching. (4) Three failure mode thresholds have been established, and the failure mode of the slab can be effectively determined by comparing the peak impact force with these thresholds.
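Classifying a slab by comparing its peak impact force against three thresholds can be sketched as a simple lookup. The abstract gives neither the threshold values nor an explicit force-to-mode mapping, so the numbers below are placeholders and the ordering of the four modes by increasing peak force is an assumption (it follows the order in which the modes appear as hammer height increases).

```python
from bisect import bisect_right

# Mode order is an assumption: lowest peak force first.
FAILURE_MODES = (
    "local failure by stamping",
    "overall failure by stamping",
    "local-overall coupling failure",
    "local failure by punching",
)


def classify_failure_mode(peak_force_kn: float, thresholds_kn) -> str:
    """Map a peak impact force to one of four modes using three ascending thresholds.

    thresholds_kn holds hypothetical threshold values in kN; a force equal to
    a threshold is assigned to the higher mode.
    """
    thresholds = list(thresholds_kn)
    if len(thresholds) != 3 or thresholds != sorted(thresholds):
        raise ValueError("expected three thresholds in ascending order")
    return FAILURE_MODES[bisect_right(thresholds, peak_force_kn)]


if __name__ == "__main__":
    placeholder_thresholds = (100.0, 200.0, 300.0)  # not values from the paper
    for force in (50.0, 150.0, 250.0, 350.0):
        print(force, "->", classify_failure_mode(force, placeholder_thresholds))
```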
Funding: Projects (51409154, 41772299) supported by the National Natural Science Foundation of China; Project (J16LG03) supported by the Shandong Province Higher Educational Science and Technology Program, China; Projects (2015JQJH106, 2014TDJH103) supported by the SDUST Research Fund, China; Project (201630576) supported by the Tai’an Scientific and Technologic Development Project, China.
Abstract: Soft rock surrounding deep roadways has poor stability and exhibits a long-term rheological effect. Inadequate support measures for such roadways lead to more frequent and larger deformation of the surrounding rock, which not only critically affects engineering safety but also increases maintenance costs. This paper takes the severely deformed main rail roadway in China's Zaoquan coal mine as an example to study the long-term deformation tendency and damage zone by means of in-situ deformation monitoring and acoustic wave testing. A three-dimensional finite element model reflecting the engineering geological conditions and the initial design scheme is established in ABAQUS. Then, on the basis of field deformation monitoring data, the geotechnical and rheological parameters of the surrounding rock are obtained by back analysis. A combined support technology using U-shaped steel supports and anchor-grouting is proposed for the surrounding soft rock. The numerical simulation of the combined support technology and the in-situ deformation monitoring results show that the soft rock surrounding the roadway has been stabilized effectively.
Funding: Supported by the Aeronautical Science Foundation of China (2017ZC53033) and the Seed Foundation of Innovation and Creation for Graduate Students in Northwestern Polytechnical University (CX2020156).
Abstract: To improve the autonomy of unmanned aerial vehicles (UAVs) in carrying out air combat missions, many artificial intelligence-based studies of autonomous air combat maneuver decision-making have been carried out, but these studies often target individual decision-making in 1v1 scenarios, which rarely occur in actual air combat. Building on research into 1v1 autonomous air combat maneuver decisions, this paper constructs a multi-UAV cooperative air combat maneuver decision model based on multi-agent reinforcement learning. First, a bidirectional recurrent neural network (BRNN) is used to achieve communication between individual UAVs, and the multi-UAV cooperative air combat maneuver decision model is established under the actor-critic architecture. Second, by combining target allocation with air combat situation assessment, the tactical goal of the formation is merged with the reinforcement learning goal of each UAV, and a cooperative tactical maneuver policy is generated. The simulation results show that the model established in this paper can obtain the cooperative maneuver policy through reinforcement learning, and that this policy can guide the UAVs to gain the overall situational advantage and defeat the opponents through tactical cooperation.
Funding: Project (50478502) supported by the National Natural Science Foundation of China.
Abstract: The full-range behavior of partially bonded, partially prestressed concrete beams containing fiber reinforced polymer (FRP) tendons and stainless steel reinforcing bars was simulated using a simplified theoretical model. The model assumes that a section of the beam has a trilinear moment-curvature relationship characterized by three particular points: initial cracking of concrete, yielding of non-prestressed steel, and crushing of concrete or rupture of prestressing tendons. Predictions from the model were compared with the limited available test data, and reasonable agreement was obtained. A detailed parametric study of the behavior of prestressed concrete beams with hybrid FRP and stainless steel reinforcement was conducted. It can be concluded that the deformability of the beam can be enhanced by increasing the ultimate compressive strain of concrete, the unbonded length of the tendon, the percentage of compressive reinforcement, and the partial prestress ratio, and by decreasing the effective prestress in the tendons. Of these, increasing the ultimate compressive strain of concrete is the most efficient. The deformability of the beam is almost directly proportional to the concrete ultimate strain provided the failure mode is concrete crushing, even though the concrete ultimate strain has less influence on the load-carrying capacity.
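A trilinear moment-curvature relationship is a piecewise-linear curve through the origin and the three characteristic points (cracking, yielding, ultimate). The sketch below evaluates such a curve by linear interpolation; the numerical values and the function name are placeholders for illustration, not data from the paper.

```python
def trilinear_moment(curvature: float, points) -> float:
    """Moment at a given curvature on a trilinear moment-curvature curve.

    points = [(phi_cr, M_cr), (phi_y, M_y), (phi_u, M_u)]: cracking, yielding,
    and ultimate points with strictly increasing curvature. The curve starts
    at the origin and is linear between consecutive points.
    """
    if curvature < 0:
        raise ValueError("curvature must be non-negative")
    prev_phi, prev_m = 0.0, 0.0
    for phi, moment in points:
        if curvature <= phi:
            slope = (moment - prev_m) / (phi - prev_phi)
            return prev_m + slope * (curvature - prev_phi)
        prev_phi, prev_m = phi, moment
    raise ValueError("curvature exceeds the ultimate curvature (section has failed)")


if __name__ == "__main__":
    # Placeholder values in arbitrary consistent units.
    curve = [(1.0, 10.0), (5.0, 30.0), (20.0, 40.0)]
    for phi in (0.5, 3.0, 12.5, 20.0):
        print(phi, trilinear_moment(phi, curve))
```

The three slopes decrease from segment to segment in this example, reflecting the stiffness loss after cracking and again after yielding; the ultimate point marks either concrete crushing or tendon rupture, whichever governs.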
Funding: Supported by the National Key R&D Program of China (2017YFB1400105).
Abstract: In an evolutionary game in which a group works on the same task, changes in game rules, personal interests, crowd size, and external supervision have uncertain effects on individual decision-making and game results. Within the Markov decision framework, a single-task multi-decision evolutionary game model based on multi-agent reinforcement learning is proposed to explore the evolutionary rules of the game process. The model can improve the result of an evolutionary game and facilitate the completion of the task. First, based on multi-agent theory and to solve the existing problems in the original model, a negative feedback tax penalty mechanism is proposed to guide the strategy selection of individuals in the group. In addition, to evaluate the evolutionary game results of the group in the model, a method for calculating the group intelligence level is defined. Second, the Q-learning algorithm is used to improve the guiding effect of the negative feedback tax penalty mechanism. In the model, the selection strategy of the Q-learning algorithm is improved, and a bounded rationality evolutionary game strategy is proposed based on the rules of evolutionary games and the bounded rationality of individuals. Finally, simulation results show that, under different negative feedback factor values and different group sizes, the proposed model can effectively guide individuals to choose cooperation strategies that benefit task completion and stability, thereby improving the group intelligence level.