Abstract: The rails of an electromagnetic railgun can be ablated by the temperature rise caused by current concentration. The current distributions on the rails and armature are affected not only by the skin effect but also by the proximity effect, which is rarely discussed. This paper illustrates the difference between the skin effect and the proximity effect and investigates the factors that influence the proximity effect. Results show that the skin effect concentrates current on the surfaces around the rails, and the proximity effect further increases the current density on the inner surfaces of the rails. Decreasing the distance between the rails enhances the proximity effect but does not affect the skin effect. The stronger proximity effect also increases the rail resistance, which raises the temperature. This explains why ablation is often detected in small-caliber railguns. These results can support the design and optimization of electromagnetic railguns.
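The classical skin-depth formula δ = √(ρ / (π f μ)) shows why rail spacing leaves the skin effect unchanged: it depends only on the conductor's resistivity and permeability and on frequency. The following is a minimal sketch, assuming a copper rail (ρ ≈ 1.68 × 10⁻⁸ Ω·m) under sinusoidal excitation. The frequencies are illustrative values, not taken from the paper.

```python
import math

MU_0 = 4 * math.pi * 1e-7  # vacuum permeability in H/m


def skin_depth(resistivity: float, frequency: float, mu_r: float = 1.0) -> float:
    """Classical skin depth delta = sqrt(rho / (pi * f * mu)) in metres.

    Rail spacing does not appear anywhere in this expression.
    """
    if frequency <= 0:
        raise ValueError("frequency must be positive")
    return math.sqrt(resistivity / (math.pi * frequency * mu_r * MU_0))


if __name__ == "__main__":
    rho_copper = 1.68e-8  # assumed copper resistivity, ohm*m
    for f in (1e2, 1e3, 1e4):
        print(f"{f:>8.0f} Hz -> skin depth {skin_depth(rho_copper, f) * 1e3:.3f} mm")
```

A railgun current pulse is not a single sinusoid, so this gives only an order-of-magnitude picture. The proximity effect itself depends on the neighbouring conductor's geometry and needs a field solution such as FEM.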
Abstract: This study explores the formation maintenance of multiple unmanned aerial vehicles (UAVs) based on proximity behavior. Each UAV makes its own decisions according to the expected formation structure and the position, velocity, and attitude information of the other UAVs within its azimuth area. This removes two requirements of traditional distributed formation control methods: that nodes be strongly connected and that communication be strictly consistent. An adaptive distributed formation flight strategy for multiple UAVs is established by exploiting observations of proximity behavior, which remedies the poor flexibility of distributed formations. This technique ensures consistent position and attitude among the UAVs. In the proposed method, an azimuth area relative to each UAV is established to capture the state information of nearby UAVs. A dependency degree factor based on proximity behavior is introduced into the state update equation. Finally, the formation position, speed, and attitude errors are used to form an adaptive dynamic adjustment strategy. Simulations demonstrate the effectiveness and robustness of the theoretical results, validating the proposed method.
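The abstract does not give the state update equation, so the following is only a generic consensus-style sketch. Neighbours are the UAVs inside an azimuth sector. A hypothetical per-neighbour weight `dependency[i][j]` stands in for the paper's dependency degree factor. Velocity and attitude terms are omitted, and the sector width, range, and gain are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]


def in_azimuth_area(own: Vec, heading: float, other: Vec,
                    half_angle: float, max_range: float) -> bool:
    """True if `other` lies within +/- half_angle of `heading` and within range."""
    dx, dy = other[0] - own[0], other[1] - own[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0 or dist > max_range:
        return False
    # Wrap the relative bearing into [-pi, pi).
    rel = (math.atan2(dy, dx) - heading + math.pi) % (2 * math.pi) - math.pi
    return abs(rel) <= half_angle


def formation_step(positions: Sequence[Vec], headings: Sequence[float],
                   offsets: Sequence[Vec], dependency: Sequence[Sequence[float]],
                   gain: float = 0.5, half_angle: float = math.pi,
                   max_range: float = 50.0) -> List[Vec]:
    """One synchronous consensus update toward the desired formation.

    offsets[i] is UAV i's desired position relative to a shared formation
    reference; dependency[i][j] is a hypothetical weight of UAV j for UAV i.
    """
    updated = []
    for i, (p, h, d) in enumerate(zip(positions, headings, offsets)):
        ex = ey = total = 0.0
        for j, q in enumerate(positions):
            if i == j or not in_azimuth_area(p, h, q, half_angle, max_range):
                continue
            w = dependency[i][j]
            # Difference between j's and i's estimates of the formation reference.
            ex += w * ((q[0] - offsets[j][0]) - (p[0] - d[0]))
            ey += w * ((q[1] - offsets[j][1]) - (p[1] - d[1]))
            total += w
        if total > 0.0:
            p = (p[0] + gain * ex / total, p[1] + gain * ey / total)
        updated.append(p)
    return updated
```

Each UAV only uses neighbours it can observe in its own sector, so no global strongly connected communication graph is assumed in the sketch.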
Funding: supported by the Project of the National Natural Science Foundation of China (Nos. 62073256, 61773305), the Key Science and Technology Program of Shaanxi Province (No. 2020GY-125), and the Xi'an Science and Technology Innovation Talent Service Enterprise Project (No. 2020KJRC0041).
Abstract: To objectively obtain the three-dimensional coordinates of a projectile fuze's proximity explosion when the projectile intersects the head of a missile target, we propose a dynamic seven-screen photoelectric detection test method. The system consists of six plane detection screens and one flash photoelectric dynamic detection screen. A model is established for calculating the three-dimensional coordinates of the projectile proximity explosion position, based on seven plane detection screens with dynamic characteristics. From the relation between the dynamic detection screen planes and the measured time values, an analytical function of the proximity explosion position parameters under nonlinear motion is derived. A projectile signal filtering method based on the discrete wavelet transform is explored, and a projectile signal recognition algorithm using an improved particle swarm algorithm is proposed. The signals from the six detection screens are attributed to the same projectile using the time duration and signal peak error of each screen passage. This yields precise time values for that projectile's passage through the screens. In a projectile fuze proximity explosion test, both the linear motion model and the proposed nonlinear motion model are used to calculate the proximity explosion position parameters of the same group of projectiles. The comparison verifies that the proposed test method and calculation model accurately obtain the actual projectile proximity explosion position parameters.
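The abstract does not name the wavelet, the decomposition depth, or the thresholding rule used for projectile signal filtering. As an illustration only, the sketch below assumes an orthonormal Haar wavelet with soft thresholding of the detail coefficients, a common DWT denoising scheme. It is written in pure Python.

```python
import math
from typing import List, Sequence, Tuple

_S = 1.0 / math.sqrt(2.0)


def haar_dwt(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT; len(x) must be even."""
    approx = [(x[i] + x[i + 1]) * _S for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * _S for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx: Sequence[float], detail: Sequence[float]) -> List[float]:
    """Inverse of haar_dwt."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) * _S, (a - d) * _S))
    return out


def soft_threshold(c: float, t: float) -> float:
    """Shrink a coefficient toward zero by t (soft thresholding)."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def wavelet_denoise(x: Sequence[float], levels: int, threshold: float) -> List[float]:
    """Multi-level Haar decomposition, soft-threshold the details, reconstruct."""
    if len(x) % (2 ** levels):
        raise ValueError("len(x) must be divisible by 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append([soft_threshold(c, threshold) for c in d])
    for d in reversed(details):
        approx = haar_idwt(approx, d)
    return approx
```

With a threshold of zero, the transform reconstructs the input exactly. Raising the threshold removes small, rapidly alternating components while keeping large transitions such as a projectile's passage edge.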
Funding: supported by a grant from the Chinese National Scientific Foundation (71002112, 71562017).
Abstract: Moral imagination is the ability that helps individuals overcome the constraints of organizational mental models, develop fresh frameworks, and make ethical decisions on the basis of those frameworks. This study explored the moderated mediating role of organizational commitment between ethical leadership and moral imagination. Data were collected from 281 employees. When the victim of an ethical issue was the employees' own company, organizational commitment fully mediated the effect of ethical leadership on moral imagination. When the victim was another company, neither ethical leadership nor organizational commitment had any effect on moral imagination. These results indicate that ethical leadership influences moral imagination through a social exchange process rather than a social learning process.
Funding: supported by the State Key Program of Basic Research of China under Grant No. 613196 and the National Natural Science Foundation of China under Grant No. 61673066.
Abstract: Signal modulation is an essential design factor for proximity detectors and directly affects a system's potential performance. To obtain the advantages of chaotic codes bi-phase modulation (CCBPM) and linear frequency modulation (LFM) simultaneously, this paper designs a waveform that combines the two (CCBPM-LFM) for proximity detectors. The CCBPM-LFM waveform is analyzed in terms of time delay resolution (TDR) and Doppler tolerance (DT) based on the ambiguity function (AF). A ranging method called instant correlation harmonic demodulation (ICHD) is then presented for detectors using the CCBPM-LFM waveform. ICHD combines time-domain instant correlation with harmonic demodulation. This solves the problem caused by the combined modulation and makes full use of both the LFM harmonics and the correlation properties of the chaotic codes. Finally, a prototype was implemented and ranging experiments were carried out. The theoretical analysis and experimental results show that a proximity detector using the CCBPM-LFM waveform has outstanding detection performance.
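The abstract does not say how the chaotic codes are generated or how ICHD is implemented. The sketch below only illustrates the combined waveform idea. It assumes a logistic-map chaotic code (a common choice, not confirmed by the paper) and multiplies a complex baseband LFM chirp by that ±1 code. It then recovers a simulated delay with plain matched-filter correlation rather than ICHD. The sampling rate and bandwidth are normalized, illustrative values.

```python
import cmath
import math
from typing import List


def logistic_chaotic_code(n: int, x0: float = 0.37, r: float = 4.0,
                          burn_in: int = 100) -> List[int]:
    """Binary (+1/-1) code obtained by thresholding the logistic map at 0.5."""
    x = x0
    for _ in range(burn_in):  # discard the transient
        x = r * x * (1.0 - x)
    code = []
    for _ in range(n):
        x = r * x * (1.0 - x)
        code.append(1 if x >= 0.5 else -1)
    return code


def ccbpm_lfm(code: List[int], samples_per_chip: int, bandwidth: float,
              fs: float) -> List[complex]:
    """Complex baseband LFM chirp whose phase is flipped by each code chip."""
    n = len(code) * samples_per_chip
    duration = n / fs
    k = bandwidth / duration  # chirp rate in Hz/s
    return [code[i // samples_per_chip] * cmath.exp(1j * math.pi * k * (i / fs) ** 2)
            for i in range(n)]


def correlate(rx: List[complex], ref: List[complex]) -> List[float]:
    """Magnitude of the cross-correlation of rx with ref at each valid lag."""
    m = len(ref)
    return [abs(sum(rx[lag + i] * ref[i].conjugate() for i in range(m)))
            for lag in range(len(rx) - m + 1)]
```

In a quick check, the echo is the transmitted waveform delayed by a known number of samples. The correlation peak lands at that delay.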
Funding: supported by the National Natural Science Foundation of China (Grant Nos. 61973037 and 61871414) and the Postdoctoral Foundation of China (Grant No. 2022M720419).
Abstract: Interrupted-sampling repeater jamming (ISRJ) can create false targets for radio-frequency proximity sensors (RFPSs), seriously degrading their target detection capability. This article proposes a recognition method that allows RFPSs to identify false targets caused by ISRJ. The method assigns a unique identity (ID) to each RFPS, and each ID is periodically and chaotically encrypted in every pulse period. Processing of the received signal is divided into ranging and ID decryption. In the ranging part, a high-resolution range profile (HRRP) is obtained by pulse compression with binary chaotic sequences, and singular value decomposition (SVD) is applied during preprocessing to suppress noise. In the ID decryption part, targets and ISRJ are distinguished through encryption and decryption processes controlled by random keys. An adaptability analysis in terms of peak-to-sidelobe ratio (PSLR) and bit error rate (BER) indicates that the method performs well within a 70-kHz Doppler shift. Simulation and experimental results show that the method achieves extremely stable target and ISRJ recognition accuracy at different signal-to-noise ratios (SNRs) and jamming-to-signal ratios (JSRs).
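The PSLR in the adaptability analysis is a property of the pulse-compression output. The sketch below computes it for any ±1 code, such as a binary chaotic sequence, from the code's aperiodic autocorrelation, assuming zero Doppler. The Barker-13 code serves only as a known reference, since its PSLR is 20·log10(13) ≈ 22.3 dB. The paper's own sequences and Doppler handling are not reproduced.

```python
import math
from typing import List, Sequence


def aperiodic_autocorr(code: Sequence[int]) -> List[int]:
    """Aperiodic autocorrelation of a +/-1 code at lags 0..n-1."""
    n = len(code)
    return [sum(code[i] * code[i + k] for i in range(n - k)) for k in range(n)]


def pslr_db(code: Sequence[int]) -> float:
    """Peak-to-sidelobe ratio (dB) of the code's zero-Doppler compression output."""
    if len(code) < 2:
        raise ValueError("code needs at least two chips")
    acf = [abs(v) for v in aperiodic_autocorr(code)]
    peak, sidelobe = acf[0], max(acf[1:])
    return math.inf if sidelobe == 0 else 20.0 * math.log10(peak / sidelobe)


if __name__ == "__main__":
    barker13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
    print(f"Barker-13 PSLR: {pslr_db(barker13):.2f} dB")
```

Any chaotic ±1 sequence can be passed to `pslr_db` in place of the Barker code to compare candidate codes.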
Funding: supported by the National Natural Science Foundation of China (Grant Nos. 52372398 & 62003272).
Abstract: Dynamic soaring, inspired by the wind-riding flight of birds such as albatrosses, is a biomimetic technique that leverages wind fields to enhance the endurance of unmanned aerial vehicles (UAVs). Achieving a precise soaring trajectory is crucial for maximizing energy efficiency during flight. Existing nonlinear programming methods depend heavily on the choice of initial values, which are hard to determine. This paper therefore introduces a deep reinforcement learning method based on a differentially flat model for dynamic soaring trajectory planning and optimization. First, the gliding trajectory is parameterized with Fourier basis functions, giving a flexible trajectory representation with few hyperparameters. Next, trajectory optimization is formulated as a dynamic interactive Markov decision process. The trajectory hyperparameters are then optimized with the Proximal Policy Optimization (PPO2) algorithm from deep reinforcement learning (DRL), which reduces the optimization's strong reliance on initial values. Finally, the proposed method is compared with the nonlinear programming method. The proposed approach generates a smoother trajectory while meeting the same performance requirements. It achieves a 34% reduction in maximum thrust, a 39.4% decrease in maximum thrust difference, and a 33% reduction in maximum airspeed difference.
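A periodic Fourier-series parameterization suits a repeating soaring cycle. The sketch below evaluates one coordinate of such a trajectory together with its analytic time derivative. In a differentially flat model, further states would be derived from these outputs and their derivatives. The cycle period and the number of harmonics K are assumptions here. Each axis gets 2K + 1 coefficients, which presumably correspond to the trajectory hyperparameters that the paper tunes with PPO2.

```python
import math
from typing import Sequence, Tuple


def fourier_trajectory(a0: float, harmonics: Sequence[Tuple[float, float]],
                       period: float, t: float) -> Tuple[float, float]:
    """Evaluate x(t) = a0 + sum_k [a_k cos(k w t) + b_k sin(k w t)] and dx/dt.

    harmonics[k-1] = (a_k, b_k) and w = 2*pi/period, so x(t) repeats every period.
    """
    w = 2.0 * math.pi / period
    value, rate = a0, 0.0
    for k, (a, b) in enumerate(harmonics, start=1):
        c, s = math.cos(k * w * t), math.sin(k * w * t)
        value += a * c + b * s
        rate += k * w * (-a * s + b * c)
    return value, rate
```

Analytic derivatives avoid numerical differentiation when flatness-based states such as velocity and acceleration are reconstructed from the trajectory.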
Funding: supported by the National Natural Science Foundation of China (61703228).
Abstract: In the Gaussian mixture probability hypothesis density (GM-PHD) filter, the computational cost grows as the number of Gaussian components increases. Building on the theory of Chen et al., we propose an improved pruning algorithm for the GM-PHD filter. It uses not only the means and covariances of the Gaussian components but also their weights as a new criterion. This improves the estimation accuracy of the conventional pruning algorithm when tracking targets in very close proximity. Moreover, it solves the endless while-loop problem without requiring a second merging step. Simulation results show that the improved algorithm is easier to implement and more robust than previous ones.
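For context, the sketch below shows the conventional pruning-and-merging step that the improved algorithm builds on, in the style of Vo and Ma's GM-PHD implementation. It is written for a scalar state to keep it short. It merges by mean and covariance only. The paper's weight-based criterion and the Chen et al. theory are not reproduced, and the thresholds are illustrative defaults.

```python
from typing import List, Tuple

Component = Tuple[float, float, float]  # (weight, mean, variance) for a scalar state


def prune_and_merge(components: List[Component], trunc: float = 1e-5,
                    merge_thresh: float = 4.0,
                    max_components: int = 100) -> List[Component]:
    """Conventional GM-PHD pruning and merging for a scalar state."""
    pool = [c for c in components if c[0] > trunc]  # drop low-weight terms
    merged: List[Component] = []
    while pool:
        # Start from the highest-weight component; it always merges with itself,
        # so every pass removes at least one component and the loop terminates.
        _, m_j, _ = max(pool, key=lambda c: c[0])
        close = [c for c in pool if (c[1] - m_j) ** 2 / c[2] <= merge_thresh]
        w = sum(c[0] for c in close)
        m = sum(c[0] * c[1] for c in close) / w
        p = sum(c[0] * (c[2] + (m - c[1]) ** 2) for c in close) / w
        merged.append((w, m, p))
        pool = [c for c in pool if (c[1] - m_j) ** 2 / c[2] > merge_thresh]
    merged.sort(key=lambda c: c[0], reverse=True)
    return merged[:max_components]
```

Because the merge test ignores weights, two closely spaced targets with similar covariances can be fused into one component. That is the situation the weight-based criterion described above is meant to improve.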
Funding: supported by the Key Army Pre-research Projects of China (30107030803).
Abstract: A self-adaptive high-resolution forward-looking imaging algorithm (SAHRFL-IA) is presented for a novel missile-borne detector in an optional-burst-height proximity fuze. The missile-borne detector captures echo data in the target region and uses them to improve azimuth angulation accuracy dynamically at the same range. The azimuth information of targets in the detection area can therefore be obtained accurately. The algorithm departs from the conventional practice, in monopulse angulation, of relying solely on an azimuth discrimination curve derived under ideal conditions. Instead, real-time echo data from the target region are used to correct errors in this curve, so that azimuth angulation accuracy at the same range approaches its optimum. A series of experiments demonstrates the validity, reliability, and high performance of the proposed algorithm. The azimuth angulation accuracy may reach ten times that of the detection beam width, and the algorithm's running time satisfies the requirements of missile-borne platforms.
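The abstract does not describe how the discrimination curve is corrected. One generic way to express the idea is to replace the ideal difference-over-sum curve with a curve tabulated from measured data and invert it by interpolation. The sketch below assumes two Gaussian-shaped beams squinted by ±2° with a 6° width (illustrative values). It simulates a constant channel imbalance as the "measured" error. It is not the SAHRFL-IA algorithm.

```python
import bisect
import math
from typing import List, Sequence, Tuple

Curve = Tuple[List[float], List[float]]  # (ratios sorted ascending, angles)


def beam_gain(theta: float, squint: float, beamwidth: float) -> float:
    """Illustrative Gaussian-shaped amplitude pattern of a squinted beam."""
    return math.exp(-4.0 * math.log(2.0) * ((theta - squint) / beamwidth) ** 2)


def discrimination_ratio(theta: float, squint: float, beamwidth: float) -> float:
    """Normalized difference-over-sum ratio of two beams squinted by +/-squint."""
    a = beam_gain(theta, squint, beamwidth)
    b = beam_gain(theta, -squint, beamwidth)
    return (a - b) / (a + b)


def build_curve(angles: Sequence[float], ratios: Sequence[float]) -> Curve:
    """Tabulate a discrimination curve (assumed monotonic) for inverse lookup."""
    pairs = sorted(zip(ratios, angles))
    return [r for r, _ in pairs], [a for _, a in pairs]


def estimate_angle(ratio: float, curve: Curve) -> float:
    """Invert the curve by linear interpolation, clamping at the ends."""
    rs, angs = curve
    if ratio <= rs[0]:
        return angs[0]
    if ratio >= rs[-1]:
        return angs[-1]
    i = bisect.bisect_left(rs, ratio)
    r0, r1, a0, a1 = rs[i - 1], rs[i], angs[i - 1], angs[i]
    return a0 + (a1 - a0) * (ratio - r0) / (r1 - r0)
```

When the measured ratios are offset from the ideal ones, inverting the ideal curve gives a biased angle. Inverting the curve built from the measured ratios removes that bias.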
Funding: Project (2014BAF12B01) supported by the Key Projects in the National Science & Technology Pillar Program during the Twelfth Five-year Plan Period, China; Project (51405520) supported by the National Natural Science Foundation of China; Project (2012CB619505) supported by the National Basic Research Program of China.
Abstract: Three differential equations based on different definitions of current density are compared. Formulation I is based on an incomplete equation for the total current density (TCD). Formulations II and III are based on incomplete and complete equations for the source current density (SCD), respectively. Using the weak form of the finite element method (FEM), the three formulations were applied to a spiral-coil electromagnetic acoustic transducer (EMAT) example to solve for the magnetic vector potential (MVP). The input impedances calculated with Formulation III agree excellently with experimental measurements. The errors of Formulations I and II vary with coil diameter, coil spacing, lift-off distance, and external excitation frequency because of eddy currents and the skin and proximity effects. The current distribution across the coil conductor follows the same trend. When the coil diameter is less than twice the skin depth, Formulation I is preferable to Formulation III for solving the MVP, because it is a low-cost, high-efficiency calculation method.
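The closing recommendation is a simple threshold test. The sketch below assumes a non-magnetic copper conductor (ρ ≈ 1.68 × 10⁻⁸ Ω·m). It also assumes that "coil diameter" means the diameter of the coil conductor, which is our reading. The frequency and diameters used in the check are illustrative.

```python
import math

MU_0 = 4 * math.pi * 1e-7  # vacuum permeability, H/m


def skin_depth(resistivity: float, frequency: float, mu_r: float = 1.0) -> float:
    """Classical skin depth sqrt(rho / (pi * f * mu)) in metres."""
    return math.sqrt(resistivity / (math.pi * frequency * mu_r * MU_0))


def suggest_formulation(coil_diameter: float, resistivity: float,
                        frequency: float) -> str:
    """Apply the stated rule: Formulation I if d < 2 * delta, else Formulation III."""
    delta = skin_depth(resistivity, frequency)
    return "I" if coil_diameter < 2.0 * delta else "III"
```

For copper at 500 kHz the skin depth is about 0.092 mm. Under these assumptions, a 0.1 mm conductor falls under Formulation I and a 0.5 mm conductor does not.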
Funding: the Project of National Natural Science Foundation of China (Grant No. 62106283); the Project of National Natural Science Foundation of China (Grant No. 72001214), which provided funds for conducting experiments; and the Project of Natural Science Foundation of Shaanxi Province (Grant No. 2020JQ-484).
Abstract: Task assignment in ground-to-air confrontation is large in scale and must handle many concurrent task assignments and random events. When existing task assignment methods are applied to ground-to-air confrontation, they handle complex tasks inefficiently and suffer from interaction conflicts in multiagent systems. This study proposes a multiagent architecture based on one general agent with multiple narrow agents (OGMN) to reduce task assignment conflicts. Because traditional dynamic task assignment algorithms are slow, this paper also proposes the proximal policy optimization for task assignment of general and narrow agents (PPO-TAGNA) algorithm. The algorithm draws on the optimal assignment strategy algorithm and the deep reinforcement learning (DRL) training framework. It adds a multihead attention mechanism and a stage reward mechanism to the bilateral band clipping PPO algorithm to address low training efficiency. Finally, simulation experiments are carried out on a digital battlefield. The OGMN-based architecture combined with the PPO-TAGNA algorithm obtains higher rewards faster and achieves a higher win ratio. Analysis of agent behavior verifies the efficiency, superiority, and rational resource utilization of the method.
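PPO-TAGNA builds on PPO's clipped surrogate objective. For reference, the sketch below implements the standard single-clip objective over a batch of samples in plain Python. The bilateral band clipping variant, the multihead attention, and the stage rewards described above are not reproduced. ε = 0.2 is a common default, not a value from the paper.

```python
import math
from typing import Sequence


def ppo_clip_objective(new_logp: Sequence[float], old_logp: Sequence[float],
                       advantages: Sequence[float], eps: float = 0.2) -> float:
    """Mean clipped surrogate E[min(r*A, clip(r, 1-eps, 1+eps)*A)], to be maximized."""
    if not advantages:
        raise ValueError("need at least one sample")
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Taking the minimum of the clipped and unclipped terms removes any incentive to move the probability ratio outside [1 − ε, 1 + ε] in the direction the advantage favors. This is what keeps PPO's policy updates conservative.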
Funding: supported by the National Natural Science Foundation (61601491), the Natural Science Foundation of Hubei Province (2018CFC865), and the China Postdoctoral Science Foundation Funded Project (2016T45686).
Abstract: To solve the path-following control problem for unmanned surface vehicles (USVs), a control method based on deep reinforcement learning (DRL) with long short-term memory (LSTM) networks is proposed. A distributed proximal policy optimization (DPPO) algorithm, a modified actor-critic reinforcement learning algorithm, is adapted to improve controller performance over repeated trials. The LSTM network structure is introduced to handle the strong temporal correlation of the USV control problem. In addition, a specially designed path dataset including straight and curved paths is established to simulate various sailing scenarios, so that the reinforcement learning controller gains as much handling experience as possible. Extensive numerical simulations show that the proposed method controls missions with complex maneuvers better than controllers trained with limited scenarios and can potentially be applied in practice.
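The abstract does not specify how its path dataset is built. As a hypothetical illustration, the sketch below generates waypoint lists for straight and circular-arc reference paths with randomized parameters. The parameter ranges, the 5 m waypoint spacing, and the 50/50 mix are all assumptions.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def straight_path(start: Point, heading: float, length: float, step: float) -> List[Point]:
    """Evenly spaced waypoints along a straight line."""
    n = max(1, int(length / step))
    return [(start[0] + i * length / n * math.cos(heading),
             start[1] + i * length / n * math.sin(heading)) for i in range(n + 1)]


def arc_path(center: Point, radius: float, start_angle: float, sweep: float,
             step: float) -> List[Point]:
    """Evenly spaced waypoints along a circular arc (sweep in radians)."""
    n = max(1, int(abs(sweep) * radius / step))
    return [(center[0] + radius * math.cos(start_angle + sweep * i / n),
             center[1] + radius * math.sin(start_angle + sweep * i / n))
            for i in range(n + 1)]


def random_path_dataset(count: int, seed: int = 0) -> List[List[Point]]:
    """Hypothetical mix of straight and curved reference paths."""
    rng = random.Random(seed)
    paths = []
    for _ in range(count):
        if rng.random() < 0.5:
            paths.append(straight_path((0.0, 0.0), rng.uniform(-math.pi, math.pi),
                                       rng.uniform(50.0, 200.0), 5.0))
        else:
            paths.append(arc_path((0.0, 0.0), rng.uniform(20.0, 100.0), 0.0,
                                  rng.uniform(-math.pi, math.pi), 5.0))
    return paths
```

Seeding the generator keeps the training scenarios reproducible across repeated trials.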
Funding: financial support from the National Natural Science Foundation of China (Grant No. 61601491), the Natural Science Foundation of Hubei Province, China (Grant No. 2018CFC865), and the Military Research Project of China (Grant No. YJ2020B117).
Abstract: To solve the problem of multi-target hunting by an unmanned surface vehicle (USV) fleet, a hunting algorithm based on multi-agent reinforcement learning is proposed. First, a hunting environment and a kinematic model without boundary constraints are built, and the criteria for successful target capture are given. Then, the cooperative hunting problem of a USV fleet is modeled as a decentralized partially observable Markov decision process (Dec-POMDP). A distributed partially observable multi-target hunting Proximal Policy Optimization (DPOMH-PPO) algorithm for USVs is proposed. In addition, an observation model, a reward function, and an action space suited to multi-target hunting tasks are designed. In partially observable systems, the dimension of the input observational features changes dynamically. To handle this, a feature embedding block is proposed that encodes observational features by combining two feature compression methods: column-wise max pooling (CMP) and column-wise average pooling (CAP). Finally, the hunting strategy is trained within a centralized training and decentralized execution framework. Every USV in the fleet shares the same policy and performs actions independently. Simulation experiments verify the effectiveness of the DPOMH-PPO algorithm in test scenarios with different numbers of USVs. The advantages of the proposed model are also analyzed comprehensively in terms of algorithm performance, transfer to other task scenarios, and self-organization capability after damage. These analyses verify the potential for deploying and applying DPOMH-PPO in real environments.
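The feature embedding block turns a variable number of observed entities into a fixed-size input. The sketch below shows the pooling step. It assumes that each observed USV or target is one row of a feature matrix. It also assumes that the CMP and CAP outputs are simply concatenated, since the abstract says only that the two are combined. A learned encoder would normally follow.

```python
from typing import List, Sequence


def embed_observations(rows: Sequence[Sequence[float]], feature_dim: int) -> List[float]:
    """Fixed-length encoding of a variable number of observed entities.

    Column-wise max pooling (CMP) and column-wise average pooling (CAP) are
    concatenated, so the output length is always 2 * feature_dim.
    """
    if not rows:
        return [0.0] * (2 * feature_dim)  # assumption: zeros when nothing is observed
    if any(len(r) != feature_dim for r in rows):
        raise ValueError("every row must have feature_dim entries")
    cmp_feat = [max(r[c] for r in rows) for c in range(feature_dim)]
    cap_feat = [sum(r[c] for r in rows) / len(rows) for c in range(feature_dim)]
    return cmp_feat + cap_feat
```

Both pooling operations are invariant to the order of the rows. The shared policy therefore sees the same input no matter how the observed entities are listed.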