As information technology is applied to the military field, the advantages and importance of networked combat are becoming increasingly evident. To make full use of limited battlefield resources and destroy as many enemy targets as possible from arbitrary angles within a limited time, the dynamic deployment of firepower nodes has become a key problem in command and control. Considering a variety of tactical indexes and practical constraints in air defense, a mathematical model is formulated to minimize the enemy target penetration probability. Based on the characteristics of the model and the demands of the deployment problem, an assistance-based algorithm is proposed that combines the artificial potential field (APF) method with a memetic algorithm. The APF method is employed to handle the constraints and generate feasible solutions. The constrained optimization problem is thereby transformed into a problem of adjusting APF parameters, which greatly reduces its dimension. Dynamic deployment is accomplished by generating and refining feasible solutions. Simulation results show that the proposed algorithm is effective and feasible in dynamic situations.
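To make the APF idea concrete, the following is a minimal sketch of one update step of the classical artificial potential field method in two dimensions. It is not the deployment model of the paper: the attractive goal, the point obstacles, and the names `apf_step`, `k_att`, `k_rep`, `d0` and `step` are all assumptions chosen for illustration. It only shows how a point can be pushed toward a feasible position by summing an attractive force and short-range repulsive forces.

```python
import math

def apf_step(pos, goal, obstacles, k_att=1.0, k_rep=1.0, d0=2.0, step=0.1):
    """Move `pos` one step along the net force of a classical potential field.

    All parameters are illustrative assumptions:
    k_att / k_rep scale attraction and repulsion, d0 is the radius inside
    which an obstacle repels, and step is the step size.
    """
    # Attractive force: gradient of 0.5 * k_att * |goal - pos|^2.
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])

    # Repulsive force from each obstacle closer than d0:
    # gradient of 0.5 * k_rep * (1/d - 1/d0)^2, pointing away from the obstacle.
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0.0 < d < d0:
            mag = k_rep * (1.0 / d - 1.0 / d0) / d ** 2
            fx += mag * dx / d
            fy += mag * dy / d

    return (pos[0] + step * fx, pos[1] + step * fy)


def run_apf(start, goal, obstacles, iterations=200, **params):
    """Iterate apf_step a fixed number of times and return the final position."""
    pos = start
    for _ in range(iterations):
        pos = apf_step(pos, goal, obstacles, **params)
    return pos
```

In a scheme like the one the abstract describes, the quantities an outer optimizer tunes would be field parameters such as `k_att`, `k_rep` and `d0` rather than the positions themselves, which is one way to read the stated reduction in problem dimension.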
Most successes in the world are the result of long-term effort. The reward of success is extremely high, but a long-term investment process is required first. People who are “myopic” value only short-term rewards and are unwilling to make early-stage investments, so they rarely achieve ultimate success and the corresponding high rewards. Similarly, for a reinforcement learning (RL) model with long-delay rewards, the discount rate determines the strength of the agent’s “farsightedness”. To enable the trained agent to make a chain of correct choices and finally succeed, this paper first derives mathematically the feasible region of the discount rate that satisfies the agent’s “farsightedness” requirement. Then, to avoid the complicated problem of solving implicit equations when choosing feasible solutions, a simple method is explored and verified by theoretical demonstration and mathematical experiments. Next, a series of RL experiments is designed and implemented to verify the validity of the theory. Finally, the model is extended from the finite process to the infinite process, and the validity of the extended model is verified by theory and experiments. The research not only reveals the significance of the discount rate but also provides a theoretical basis and a practical method for choosing the discount rate in future research.
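The role of the discount rate can be illustrated with a toy calculation. The sketch below is not the derivation from the paper; it assumes a hypothetical two-option problem in which the agent either takes a small reward immediately or waits a fixed number of steps for a large one. Under that assumption the delayed option has the higher discounted return exactly when the discount rate exceeds `(short_reward / delayed_reward) ** (1 / delay)`. The function names and the reward values are illustrative.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a reward sequence r_0, r_1, ..."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def min_farsighted_gamma(short_reward, delayed_reward, delay):
    """Discount rate above which a reward received after `delay` steps
    is worth more than `short_reward` received immediately.

    Solves gamma**delay * delayed_reward = short_reward for gamma.
    """
    return (short_reward / delayed_reward) ** (1.0 / delay)


# Hypothetical toy problem (values chosen only for illustration).
myopic_path = [1.0]                    # take a reward of 1 now
patient_path = [0.0] * 10 + [100.0]    # wait 10 steps, then receive 100

threshold = min_farsighted_gamma(1.0, 100.0, 10)

for gamma in (0.5, 0.9):
    prefers_patient = (
        discounted_return(patient_path, gamma)
        > discounted_return(myopic_path, gamma)
    )
    print(f"gamma={gamma}: prefers delayed reward -> {prefers_patient}")
```

With these numbers the threshold is about 0.63, so an agent with a discount rate of 0.5 behaves myopically while one with 0.9 waits for the larger reward.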
Funding: supported by the National Outstanding Youth Science Foundation (60925011) and the National Natural Science Foundation of China (61203181).
Funding: supported by the National Natural Science Foundation of China (71771216, 71701209, 72001214).