A policy iteration algorithm of adaptive dynamic programming (ADP) is developed to solve the optimal tracking control problem for a class of discrete-time chaotic systems. By system transformations, the optimal tracking problem is transformed into an optimal regulation problem. The policy iteration algorithm for discrete-time chaotic systems is first described. Then, the convergence and admissibility properties of the developed policy iteration algorithm are presented, which show that the transformed chaotic system can be stabilized under an arbitrary iterative control law and that the iterative performance index function simultaneously converges to the optimum. By implementing the policy iteration algorithm via neural networks, the developed optimal tracking control scheme for chaotic systems is verified by a simulation.
In the short-term operation of natural gas networks, the impact of demand uncertainty is not negligible. To address this issue, we propose a two-stage robust model for the power cost minimization problem in gunbarrel natural gas networks. The demands between pipelines and compressor stations are uncertain with a budget parameter, since it is unlikely that all the uncertain demands reach their maximal deviation simultaneously. In solving the two-stage robust model, we encounter a bilevel problem that is challenging to solve. We formulate it as a multi-dimensional dynamic programming problem and propose approximate dynamic programming methods to accelerate the calculation. Numerical results based on a real network in China show that we obtain a speedup of 7 times on average, without compromising optimality, compared with the original dynamic programming algorithm. Numerical results also verify the advantage of the robust model over the deterministic model when facing uncertainties. These findings offer short-term operation methods for handling uncertainties in gunbarrel natural gas network management.
An approach to large-scale dynamic programming for discrete linear systems with a quadratic index function is proposed by introducing two Lagrange multipliers.
The convergence and stability of a value-iteration-based adaptive dynamic programming (ADP) algorithm are considered for discrete-time nonlinear systems with a discounted quadratic performance index. Beyond achieving a good approximation structure, the iterative feedback control law must, more importantly, guarantee closed-loop stability. Specifically, it is first proved that the iterative value function sequence converges precisely to the optimum. Secondly, the necessary and sufficient condition for the optimal value function to serve as a Lyapunov function is investigated. We prove that, for the infinite-horizon case, there exists a finite horizon length for which the iterative feedback control law provides stability, and this increases the practicability of the proposed value iteration algorithm. Neural networks (NNs) are employed to approximate the value functions and the optimal feedback control laws, and the approach allows the algorithm to be implemented without knowing the internal dynamics of the system. Finally, a simulation example is employed to demonstrate the effectiveness of the developed optimal control method.
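The value iteration described in the abstract above can be illustrated on a much simpler case than the paper's nonlinear, neural-network setting. The following is a minimal sketch, assuming a scalar linear system x_{k+1} = a x_k + b u_k with stage cost q x^2 + r u^2 and discount factor gamma; the function names and the numeric values are hypothetical and are not taken from the paper. With a quadratic value function V_i(x) = p_i x^2 and p_0 = 0, each iteration reduces to a scalar recursion on p_i, and the greedy control law is u = -K_i x.

```python
def value_iteration_scalar_lqr(a, b, q, r, gamma, iterations=200):
    """Value iteration for V_i(x) = p_i * x**2 on x' = a*x + b*u.

    Returns a list of (p_i, K_i) pairs, where u = -K_i * x is the
    control law that is greedy with respect to V_i. Requires r > 0.
    """
    def gain(p):
        # Minimizer of r*u**2 + gamma*p*(a*x + b*u)**2 over u, written as u = -K*x.
        return gamma * a * b * p / (r + gamma * b * b * p)

    p = 0.0  # zero initial value function, the usual value-iteration start
    history = [(p, gain(p))]
    for _ in range(iterations):
        # Bellman update with the minimizing control substituted back in.
        p = q + gamma * a * a * p - (gamma * a * b * p) ** 2 / (r + gamma * b * b * p)
        history.append((p, gain(p)))
    return history


def first_stabilizing_iteration(a, b, history):
    """Index of the first iterate whose closed loop |a - b*K_i| < 1, or None."""
    for i, (_, k) in enumerate(history):
        if abs(a - b * k) < 1.0:
            return i
    return None
```

In this toy setting the sequence p_i is nondecreasing and converges to the fixed point of the discounted Bellman equation, and `first_stabilizing_iteration` mirrors the abstract's point that a stabilizing control law is already available after finitely many iterations.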
We develop an optimal tracking control method for chaotic systems with unknown dynamics and disturbances. The method allows the optimal cost function and the corresponding tracking control to be updated synchronously. The augmented system is constructed from the tracking error and the reference dynamics, and the optimal tracking control problem is then defined. Policy iteration (PI) is introduced to solve the min-max optimization problem. An off-policy adaptive dynamic programming (ADP) algorithm is then proposed to find the solution of the tracking Hamilton-Jacobi-Isaacs (HJI) equation online, using only measured data and without any knowledge of the system dynamics. A critic neural network (CNN), an action neural network (ANN), and a disturbance neural network (DNN) are used to approximate the cost function, the control, and the disturbance, respectively. The weights of these networks compose the augmented weight matrix, whose uniform ultimate boundedness (UUB) is proven. The convergence of the tracking error system is also proven. Two examples are given to show the effectiveness of the proposed synchronous solution method for the chaotic system tracking problem.
Modulating both the clock frequency and the supply voltage of a network-on-chip (NoC) at runtime can reduce power consumption and heat flux, but it increases the latency of the NoC. It is therefore necessary to find a tradeoff between power consumption and communication latency. We propose an analytical latency model that captures the relationship between them. The proposed model is based on the M/G/1 queuing model, which is suitable for dynamic frequency scaling. The experimental results show that the accuracy of this model is more than 90%.
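The abstract above does not give the model's equations, so the following is only a minimal sketch of the standard M/G/1 result it builds on (the Pollaczek-Khinchine mean waiting time), not the authors' latency model. The function names, the assumption of a fixed service time measured in clock cycles, and the example numbers are all hypothetical.

```python
def mg1_mean_latency(arrival_rate, mean_service, second_moment_service):
    """Mean time in an M/G/1 queue: Pollaczek-Khinchine waiting time plus service.

    arrival_rate          -- Poisson arrival rate (packets per second)
    mean_service          -- E[S], mean service time (seconds)
    second_moment_service -- E[S^2], second moment of service time (seconds^2)
    """
    rho = arrival_rate * mean_service  # server utilization
    if not 0.0 <= rho < 1.0:
        raise ValueError("queue is unstable: utilization must be below 1")
    waiting = arrival_rate * second_moment_service / (2.0 * (1.0 - rho))
    return waiting + mean_service


def router_latency_at_frequency(arrival_rate, service_cycles, freq_hz):
    """Hypothetical router model: each packet needs a fixed number of clock cycles.

    Lowering freq_hz stretches the service time, which raises both the
    service term and the queueing term of the latency.
    """
    service_time = service_cycles / freq_hz
    # Deterministic service, so E[S^2] = E[S]^2.
    return mg1_mean_latency(arrival_rate, service_time, service_time ** 2)
```

Evaluating `router_latency_at_frequency` at two clock frequencies with the same arrival rate shows the tradeoff the abstract describes: the lower frequency always yields the higher mean latency.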
In the capacity planning of hydro-wind-solar power systems (CPHPS), it is crucial to use flexible hydropower to complement variable wind-solar power. Hydropower units must be operated so that they avoid specific restricted operation zones, that is, forbidden zones (FZs), to avoid the risks associated with hydropower unit vibration. FZs limit both the hydropower generation and the flexible regulation in hydro-wind-solar power systems. Therefore, it is essential to consider FZs when determining the optimal wind-solar power capacity that can be compensated by hydropower. This study presents a mathematical model that incorporates the FZ constraints into the CPHPS problem. Firstly, the FZs of the hydropower units are converted into those of the hydropower plants based on set theory. Secondly, a mathematical model is formulated for the CPHPS, which couples the FZ constraints of hydropower plants with other operational constraints (e.g., power balance constraints, new energy consumption limits, and hydropower generation functions). Thirdly, dynamic programming with successive approximations is employed to solve the proposed model. Lastly, case studies are conducted on the hydro-wind-solar system of the Qingshui River to demonstrate the effectiveness of the proposed model.
This paper presents an off-policy integral reinforcement learning (IRL) algorithm to obtain the optimal tracking control of unknown chaotic systems. Off-policy IRL can learn the solution of the Hamilton–Jacobi–Bellman (HJB) equation from the system data generated by an arbitrary control. Moreover, off-policy IRL can be regarded as a direct learning method, which avoids the identification of the system dynamics. In this paper, the performance index function is first given based on the system tracking error and control error. An off-policy IRL algorithm is then proposed for solving the HJB equation. It is proven that the iterative control makes the tracking error system asymptotically stable and that the iterative performance index function is convergent. A simulation study demonstrates the effectiveness of the developed tracking control method.
With Mobile Edge Computing (MEC), computation-intensive tasks are offloaded from mobile devices to cloud servers, and thus the energy consumption of mobile devices can be notably reduced. In this paper, we study task offloading in multi-user MEC systems with heterogeneous clouds, including edge clouds and remote clouds. Tasks are forwarded from mobile devices to edge clouds via wireless channels, and they can be further forwarded to remote clouds via the Internet. Our objective is to minimize the total energy consumption of multiple mobile devices, subject to bounded-delay requirements of tasks. Based on dynamic programming, we propose an algorithm that minimizes the energy consumption by jointly allocating bandwidth and computational resources to mobile devices. The algorithm is of pseudo-polynomial complexity. To further reduce the complexity, we propose an approximation algorithm with energy discretization, and its total energy consumption is proved to be within a bounded gap from the optimum. Simulation results show that nearly 82.7% of the energy of mobile devices can be saved by task offloading compared with execution on the mobile devices.
The cutting technique for Pinus elliottii plantations under the multi-benefit management pattern in the hilly region of Jiangxi Province was studied by establishing a growth model according to the Richards function and simulating tending cutting on a computer using dynamic programming. The results showed that the best time for the initial thinning was at a tree age of 8–10 and that the final cutting was at a tree age of 25. The optimal thinning scheme consisted of 3 thinnings, including the first thinning, at tree ages of 8, 12 and 16, respectively. The thinning intensities were 950, 700 and 300 trees per hectare, respectively, and the preserved density was 550 trees per hectare until the final cutting. Keywords: Pinus elliottii; multi-benefit management pattern; Richards function; cutting technique; dynamic programming. CLC number: S757.4. Document code: A. Foundation item: This study was supported by the Natural Science Foundation of Jiangxi Province (Grant 0330023). Biography: WANG Qing-chun (1970-), male, Ph.D., Senior Engineer in the Academy of Forest Inventory & Planning, Jiangxi, Nanchang 330046, P.R. China. Responsible editor: Song Funan.
Aim: To find a more efficient learning method based on temporal difference learning for delayed reinforcement learning tasks. Methods: A Q-learning algorithm based on truncated TD(λ) with adaptive schemes for selecting the λ value, addressed to absorbing Markov decision processes, was presented and implemented on computers. Results and Conclusion: Simulations on shortest-path search problems show that using adaptive λ in Q-learning based on TTD(λ) can speed up its convergence.
An optimization model is established for a multi-product pipeline that has a known delivery demand and operation plan for each off-take station. The aim of this optimization model is to minimize the total pumping operation cost, considering not only factors including the energy equilibrium constraint, the maximum and minimum suction and discharge pressure constraints of pump stations, and the pressure constraint at special elevation points, but also the regional differences in electricity prices along the pipeline. The dynamic programming method is applied to solve the model and to find the optimal pump configuration.
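The stage-by-stage structure of such a pump configuration problem can be sketched as a forward dynamic program over pump stations. This is a simplified illustration, assuming integer head values, a single lumped head loss per segment, and one upper and one lower head limit; the function name, the data layout, and all numbers in the usage note are hypothetical and do not come from the paper.

```python
def min_pumping_cost(stations, start_head, min_head, max_head, end_head):
    """Forward dynamic programming over pump stations in series.

    stations -- list of (price, head_loss, configs), one entry per station, where
                price     is the local electricity price per unit of energy,
                head_loss is the head lost on the segment after the station,
                configs   is a list of (head_gain, energy) pump configurations.
    Returns (total_cost, chosen_config_indices), or None if no plan is feasible.
    """
    # State: head arriving at the current station -> (cheapest cost so far, plan).
    best = {start_head: (0.0, [])}
    for price, head_loss, configs in stations:
        nxt = {}
        for head, (cost, plan) in best.items():
            for idx, (gain, energy) in enumerate(configs):
                discharge = head + gain
                if discharge > max_head:   # discharge pressure limit
                    continue
                arrive = discharge - head_loss
                if arrive < min_head:      # minimum pressure at the next station
                    continue
                cand = cost + price * energy
                # Keep only the cheapest way to reach each head value.
                if arrive not in nxt or cand < nxt[arrive][0]:
                    nxt[arrive] = (cand, plan + [idx])
        best = nxt
    feasible = [entry for head, entry in best.items() if head >= end_head]
    if not feasible:
        return None
    return min(feasible, key=lambda entry: entry[0])
```

For example, with two stations that share the configurations `[(0, 0), (20, 2), (40, 4)]` and a head loss of 30 per segment, but electricity prices of 1.0 and 2.0, the call `min_pumping_cost(stations, 20, 10, 80, 10)` selects the larger configuration at the cheaper first station and the smaller one at the second, which is the effect of regional price differences that the abstract highlights.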
The single gimbal control moment gyroscope (SGCMG), with its high precision and fast response, is an important attitude control system for high-precision docking, rapid maneuvering, and navigation and guidance systems in the aerospace field. In this paper, considering the influence of multi-source disturbances, a data-based feedback relearning (FR) algorithm is designed for the robust control of the SGCMG gimbal servo system. Based on adaptive dynamic programming and the least-squares principle, the FR algorithm obtains the servo control strategy by collecting online operation data of the SGCMG system. This is a model-free learning strategy in which no prior knowledge of the SGCMG model is required. Then, using the reinforcement learning mechanism, the servo control strategy interacts with the system dynamics of the SGCMG, and the adaptive evaluation and improvement of the servo control strategy against multi-source disturbances are realized. Meanwhile, a data redistribution method based on experience replay is designed to reduce data correlation and thereby improve algorithm stability and data utilization efficiency. Finally, the effectiveness of the proposed servo control strategy is verified by comparison with other methods on a simulation model of the SGCMG.
This paper presents a new three-level hierarchical parallel control algorithm for large-scale systems based on spatial and time decomposition. The parallel variable metric (PVM) method is found to be a promising third-level algorithm. In the second-level subproblems, the constraints require that the initial state of a subproblem equal the terminal state of the preceding subproblem. The coordinating variables are updated using the modified Newton method. The low-level smaller subproblems are solved in parallel using extended differential dynamic programming (DDP). Numerical results show that, compared with one-level DDP, the PVM/DDP algorithm obtains significant speed-ups.
In this paper, we study a model of joint scheduling and subcontracting decisions, in which jobs (orders) can be either processed in-house by parallel machines at the manufacturer or subcontracted to a subcontractor. The manufacturer needs to determine which jobs should be produced in-house and which jobs should be subcontracted. Furthermore, it needs to determine a production schedule for the jobs to be produced in-house. We discuss five classical scheduling objectives as production costs. For each problem with a different objective function, we give optimality conditions and propose dynamic programming algorithms.
Aiming to reduce the fuel consumption and emissions of a dual-clutch hybrid electric vehicle during cold start, this paper presents a multi-objective optimization of fuel consumption and HC/CO emissions at the outlet of a three-way catalytic converter (TWC). Dynamic programming (DP) with dual state variables is proposed based on the Bellman optimality principle. Both the battery state of charge (SOC) and the temperature of the TWC monolith are considered simultaneously in the algorithm. In this way, the globally optimal control strategy and the Pareto optimal solution of the multi-objective function are derived. Simulation results show that, in comparison with the original DP that considers only the battery SOC, the proposed method significantly promotes TWC light-off by decreasing the engine load and raising the exhaust temperature at the engine outlet. Compared with the results achieved by a rule-based control strategy, the fuel economy and the TWC outlet emissions during cold start are comprehensively optimized. Each indicator of the Pareto solution set shows significant improvement.
We establish a new type of backward stochastic differential equations (BSDEs) connected with stochastic differential games (SDGs), namely, BSDEs strongly coupled with the lower and the upper value functions of SDGs, where the lower and the upper value functions are defined through this BSDE. The existence and uniqueness theorem and the comparison theorem are proved for such equations with the help of an iteration method. We also show that the lower and the upper value functions satisfy the dynamic programming principle. Moreover, we study the associated Hamilton-Jacobi-Bellman-Isaacs (HJB-Isaacs) equations, which are nonlocal and strongly coupled with the lower and the upper value functions. Using a new method, we characterize the pair (W, U) consisting of the lower and the upper value functions as the unique viscosity solution of our nonlocal HJB-Isaacs equation. Furthermore, the game has a value under the Isaacs’ condition.
We develop an online adaptive dynamic programming (ADP) based optimal control scheme for continuous-time chaotic systems. The idea is to use the ADP algorithm to obtain the optimal control input that makes the performance index function reach an optimum. The expression of the performance index function for the chaotic system is first presented. The online ADP algorithm is then presented to achieve optimal control. In the ADP structure, neural networks are used to construct a critic network and an action network, which approximate the performance index function and the control input, respectively. It is proven that the critic parameter error dynamics and the closed-loop chaotic systems are uniformly ultimately bounded exponentially. Our simulation results illustrate the performance of the established optimal control method.
In this paper, an optimal tracking control scheme is proposed for a class of discrete-time chaotic systems using the approximation-error-based adaptive dynamic programming (ADP) algorithm. Via the system transformation, the optimal tracking problem is transformed into an optimal regulation problem, and then the novel optimal tracking control method is proposed. It is shown that for the iterative ADP algorithm with finite approximation error, the iterative performance index functions can converge to a finite neighborhood of the greatest lower bound of all performance index functions under some convergence conditions. Two examples are given to demonstrate the validity of the proposed optimal tracking control scheme for chaotic systems.
A combined dynamic programming-sequential quadratic programming (DP-SQP) algorithm is proposed to address the problem that the traditional continuous control method has high computational complexity and is prone to falling into local optima. To solve for the globally optimal control law sequence, we use the dynamic programming algorithm to discretize the separation control decision-making process into a series of sub-stages based on the time characteristics of the separation allocation model, and recurse from the end stage to the initial stage. The sequential quadratic programming algorithm is then used to solve for the optimal return function and the optimal control law of each sub-stage. Comparative simulations of the combined algorithm and the traditional algorithm are designed to validate the superiority of the combined algorithm. Aircraft-following and cross-conflict simulation examples are created to demonstrate the combined algorithm’s adaptability to various conflict scenarios. The simulation results demonstrate the effectiveness, efficiency, and adaptability of the separation deployment strategy.
Funding (policy iteration ADP tracking control of discrete-time chaotic systems): supported by the National Natural Science Foundation of China (Grant Nos. 61034002, 61233001, 61273140, 61304086, and 61374105) and the Beijing Natural Science Foundation, China (Grant No. 4132078).
Funding (two-stage robust model for gunbarrel natural gas networks): partially supported by the National Science Foundation of China (Grants 71822105 and 91746210).
Funding (off-policy ADP tracking control of chaotic systems with unknown dynamics and disturbances): Project supported by the National Natural Science Foundation of China (Grant Nos. 61304079, 61673054, and 61374105), the Fundamental Research Funds for the Central Universities, China (Grant No. FRF-TP-15-056A3), and the Open Research Project from SKLMCCS, China (Grant No. 20150104).
Funding (analytical latency model for NoC): supported by the National Natural Science Foundation of China under Grant Nos. 61376024 and 61306024, the Natural Science Foundation of Guangdong Province under Grant No. S2013040014366, and the Basic Research Programme of Shenzhen under Nos. JCYJ20140417113430642 and JCYJ20140901003939020.
Funding (off-policy IRL tracking control of unknown chaotic systems): Project supported by the National Natural Science Foundation of China (Grant Nos. 61304079 and 61374105), the Beijing Natural Science Foundation, China (Grant Nos. 4132078 and 4143065), the China Postdoctoral Science Foundation (Grant No. 2013M530527), the Fundamental Research Funds for the Central Universities, China (Grant No. FRF-TP-14-119A2), and the Open Research Project from the State Key Laboratory of Management and Control for Complex Systems, China (Grant No. 20150104).
Funding (task offloading in multi-user MEC systems): the National Key R&D Program of China 2018YFB1800804, the Nature Science Foundation of China (No. 61871254, No. 61861136003, No. 91638204), and Hitachi Ltd.
Funding (cutting technique for Pinus elliottii plantations): Natural Science Foundation of Jiangxi Province (Grant 0330023).
文摘The cutting technic for thePinus elliottii plantation of the multi-benefit management pattern in the hilly region of Jiangxi Province was studied by establishing the model of growth progress according to Richards function and simulating the tending cutting on computer by use of dynamic programming. The results showed that the best time for the initial thinning was at tree age of 8–10 and final cutting was at tree age of 25. The optimal thinning project was 3 times of thinning cutting including the first thinning, and the thinning time was at tree ages of 8, 12 and 16, respectively. Their thinning intensities were separately 950, 700 and 300 trunks per hectare, and the preserved density was 550 trunks per hectare until the final cutting Keywords Pinus elliottir - Multi-benefit management pattern - Richards function - Cutting technic - Dynamic programming CLC number S757.4 Document code A Foundation item: This study was supported by Natural Science Foundation of Jiangxi Province (A grant 0330023)Biography: WANG Qing-chun (1970-), male, Ph. Doctor, Senior Engineer in Academy of Forest Inventory & Planning, Jiangxi, Nanchang 330046, P.R. China.Responsible editor: Song Funan
Abstract: Aim: To find a more efficient learning method based on temporal difference learning for delayed reinforcement learning tasks. Methods: A Q-learning algorithm based on truncated TD(λ), with adaptive schemes for selecting the value of λ and aimed at absorbing Markov decision processes, was presented and implemented on computers. Results and Conclusion: Simulations on shortest-path search problems show that using an adaptive λ in Q-learning based on TTD(λ) can speed up its convergence.
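For orientation, the following sketch shows plain one-step tabular Q-learning on a small shortest-path problem. It is not the truncated TD(λ) method with adaptive λ described in the abstract; it corresponds only to the simplest baseline (no eligibility over multiple steps), and the graph, the function names and the hyperparameters are all hypothetical. Q-values here store estimated cost-to-go, so the greedy action is the one with the smallest value.

```python
import random


def q_learning_shortest_path(graph, goal, episodes=2000, alpha=0.5, seed=0):
    """One-step tabular Q-learning on a deterministic graph with edge costs."""
    rng = random.Random(seed)
    # q[s][n]: estimated cost of moving from s to neighbour n and acting greedily after
    q = {s: {n: 0.0 for n in edges} for s, edges in graph.items()}
    starts = [s for s in graph if s != goal]
    for _ in range(episodes):
        state = rng.choice(starts)
        while state != goal:  # the goal is the absorbing state
            nxt = rng.choice(list(graph[state]))  # purely random exploration
            future = 0.0 if nxt == goal else min(q[nxt].values())
            target = graph[state][nxt] + future
            q[state][nxt] += alpha * (target - q[state][nxt])
            state = nxt
    return q


def greedy_path(q, start, goal):
    """Follow the lowest-cost action from `start` until `goal` is reached."""
    path = [start]
    while path[-1] != goal:
        path.append(min(q[path[-1]], key=q[path[-1]].get))
    return path


# Hypothetical acyclic graph: node -> {neighbour: edge cost}; "D" is the goal.
GRAPH = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 1, "D": 5},
    "C": {"D": 1},
    "D": {},
}

q_table = q_learning_shortest_path(GRAPH, "D")
print(greedy_path(q_table, "A", "D"))
```

A TD(λ) variant would propagate each observed cost back over several preceding steps at once, which is where the choice of λ affects convergence speed.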
Abstract: An optimization model is established for a multi-product pipeline with a known delivery demand and operation plan for each off-take station. The aim of this optimization model is to minimize the total pumping operation cost, considering not only the energy equilibrium constraint, the maximum and minimum suction and discharge pressure constraints of pump stations, and the pressure constraint at special elevation points, but also the regional differences in electricity prices along the pipeline. The dynamic programming method is applied to solve the model and to find the optimal pump configuration.
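A minimal sketch of how such a stage-wise search might look is given below, with pump stations as stages and the head arriving at each station as the state. Everything in it is an assumption made for illustration: the function name `cheapest_pump_plan`, the dictionary layout, the single head-loss value per segment, and the numbers. The constraints at special elevation points mentioned in the abstract are not modelled.

```python
def cheapest_pump_plan(stations, inlet_head, min_suction, max_discharge, min_delivery):
    """Forward dynamic programming over pump stations.

    Each station is a dict with:
      "options": list of (head_added, energy_used) pump configurations,
      "price":   local electricity price per unit of energy,
      "loss":    head lost in the pipe segment after the station.
    Returns (minimum cost, chosen option index per station) or None if infeasible.
    """
    # frontier: suction head at the current station -> (cost so far, plan so far)
    frontier = {inlet_head: (0.0, [])}
    for station in stations:
        nxt = {}
        for head, (cost, plan) in frontier.items():
            if head < min_suction:  # suction pressure constraint violated
                continue
            for idx, (boost, energy) in enumerate(station["options"]):
                discharge = head + boost
                if discharge > max_discharge:  # discharge pressure constraint
                    continue
                arrive = discharge - station["loss"]
                new_cost = cost + energy * station["price"]
                # keep only the cheapest way to reach each arrival head
                if arrive not in nxt or new_cost < nxt[arrive][0]:
                    nxt[arrive] = (new_cost, plan + [idx])
        frontier = nxt
    feasible = [entry for head, entry in frontier.items() if head >= min_delivery]
    return min(feasible) if feasible else None


# Hypothetical data: two stations, same pumps, different electricity prices.
OPTIONS = [(0, 0), (100, 100), (200, 200)]
STATIONS = [
    {"options": OPTIONS, "price": 1.0, "loss": 100},
    {"options": OPTIONS, "price": 0.5, "loss": 150},
]
print(cheapest_pump_plan(STATIONS, 200, 30, 400, 50))
```

With these numbers the required boost can be supplied at either station, so the plan shifts pumping to whichever station has the lower electricity price, which is the effect of regional price differences that the abstract highlights.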
Funding: This work was supported by the National Natural Science Foundation of China (No. 62022061), Tianjin Natural Science Foundation (No. 20JCYBJC00880), and Beijing Key Laboratory Open Fund of Long-Life Technology of Precise Rotation and Transmission Mechanisms.
Abstract: The single gimbal control moment gyroscope (SGCMG), with high precision and fast response, is an important attitude control system for high-precision docking and rapid maneuvering navigation and guidance systems in the aerospace field. In this paper, considering the influence of multi-source disturbance, a data-based feedback relearning (FR) algorithm is designed for the robust control of the SGCMG gimbal servo system. Based on adaptive dynamic programming and the least-squares principle, the FR algorithm obtains the servo control strategy by collecting online operation data of the SGCMG system. This is a model-free learning strategy in which no prior knowledge of the SGCMG model is required. Then, by incorporating the reinforcement learning mechanism, the servo control strategy interacts with the system dynamics of the SGCMG, and adaptive evaluation and improvement of the servo control strategy against the multi-source disturbance are realized. Meanwhile, a data redistribution method based on experience replay is designed to reduce data correlation and thereby improve algorithm stability and data utilization efficiency. Finally, the effectiveness of the proposed servo control strategy is verified by comparison with other methods on a simulation model of the SGCMG.
Abstract: This paper presents a new three-level hierarchical parallel control algorithm for large-scale systems based on spatial and time decomposition. The parallel variable metric (PVM) method is found to be a promising third-level algorithm. In the second-level subproblems, the constraints of the smaller subproblems require that the initial state of a subproblem equal the terminal state of the preceding subproblem. The coordinating variables are updated using the modified Newton method. The low-level smaller subproblems are solved in parallel using extended differential dynamic programming (DDP). Numerical results show that, compared with one-level DDP, the PVM/DDP algorithm obtains significant speed-ups.
Funding: Supported by the National Natural Science Foundation of China (70731160015) and the National Natural Science Foundation of Jiangsu Province (yw06037)
Abstract: In this paper, we study a model for joint decisions on scheduling and subcontracting, in which jobs (orders) can be either processed in-house by parallel machines at the manufacturer or subcontracted to a subcontractor. The manufacturer needs to determine which jobs should be produced in-house and which should be subcontracted. Furthermore, it needs to determine a production schedule for the jobs produced in-house. We discuss five classical scheduling objectives as production costs. For each of the resulting problems, we give optimality conditions and propose dynamic programming algorithms.
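To make the in-house versus subcontract decision concrete, here is a deliberately simplified sketch: a single in-house machine instead of parallel machines, total completion time as the production cost, and a fixed subcontracting cost per job. These simplifications, the function name `schedule_or_subcontract` and the job data are assumptions, not the paper's model. The sketch relies on the standard fact that, for total completion time on one machine, any chosen set of jobs is best sequenced in shortest-processing-time (SPT) order, so jobs can be considered in that order with the in-house load as the state.

```python
def schedule_or_subcontract(jobs):
    """jobs: list of (processing_time, subcontract_cost).

    Minimises (sum of completion times of in-house jobs) + (subcontract costs).
    Returns (minimum cost, indices of in-house jobs in SPT order).
    """
    order = sorted(range(len(jobs)), key=lambda j: jobs[j][0])  # SPT order
    # states: in-house load so far -> (cost so far, in-house jobs so far)
    states = {0: (0, [])}
    for j in order:
        p, s = jobs[j]
        nxt = {}
        for load, (cost, kept) in states.items():
            choices = (
                (load, cost + s, kept),                    # subcontract job j
                (load + p, cost + load + p, kept + [j]),   # job j completes at load + p
            )
            for new_load, new_cost, new_kept in choices:
                if new_load not in nxt or new_cost < nxt[new_load][0]:
                    nxt[new_load] = (new_cost, new_kept)
        states = nxt
    return min(states.values())


# Hypothetical jobs: (processing time, subcontracting cost)
JOBS = [(2, 4), (4, 3), (1, 10), (6, 20)]
print(schedule_or_subcontract(JOBS))
```

In this example jobs 2 and 3 stay in-house because subcontracting them is expensive, while jobs 0 and 1 are cheaper to send out than to let them delay the remaining work.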
Funding: Funded by the National Natural Science Foundation of China (No. 51305472), National Natural Science Foundation of Chongqing Science and Technology Committee (No. cstc2014jcyj A60005), and Natural Science Foundation of Chongqing Education Committee (No. KJ1400312)
Abstract: Aiming to reduce the fuel consumption and emissions of a dual-clutch hybrid electric vehicle during cold start, this paper presents a multi-objective optimization of fuel consumption and HC/CO emissions at the outlet of the TWC (three-way catalytic converter). DP (dynamic programming) with dual state variables is proposed based on the Bellman optimality principle. Both the battery SOC (state of charge) and the temperature of the TWC monolith are considered in the algorithm simultaneously. In this way, the global optimal control strategy and the Pareto optimal solution of the multi-objective function are derived. Simulation results show that, in comparison with the original DP considering only the battery SOC, the proposed method significantly promotes TWC light-off by decreasing the engine load and raising the exhaust temperature at the engine outlet. Compared with the results achieved by a rule-based control strategy, fuel economy and TWC outlet emissions during cold start are optimized comprehensively. Each indicator of the Pareto solution set shows significant improvement.
Funding: Supported by the NSF of China (11071144, 11171187, 11222110 and 71671104), Shandong Province (BS2011SF010, JQ201202), SRF for ROCS (SEM), Program for New Century Excellent Talents in University (NCET-12-0331), 111 Project (B12023), the Ministry of Education of Humanities and Social Science Project (16YJA910003), and Incubation Group Project of Financial Statistics and Risk Management of SDUFE
Abstract: We establish a new type of backward stochastic differential equations (BSDEs) connected with stochastic differential games (SDGs), namely, BSDEs strongly coupled with the lower and the upper value functions of SDGs, where the lower and the upper value functions are defined through this BSDE. The existence and uniqueness theorem and the comparison theorem are proved for such equations with the help of an iteration method. We also show that the lower and the upper value functions satisfy the dynamic programming principle. Moreover, we study the associated Hamilton-Jacobi-Bellman-Isaacs (HJB-Isaacs) equations, which are nonlocal and strongly coupled with the lower and the upper value functions. Using a new method, we characterize the pair (W, U) consisting of the lower and the upper value functions as the unique viscosity solution of our nonlocal HJB-Isaacs equation. Furthermore, the game has a value under the Isaacs’ condition.
Funding: Project supported by the Open Research Project from the SKLMCCS (Grant No. 20120106), the Fundamental Research Funds for the Central Universities of China (Grant No. FRF-TP-13-018A), the Postdoctoral Science Foundation of China (Grant No. 2013M530527), the National Natural Science Foundation of China (Grant Nos. 61304079 and 61374105), and the Natural Science Foundation of Beijing, China (Grant Nos. 4132078 and 4143065)
Abstract: We develop an online adaptive dynamic programming (ADP) based optimal control scheme for continuous-time chaotic systems. The idea is to use the ADP algorithm to obtain the optimal control input that makes the performance index function reach an optimum. The expression of the performance index function for the chaotic system is first presented. The online ADP algorithm is then presented to achieve optimal control. In the ADP structure, neural networks are used to construct a critic network and an action network, which approximate the performance index function and the control input, respectively. It is proven that the critic parameter error dynamics and the closed-loop chaotic systems are uniformly ultimately bounded exponentially. Our simulation results illustrate the performance of the established optimal control method.
Funding: Supported by the Open Research Project from SKLMCCS (Grant No. 20120106), the Fundamental Research Funds for the Central Universities of China (Grant No. FRF-TP-13-018A), the Postdoctoral Science Foundation of China (Grant No. 2013M530527), and the National Natural Science Foundation of China (Grant Nos. 61304079, 61125306, and 61034002)
Abstract: In this paper, an optimal tracking control scheme is proposed for a class of discrete-time chaotic systems using the approximation-error-based adaptive dynamic programming (ADP) algorithm. Via the system transformation, the optimal tracking problem is transformed into an optimal regulation problem, and then the novel optimal tracking control method is proposed. It is shown that for the iterative ADP algorithm with finite approximation error, the iterative performance index functions can converge to a finite neighborhood of the greatest lower bound of all performance index functions under some convergence conditions. Two examples are given to demonstrate the validity of the proposed optimal tracking control scheme for chaotic systems.
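The abstract does not spell out the system transformation. One common way such a tracking-to-regulation transformation is written is sketched below, purely as an illustration. It assumes an affine-in-control system and an input matrix whose pseudo-inverse satisfies $g\,g^{+} = I$ along the reference; the symbols $f$, $g$, $\eta_k$ (desired trajectory), $z_k$ (tracking error), $u_{d,k}$ (steady control), $v_k$ (control deviation), $Q$ and $R$ are assumptions, not notation taken from the paper.

```latex
\begin{align}
x_{k+1} &= f(x_k) + g(x_k)\,u_k, \\
z_k &= x_k - \eta_k, \qquad
u_{d,k} = g^{+}(\eta_k)\bigl(\eta_{k+1} - f(\eta_k)\bigr), \qquad
v_k = u_k - u_{d,k}, \\
z_{k+1} &= f(z_k + \eta_k) + g(z_k + \eta_k)\,(v_k + u_{d,k}) - \eta_{k+1}, \\
J(z_0) &= \sum_{k=0}^{\infty} \bigl( z_k^{\top} Q\, z_k + v_k^{\top} R\, v_k \bigr).
\end{align}
```

Under these assumptions $z_k = 0$, $v_k = 0$ is an equilibrium of the error dynamics, so driving $z_k$ to zero at minimum cost $J$ is a regulation problem in the variables $(z_k, v_k)$.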
Funding: Supported in part by the National Natural Science Foundation of China (Nos. 61773202, 52072174), the Foundation of National Defense Science and Technology Key Laboratory of Avionics System Integrated Technology of China Institute of Aeronautical Radio Electronics (No. 6142505180407), the Open Fund for Civil Aviation General Aviation Operation Key Laboratory of China Civil Aviation Management Cadre Institute (No. CAMICKFJJ-2019-04), and the National key R&D plan (No. 2021YFB1600500).
Abstract: A combined dynamic programming-sequential quadratic programming (DP-SQP) algorithm is proposed to address the problem that the traditional continuous control method has high computational complexity and is prone to falling into local optima. To solve for the globally optimal control law sequence, we use the dynamic programming algorithm to discretize the separation control decision-making process into a series of sub-stages based on the time characteristics of the separation allocation model, and recurse from the end stage to the initial stage. The sequential quadratic programming algorithm is then used to solve for the optimal return function and the optimal control law of each sub-stage. Comparative simulations of the combined algorithm and the traditional algorithm are designed to validate the superiority of the combined algorithm. Aircraft-following and cross-conflict simulation examples are created to demonstrate the combined algorithm’s adaptability to various conflict scenarios. The simulation results demonstrate the effectiveness, efficiency, and adaptability of the separation deployment strategy.