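Constrained by the fixed structure dictated by the model equations, traditional quadrotor controller designs struggle to cope with control errors caused by variations in model parameters and environmental disturbances. This paper proposes a quadrotor trajectory tracking control method based on deep reinforcement learning, constructs the corresponding Markov decision model, and, building on the PPO framework, proposes the PPO-SAG (PPO with self adaptive guide) algorithm. PPO-SAG adds an adaptive mechanism to the learning process and uses PID expert knowledge for guidance and learning, which improves the convergence and stability of training. Based on the characteristics of the problem, an objective function with a distance-constraint penalty and an entropy strategy is designed, and a disturbance error information supplement structure and a trajectory feature selection structure are proposed to supplement control error information and extract the key elements of the upcoming trajectory, which further improves convergence. In addition, dynamic state normalization, batch normalization of the advantage function, and a reward scaling strategy are used to handle state representation and reward and advantage expression in three-dimensional space more reasonably. Experiments on single and mixed trajectories show that the proposed PPO-SAG algorithm achieves the best results in both convergence and stability, and ablation experiments show that each of the proposed mechanisms and structures contributes positively. This study of deep-reinforcement-learning-based quadrotor trajectory tracking control under unknown disturbances provides a solution for designing more robust and efficient quadrotor controllers.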
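As an illustration of two of the preprocessing steps named in the abstract above (dynamic state normalization and batch normalization of the advantage function), the following is a minimal sketch in plain Python. It is not the PPO-SAG implementation: the class name `RunningNormalizer`, the function `normalize_advantages`, the use of Welford's running-variance update, and the `eps` constant are all assumptions chosen for illustration.

```python
import math


class RunningNormalizer:
    """Hypothetical helper: running mean/variance per state dimension
    (Welford's online update), used to standardize observations."""

    def __init__(self, dim, eps=1e-8):
        self.n = 0                  # number of states seen so far
        self.mean = [0.0] * dim     # running mean per dimension
        self.m2 = [0.0] * dim       # running sum of squared deviations
        self.eps = eps              # avoids division by zero

    def update(self, state):
        """Fold one new state vector into the running statistics."""
        self.n += 1
        for i, x in enumerate(state):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (x - self.mean[i])

    def normalize(self, state):
        """Return the state standardized with the current statistics."""
        if self.n < 2:
            return list(state)      # not enough data to estimate a variance
        return [
            (x - m) / (math.sqrt(m2 / self.n) + self.eps)
            for x, m, m2 in zip(state, self.mean, self.m2)
        ]


def normalize_advantages(advantages, eps=1e-8):
    """Standardize one batch of advantage estimates to zero mean, unit variance."""
    mean = sum(advantages) / len(advantages)
    var = sum((a - mean) ** 2 for a in advantages) / len(advantages)
    std = math.sqrt(var)
    return [(a - mean) / (std + eps) for a in advantages]


if __name__ == "__main__":
    norm = RunningNormalizer(dim=2)
    for s in ([0.0, 0.0], [2.0, 4.0]):
        norm.update(s)
    print(norm.normalize([2.0, 4.0]))
    print(normalize_advantages([1.0, 2.0, 3.0]))
```

In a PPO-style training loop, the normalizer would typically be updated with every observed state before the policy sees it, while the advantages would be standardized once per collected batch before computing the clipped surrogate loss.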
The formation maintenance of multiple unmanned aerial vehicles (UAVs) based on proximity behavior is explored in this study. Each UAV makes its own decisions according to the expected formation structure and the position, velocity, and attitude information of the other UAVs in its azimuth area. This removes the requirements of traditional distributed formation control that nodes be strongly connected and that communication be strictly consistent. An adaptive distributed formation flight strategy is established for multiple UAVs by exploiting proximity behavior observations, which remedies the poor flexibility of distributed formation and ensures consistent position and attitude among the UAVs. In the proposed method, an azimuth area relative to the UAV itself is established to capture the state information of proximal UAVs. A dependency degree factor is introduced into the state update equation based on proximity behavior. Finally, the formation position, speed, and attitude errors are used to form an adaptive dynamic adjustment strategy. Simulations demonstrate the effectiveness and robustness of the proposed method.
A feedforward approach for generating near-time-optimal controllers for flexible spacecraft rest-to-rest maneuvers is presented, with the objectives of insensitivity to modeling errors and parameter uncertainty and of minimizing the residual energy of the flexible modes. The perturbation that the flexible appendages exert on the rigid hub is estimated simply by comparing the output of the real plant with that of a reference model. The approach combines this estimate with bang-bang control of the rigid-hub modes by analyzing the basic constraint and the additional constraints, i.e., zero coupling torque and zero coupling torque derivative, for a general second-order system and for a third-order system that accounts for the attitude acceleration mode. These time-optimal controls, subject to control and state constraints, lead to a boundary-value problem, which is solved using an iterative numerical algorithm. The near-time-optimal control with perturbation estimation is robust to parameter uncertainty, suppresses vibration, and minimizes the residual energy. The capability of the approach is demonstrated in detail through a numerical example.
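For orientation, the bang-bang baseline that the abstract above builds on can be sketched for the simplest case: a purely rigid hub modeled as a double integrator, J·θ'' = u with |u| ≤ u_max. This sketch deliberately ignores the flexible modes, the perturbation estimation, and the coupling-torque constraints described in the abstract; the function names and the numerical values are assumptions for illustration only. For this rigid case the time-optimal rest-to-rest control switches once, at t_s = sqrt(J·|θ_f| / u_max), and the maneuver ends at 2·t_s.

```python
import math


def bang_bang_times(inertia, torque_max, angle):
    """Switch time and final time for a rigid-body rest-to-rest slew
    (double integrator, single torque reversal at the midpoint)."""
    t_switch = math.sqrt(inertia * abs(angle) / torque_max)
    return t_switch, 2.0 * t_switch


def simulate_bang_bang(inertia, torque_max, angle, steps=2000):
    """Integrate J * theta'' = u under the bang-bang torque profile.

    `steps` must be even so the torque reversal falls on a step boundary.
    Each step uses the exact constant-acceleration update.
    """
    _, t_final = bang_bang_times(inertia, torque_max, angle)
    dt = t_final / steps
    sign = 1.0 if angle >= 0 else -1.0
    theta, omega = 0.0, 0.0
    for k in range(steps):
        # Accelerate during the first half, decelerate during the second.
        u = sign * torque_max if k < steps // 2 else -sign * torque_max
        alpha = u / inertia
        theta += omega * dt + 0.5 * alpha * dt * dt
        omega += alpha * dt
    return theta, omega


if __name__ == "__main__":
    # Hypothetical values: J = 10 kg m^2, u_max = 2 N m, slew of 1 rad.
    print(bang_bang_times(10.0, 2.0, 1.0))
    print(simulate_bang_bang(10.0, 2.0, 1.0))
```

With flexible appendages attached, this pure rigid-body profile would generally excite the flexible modes, which is the residual-energy problem the abstract's additional constraints are meant to address.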