Abstract
In a single conveyor-serviced production station (CSPS) system, reinforcement learning (RL) can explore the state-action space effectively to search for a near-optimal look-ahead control policy. In the coordinated control problem of a multi-station CSPS system, however, the size of the state space grows exponentially (geometrically) with the number of stations and the capacity of the buffers. The resulting curse of dimensionality degrades both the convergence speed and the optimization performance of the learning algorithm. To address this, we introduce a state aggregation method on top of a local information interaction mechanism among stations, which reduces the size and complexity of each station's learning space. First, each station is treated as a relatively independent learning agent that considers only the buffer state of its adjacent downstream station and incorporates it into its own performance-value learning process. Second, the original state space is partitioned into several disjoint subsets, each represented by one abstract state, and a multi-station state aggregation feedback Q-learning (SAFQL) algorithm is then built on this partition. With this approach, the look-ahead policy of each station can be learned and optimized over the abstract state space so as to maximize the processing rate of the whole system. Simulation results show that, compared with the general multi-station feedback Q-learning method, SAFQL not only converges faster but also improves the system processing rate to some extent.
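The state-space growth and the state aggregation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' SAFQL algorithm. It assumes that a station's state is just its buffer level (0 up to the capacity) and that aggregation uses equal-width bins; the abstract does not specify the actual partition, and the function names `joint_state_count`, `aggregate`, and `local_abstract_state` are hypothetical.

```python
def joint_state_count(num_stations: int, buffer_capacity: int) -> int:
    """Joint states if every station's buffer level (0..capacity) is tracked together."""
    return (buffer_capacity + 1) ** num_stations


def aggregate(level: int, buffer_capacity: int, num_bins: int) -> int:
    """Map a buffer level to one of num_bins disjoint, contiguous bins.

    Assumption: equal-width binning; the paper may partition differently.
    """
    if not 0 <= level <= buffer_capacity:
        raise ValueError("buffer level out of range")
    return level * num_bins // (buffer_capacity + 1)


def local_abstract_state(own_level: int, downstream_level: int,
                         buffer_capacity: int, num_bins: int) -> tuple:
    """Abstract state seen by one station: its own bin plus the bin of
    its adjacent downstream station (local information only)."""
    return (aggregate(own_level, buffer_capacity, num_bins),
            aggregate(downstream_level, buffer_capacity, num_bins))


if __name__ == "__main__":
    # Five stations with buffers of capacity 9: 10**5 joint states.
    print(joint_state_count(5, 9))
    # With 3 bins per buffer, each station learns over at most 3 * 3 abstract states.
    print(local_abstract_state(2, 8, 9, 3))
```

Under these assumptions, each station's Q-table would be indexed by the pair returned from `local_abstract_state` rather than by the full joint state, which is what keeps the learning space small as stations are added.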
Source
Acta Automatica Sinica (《自动化学报》), 2014, No. 5, pp. 901-908 (8 pages)
Indexed in: EI, CSCD, Peking University Core Journals
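Funding
Supported by the National Natural Science Foundation of China (61174186, 71231004)
National International Science and Technology Cooperation Program of China (2011FA10440)
Program for New Century Excellent Talents in University, Ministry of Education (NCET-11-0626)
Specialized Research Fund for the Doctoral Program of Higher Education (20130111110007)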
Keywords
Multiple conveyor-serviced production station (CSPS) system
Local information interaction
State aggregation
Feedback Q-learning
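About the authors
TANG Hao (唐昊) Professor at the School of Electrical Engineering and Automation, Hefei University of Technology. Received a Ph.D. degree from the University of Science and Technology of China in 2002. Research interests include discrete event dynamic systems, reinforcement learning, neuro-dynamic programming, and intelligent optimization. Corresponding author of this paper. E-mail: htang@hfut.edu.cn
PEI Rong (裴荣) Master's student at the School of Computer and Information, Hefei University of Technology. Received a bachelor's degree from the School of Computer and Information, Hefei University of Technology in 2010. Research interests include reinforcement learning and production line optimization. E-mail: peirong_1987@163.com
ZHOU Lei (周雷) Ph.D. candidate at the School of Computer and Information, Hefei University of Technology. Received a master's degree from the School of Computer and Information, Hefei University of Technology in 2006. Research interests include discrete event dynamic systems, reinforcement learning, and intelligent optimization methods. E-mail: zhouleizhl@163.com
TAN Qi (谭琦) Lecturer (Ph.D.) at the School of Electrical Engineering and Automation, Hefei University of Technology. Research interests include production optimization and scheduling, and intelligent computing methods. E-mail: tanqi@hfut.edu.cn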