[4] Sutton R S, Barto A G. Introduction to Reinforcement Learning [M]. Cambridge: MIT Press, 1998.
[5] Liu C, Xu X, Hu D. Multiobjective reinforcement learning: A comprehensive overview [J]. IEEE Trans on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 2013, 99(4): 1-13.
[6] Sutton R S, Precup D, Singh S P. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning [J]. Artificial Intelligence, 1999, 112(1): 181-211.
[7] Parr R. Hierarchical control and learning for Markov decision processes [D]. Berkeley: University of California at Berkeley, 1998.
[8] Hengst B. Discovering hierarchical reinforcement learning [D]. Sydney: University of New South Wales, 2003.
[9] Dietterich T G. Hierarchical reinforcement learning with the MAXQ value function decomposition [J]. Journal of Artificial Intelligence Research, 2000, 13(1): 227-303.
[10] Hwang K S, Lin H Y, Hsu Y P, et al. Self-organizing state aggregation for architecture design of Q-learning [J]. Information Sciences, 2011, 181(13): 2813-2822.