Learn to Human-level Control in Dynamic Environment Using Incremental Batch Interrupting Temporal Abstraction
- School of Computer Science and Technology, Soochow University, Shizi Street No.1 Box 158, Suzhou, China, 215006
  yuchenfu@suda.edu.cn, 20134227052@stu.suda.edu.cn, fzhufei, quanliug@suda.edu.cn
- Provincial Key Laboratory for Computer Information Processing Technology, Soochow University
- Collaborative Innovation Center of Novel Software Technology and Industrialization
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, P.R. China
- University of Basque Country, Spain
  xzhou001@ikasle.ehu.eus
Abstract
The temporal world is characterized by dynamics and variability. Many machine learning algorithms are difficult to apply directly to practical control applications, whereas hierarchical reinforcement learning can be used to deal with them. Meanwhile, it is common to have partial solutions available, called options, which are learned from experience or predefined by the system to solve sub-tasks of the problem. Options can be reused for policy determination in control, and many traditional semi-Markov decision process methods take advantage of them. However, most of these methods treat an option as a primitive object and, due to the uncertainty and variability of the environment, cannot handle real-world control problems effectively. Based on the idea of interrupting options in dynamic environments, a Q-learning control method using temporal abstraction, named I-QOption, is introduced. The I-QOption approach combines the idea of interruption with the characteristics of dynamic environments so as to learn and improve control policies in a dynamic environment. The Q-learning framework helps to learn from interaction with raw data and to achieve human-level control. The I-QOption algorithm is evaluated in grid world, a benchmark dynamic environment. The experimental results show that the proposed algorithm can learn and improve policies effectively in dynamic environments.
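The core idea the abstract describes, executing a temporally extended option but interrupting it as soon as another option looks strictly better in the current state, can be sketched with SMDP-style Q-learning over options. The following is a minimal illustration under stated assumptions, not the paper's implementation: the 1-D corridor, the option set, the landmark cell, and all parameter names are invented for this sketch.

```python
import random

# Toy environment (an assumption, not the paper's grid world): a 1-D corridor
# of N cells; the agent starts at cell 0 and the goal is the rightmost cell.
N = 10
GOAL = N - 1
LANDMARK = 5  # the macro-option's built-in termination cell (hypothetical)

def step(s, a):
    """Primitive transition: a in {-1, +1}; reward -1 per step, 0 on reaching the goal."""
    s2 = max(0, min(N - 1, s + a))
    r = 0.0 if s2 == GOAL else -1.0
    return s2, r, s2 == GOAL

# Options: two one-step primitives wrapped as options, plus a multi-step macro
# that keeps moving right until its landmark (or the goal).
OPTIONS = ["left", "right", "macro_right"]

def option_policy(o, s):
    return {"left": -1, "right": +1, "macro_right": +1}[o]

def option_terminates(o, s):
    if o in ("left", "right"):   # primitives last exactly one step
        return True
    return s >= LANDMARK or s == GOAL

def greedy(Q, s):
    return max(OPTIONS, key=lambda o: Q[(s, o)])

def run(episodes=300, alpha=0.2, gamma=0.95, eps=0.1, interrupt=True, seed=0):
    rng = random.Random(seed)
    Q = {(s, o): 0.0 for s in range(N) for o in OPTIONS}
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            o = greedy(Q, s) if rng.random() > eps else rng.choice(OPTIONS)
            s0, ret, disc, done = s, 0.0, 1.0, False
            while True:
                s, r, done = step(s, option_policy(o, s))
                ret += disc * r
                disc *= gamma
                if done or option_terminates(o, s):
                    break
                # Interruption rule: abandon the running option as soon as
                # some other option has strictly higher value in the state
                # reached mid-execution.
                if interrupt and Q[(s, o)] < max(Q[(s, o2)] for o2 in OPTIONS):
                    break
            # SMDP-style update from the option's start state to wherever it
            # stopped (terminated or was interrupted).
            target = ret + disc * (0.0 if done else max(Q[(s, o2)] for o2 in OPTIONS))
            Q[(s0, o)] += alpha * (target - Q[(s0, o)])
    return Q

Q = run()
```

With interruption enabled, the macro can be cut short whenever the learned values say a different option is better, which is what lets the policy adapt when the environment changes mid-execution; with `interrupt=False` the same code reduces to plain SMDP Q-learning that always runs options to their built-in termination.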
Key words
hierarchical reinforcement learning, option, reinforcement learning, online learning, dynamic environment
Digital Object Identifier (DOI)
https://doi.org/10.2298/CSIS160210015F
Publication information
Volume 13, Issue 2 (June 2016)
Year of Publication: 2016
ISSN: 2406-1018 (Online)
Publisher: ComSIS Consortium
Full text
Available in PDF
How to cite
Fu, Y., Xu, Z., Zhu, F., Liu, Q., Zhou, X.: Learn to Human-level Control in Dynamic Environment Using Incremental Batch Interrupting Temporal Abstraction. Computer Science and Information Systems, Vol. 13, No. 2, 561–577. (2016), https://doi.org/10.2298/CSIS160210015F