On-line Q-learner Using Moving Prototypes

Abstract

One of the most important breakthroughs in reinforcement learning has been the development of an off-policy control algorithm known as Q-learning. Unfortunately, in spite of its advantages, this method is practical for only a small number of problems. One reason is that a large number of training iterations is required to find a semi-optimal policy in even modest-sized problems. The other reason is that the memory resources required by this method often become too large.

At the heart of the Q-learning method is a function called the Q-function. Modeling this function accounts for most of the memory resources used by the method. Several approaches have been devised to tackle Q-learning's shortcomings, with relatively good success. However, even the most promising of them do a poor job of distributing the memory resources available to model the Q-function, which in turn limits the number of problems that Q-learning can solve. A new method called Moving Prototypes is proposed to alleviate this problem.
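
For reference, the sketch below shows a plain tabular Q-learning update, the baseline whose memory usage Moving Prototypes aims to improve on; it is not the thesis method itself. The environment interface (reset, step, actions) and the hyperparameter values are assumptions chosen purely for illustration.

import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    # One table entry per (state, action) pair; this table is the
    # Q-function model whose memory cost grows with the state space.
    Q = defaultdict(float)
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection (the behavior policy).
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # Off-policy update: bootstrap from the greedy value of the
            # successor state, regardless of the action taken next.
            best_next = max(Q[(next_state, a)] for a in env.actions)
            target = reward + (0.0 if done else gamma * best_next)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q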

