Tech Report CS-96-10

A Generalized Reinforcement-Learning Model: Convergence and Applications

Michael L. Littman and Csaba Szepesvári

February 1996


Reinforcement learning is the process by which an autonomous agent uses its experience interacting with an environment to improve its behavior. The Markov decision process (MDP) model is a popular way of formalizing the reinforcement-learning problem, but it is by no means the only way. In this paper, we show how many of the important theoretical results concerning reinforcement learning in MDPs extend to a generalized MDP model that includes MDPs, two-player games, and MDPs under a worst-case optimality criterion as special cases. The basis of this extension is a stochastic-approximation theorem that reduces asynchronous convergence to synchronous convergence.
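To make the idea of a generalized model concrete, the following is a minimal sketch, not taken from the report, of value iteration in which the way next-state outcomes are summarized is a pluggable operator. Plugging in an expectation gives the ordinary MDP backup, and plugging in a minimum over possible outcomes gives a worst-case criterion. The function names, the transition-table layout, and the toy problem are all assumptions made for illustration; the two-player game case would need a minimax step over action pairs and is not covered here.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical layout: transitions[(state, action)] is a list of
# (probability, next_state, reward) triples.
Outcome = Tuple[float, int, float]
Transitions = Dict[Tuple[int, int], List[Outcome]]


def expected_backup(outcomes: List[Outcome], values: List[float], gamma: float) -> float:
    """Ordinary MDP summary: probability-weighted average over next states."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def worst_case_backup(outcomes: List[Outcome], values: List[float], gamma: float) -> float:
    """Worst-case summary: the least favorable outcome that can occur."""
    return min(r + gamma * values[s2] for p, s2, r in outcomes if p > 0)


def generalized_value_iteration(
    n_states: int,
    n_actions: int,
    transitions: Transitions,
    next_state_op: Callable[[List[Outcome], List[float], float], float],
    action_op: Callable[[Iterable[float]], float] = max,
    gamma: float = 0.9,
    tol: float = 1e-10,
    max_iters: int = 10_000,
) -> List[float]:
    """Synchronous value iteration with pluggable summary operators."""
    values = [0.0] * n_states
    for _ in range(max_iters):
        new_values = [
            action_op(
                next_state_op(transitions[(s, a)], values, gamma)
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
        # Stop once the largest change in any state's value is negligible.
        if max(abs(x - y) for x, y in zip(new_values, values)) < tol:
            return new_values
        values = new_values
    return values


# Toy problem (made up): state 1 is absorbing with zero reward.
# In state 0, action 0 pays 1 for sure; action 1 pays 4 or 0 with equal odds.
toy: Transitions = {
    (0, 0): [(1.0, 1, 1.0)],
    (0, 1): [(0.5, 1, 4.0), (0.5, 1, 0.0)],
    (1, 0): [(1.0, 1, 0.0)],
    (1, 1): [(1.0, 1, 0.0)],
}

mdp_values = generalized_value_iteration(2, 2, toy, expected_backup)
robust_values = generalized_value_iteration(2, 2, toy, worst_case_backup)
```

In this toy problem the expected-value criterion prefers the risky action in state 0, while the worst-case criterion prefers the safe one, so the two operators yield different values for the same transition table.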

Keywords: Reinforcement learning, Q-learning convergence, Markov games
