Tech Report CS-05-08
Amy Greenwald, Keith Hall and Martin Zinkevich
Several recent multiagent learning algorithms attempt to learn equilibrium policies in general-sum Markov games, just as Q-learning learns optimal policies in Markov decision processes. This paper introduces one such algorithm, correlated-Q learning. Its contributions are twofold: (i) we show empirically that correlated-Q learns correlated equilibrium policies on a standard test bed of Markov games; (ii) we prove that certain variants of correlated-Q learning are guaranteed to converge to stationary correlated equilibrium policies in two special classes of Markov games, namely zero-sum and common-interest games.
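For readers unfamiliar with the solution concept named in the abstract, the following is a minimal sketch of what it means for a joint action distribution to be a correlated equilibrium in a single two-player matrix game. It is not the correlated-Q algorithm from the report, only the standard textbook condition that no player can gain by deviating from a recommended action. The function name, the payoff matrices (a common version of the game of Chicken), and the tolerance parameter are illustrative assumptions, not taken from the report.

```python
def is_correlated_equilibrium(u_row, u_col, p, tol=1e-9):
    """Check the correlated equilibrium condition for a two-player matrix game.

    u_row[a][c], u_col[a][c]: payoffs when row plays a and column plays c.
    p[a][c]: probability that the joint action (a, c) is recommended.
    (All names here are hypothetical, chosen for illustration.)
    """
    n_row, n_col = len(p), len(p[0])

    # Row player: when recommended a, switching to b must not raise expected payoff.
    for a in range(n_row):
        for b in range(n_row):
            gain = sum(p[a][c] * (u_row[b][c] - u_row[a][c]) for c in range(n_col))
            if gain > tol:
                return False

    # Column player: when recommended c, switching to d must not raise expected payoff.
    for c in range(n_col):
        for d in range(n_col):
            gain = sum(p[a][c] * (u_col[a][d] - u_col[a][c]) for a in range(n_row))
            if gain > tol:
                return False

    return True


# Game of Chicken (assumed payoffs): action 0 = dare, action 1 = chicken.
U_ROW = [[0, 7], [2, 6]]
U_COL = [[0, 2], [7, 6]]

# Equal weight on every joint action except (dare, dare).
mixed = [[0.0, 1 / 3], [1 / 3, 1 / 3]]

# All weight on (chicken, chicken): each player would rather dare.
all_chicken = [[0.0, 0.0], [0.0, 1.0]]

print(is_correlated_equilibrium(U_ROW, U_COL, mixed))
print(is_correlated_equilibrium(U_ROW, U_COL, all_chicken))
```

In a Markov game, a condition of this kind would be applied at each state to the players' Q-values rather than to fixed payoffs; how the report selects among the equilibria that satisfy it is described in the full text.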