Cyclic Equilibria in Markov Games
Martin Zinkevich, Amy Greenwald, and Michael Littman
Abstract
Learning and decision making in multiagent sequential problems
involves considering the effect of one's actions on both the
future state of the world and the future behavior of the other
agents. Value iteration, an algorithm that serves as the
foundation for common reinforcement-learning schemes, computes
optimal decisions for Markov decision processes, zero-sum Markov
games, and team games---settings in which the focus is only on future
states, so the behavior of other agents need not be considered in
detail. Although variants of value iteration have been proposed
for computing Nash or correlated equilibrium policies in
general-sum Markov games by taking the behavior of other agents into
account, these algorithms have not been shown to be
effective in general. In this paper, we demonstrate by
construction that a large class of ``multiagent'' value-iteration
algorithms, including all existing variants, cannot find
stationary equilibrium policies in arbitrary general-sum Markov
games. Instead, we propose an alternative interpretation of the
output of value iteration based on a novel non-stationary
equilibrium concept that we call ``cyclic equilibria.'' We prove
that value iteration identifies cyclic equilibria in a special
class of games on which it fails to find stationary equilibria.
We also demonstrate empirically that value iteration finds cyclic
equilibria on nearly all examples drawn from a random distribution
of Markov games.
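
For concreteness, the standard single-agent value-iteration update that
these algorithms generalize can be sketched as follows (a textbook form,
with the usual notation assumed rather than taken from this paper):
\[
V_{t+1}(s) \;=\; \max_{a \in A}\Bigl[ R(s,a) + \gamma \sum_{s' \in S} P(s' \mid s,a)\, V_t(s') \Bigr],
\]
where $S$ and $A$ denote the state and action sets, $R$ the reward
function, $P$ the transition function, and $\gamma \in [0,1)$ the
discount factor; roughly speaking, the multiagent variants discussed
above replace the maximization over actions with an equilibrium
computation over the stage game induced at each state.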