Cyclic Equilibria in Markov Games

Martin Zinkevich, Amy Greenwald, and Michael Littman

Abstract

Learning and decision making in multiagent sequential problems involve considering the effect of one's actions on both the future state of the world and the future behavior of the other agents. Value iteration, an algorithm that serves as the foundation for common reinforcement-learning schemes, computes optimal decisions for Markov decision processes, zero-sum Markov games, and team games, in which the focus is only on future states: the behavior of other agents need not be contemplated in detail. Although variants of value iteration have been proposed for computing Nash or correlated equilibrium policies in general-sum Markov games by taking the behavior of other agents into consideration, these algorithms have not been shown to be effective in general. In this paper, we demonstrate by construction that a large class of ``multiagent'' value-iteration algorithms, including all existing variants, cannot find stationary equilibrium policies in arbitrary general-sum Markov games. Instead, we propose an alternative interpretation of the output of value iteration based on a novel non-stationary equilibrium concept that we call ``cyclic equilibria.'' We prove that value iteration identifies cyclic equilibria in a special class of games for which it fails to find stationary equilibria. We also demonstrate empirically that value iteration finds cyclic equilibria in nearly all examples drawn from a random distribution of Markov games.
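
The general shape of the multiagent value-iteration updates discussed above can be sketched as follows; the notation here ($Q_i$, $V_i$, the selection rule $f_i$, and the discount factor $\gamma$) is illustrative rather than taken verbatim from the paper. Each iteration backs up state values through per-state stage-game matrices and then applies an equilibrium-valued operator to those matrices:
\[
Q_i^{t+1}(s, \vec{a}) \;=\; R_i(s, \vec{a}) \;+\; \gamma \sum_{s'} P(s' \mid s, \vec{a})\, V_i^{t}(s'),
\qquad
V_i^{t+1}(s) \;=\; f_i\!\bigl(Q_1^{t+1}(s, \cdot\,), \ldots, Q_n^{t+1}(s, \cdot\,)\bigr),
\]
where $f_i$ returns player $i$'s value under some Nash or correlated equilibrium of the one-shot game defined by the $Q$ matrices at state $s$. The negative result described in the abstract concerns this whole class of updates, and the cyclic-equilibrium interpretation applies to the sequence of policies the updates produce rather than to a single stationary fixed point.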