Correlated and No-Regret Learning in Markov Games

Amy Greenwald and Keith Hall and David Gondek

Abstract

One objective of learning in game theory is to learn an equilibrium concept. Game theorists ask the question: in repeated games, does learning by high-rationality, high-information agents (e.g., fictitious play) or by low-rationality, low-information agents (e.g., no-regret) converge to, for example, Nash equilibrium? One type of learning in AI is reinforcement learning, in which the goal is to learn Q-values. AI researchers ask the question: in Markov decision processes (or Markov games), does learning by locally updating Q-values converge to Q-values that support a globally optimal (or equilibrium) solution?
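For concreteness, the following displays sketch (in notation introduced here, not taken from the talk) the single-agent Q-learning update and the minimax-Q update for two-player zero-sum Markov games, where \alpha is a learning rate, \gamma a discount factor, r the observed reward, and A_1, A_2 the players' action sets:

    Q(s, a) \leftarrow (1 - \alpha)\, Q(s, a) + \alpha \big[ r + \gamma \max_{a'} Q(s', a') \big]

    V(s') = \max_{\pi \in \Delta(A_1)} \min_{a_2 \in A_2} \sum_{a_1 \in A_1} \pi(a_1)\, Q(s', a_1, a_2)

    Q(s, a_1, a_2) \leftarrow (1 - \alpha)\, Q(s, a_1, a_2) + \alpha \big[ r + \gamma\, V(s') \big]

In this spirit, an equilibrium-learning generalization would replace the minimax value V(s') with an equilibrium value (e.g., a correlated-equilibrium value) of the general-sum stage game whose payoffs are the players' Q-values at s'.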

This talk describes one avenue for merging these two research agendas: the design of algorithms that simultaneously learn equilibrium concepts and Q-values, generalizing minimax-Q learning in zero-sum Markov games. The case for learning correlated equilibrium policies in general-sum Markov games is twofold: (i) correlated equilibria are easily computed by linear programming; and (ii) no-regret algorithms learn correlated equilibria. This talk introduces a family of no-regret Q-learning algorithms and presents the results of experiments with these algorithms on a standard testbed of Markov games.
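To illustrate point (i), the following sketch (not the authors' code) computes a correlated equilibrium of a one-shot two-player game by linear programming with scipy.optimize.linprog. The "Chicken" payoffs and the utilitarian objective (maximize the sum of payoffs) are illustrative choices; in a Q-learning context, the payoff matrices would instead be the players' Q-values at the current state.

    # A minimal sketch of computing a correlated equilibrium of a
    # two-player game by linear programming (illustrative payoffs and
    # objective; not the authors' implementation).
    import numpy as np
    from scipy.optimize import linprog

    # Payoff matrices for "Chicken": A[i, j] is player 1's payoff and
    # B[i, j] is player 2's payoff when player 1 plays row i and
    # player 2 plays column j.
    A = np.array([[6.0, 2.0],
                  [7.0, 0.0]])
    B = np.array([[6.0, 7.0],
                  [2.0, 0.0]])

    n1, n2 = A.shape
    nvars = n1 * n2            # one probability per joint action (row-major)

    def idx(i, j):
        return i * n2 + j      # flat index of joint action (i, j)

    # Incentive constraints: for each recommended action and each possible
    # deviation, the expected gain from deviating is at most zero.
    A_ub, b_ub = [], []
    for i in range(n1):                    # player 1 recommended row i
        for k in range(n1):                # deviation to row k
            if k != i:
                row = np.zeros(nvars)
                for j in range(n2):
                    row[idx(i, j)] = A[k, j] - A[i, j]
                A_ub.append(row)
                b_ub.append(0.0)
    for j in range(n2):                    # player 2 recommended column j
        for k in range(n2):                # deviation to column k
            if k != j:
                row = np.zeros(nvars)
                for i in range(n1):
                    row[idx(i, j)] = B[i, k] - B[i, j]
                A_ub.append(row)
                b_ub.append(0.0)

    # Probabilities are nonnegative (bounds) and sum to one (equality).
    A_eq, b_eq = [np.ones(nvars)], [1.0]

    # linprog minimizes, so negate the utilitarian objective (sum of payoffs).
    c = -(A + B).flatten()

    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=[(0.0, None)] * nvars)

    # For these payoffs the optimum places probability 1/2 on (row 0, col 0)
    # and 1/4 on each asymmetric joint action.
    print(res.x.reshape(n1, n2))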