Correlated and No-Regret Learning in Markov Games
Amy Greenwald, Keith Hall, and David Gondek
Abstract
One objective of learning in game theory is to learn an equilibrium
concept. Game theorists ask the question: in repeated games, does
learning among high-rationality, high-information agents (e.g., fictitious
play) or low-rationality, low-information agents (e.g., no-regret)
converge to, for example, Nash equilibrium? One type of learning in
AI is reinforcement learning, in which the goal is to learn Q-values.
AI researchers ask the question: in Markov decision processes (or
Markov games), does learning by locally updating Q-values converge to
Q-values that support a globally optimal (or equilibrium) solution?
This talk describes one avenue for merging these two research agendas:
the design of algorithms that simultaneously learn equilibrium
concepts and Q-values, generalizing minimax-Q learning in zero-sum
Markov games. The case for learning correlated equilibrium policies
in general-sum Markov games is twofold: (i) correlated equilibria are
easily computed by linear programming; and (ii) no-regret algorithms
learn correlated equilibria. This talk introduces a family of no-regret
Q-learning algorithms and presents the results of experiments with these
algorithms on a standard testbed of Markov games.
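To make claim (i) concrete, the display below sketches the standard linear constraints defining a correlated equilibrium of a one-shot game; the notation (a joint-action distribution \(\pi\) over \(A = A_1 \times \cdots \times A_n\) and reward functions \(R_i\)) is introduced here for illustration and is not drawn from the talk itself.

\[
\sum_{a_{-i} \in A_{-i}} \pi(a_i, a_{-i})\,\bigl[R_i(a_i, a_{-i}) - R_i(a_i', a_{-i})\bigr] \;\ge\; 0
\quad \text{for all players } i \text{ and all } a_i, a_i' \in A_i,
\]
\[
\sum_{a \in A} \pi(a) = 1, \qquad \pi(a) \ge 0 \quad \text{for all } a \in A.
\]

Since any natural selection objective (for instance, the sum of the players' expected rewards) is also linear in \(\pi\), a correlated equilibrium can be selected by solving a single linear program.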
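As a further illustration, here is a minimal Python sketch of how such a linear program might be combined with a Q-value update in the spirit of the abstract: value the successor state by a correlated equilibrium of the stage game induced by the current Q-values, then back that value up. The function names, the utilitarian selection objective, and the two-player setting are assumptions made for this sketch, not details taken from the talk.

```python
import numpy as np
from scipy.optimize import linprog


def utilitarian_ce(Q1, Q2):
    """Return a correlated equilibrium (a joint distribution over both
    players' actions) of the stage game with payoff matrices Q1, Q2,
    chosen here to maximize the sum of the players' expected payoffs."""
    m, n = Q1.shape
    A_ub, b_ub = [], []

    # Row player: following recommendation a1 must be at least as good as
    # deviating to any a1_dev, conditional on a1 being recommended.
    for a1 in range(m):
        for a1_dev in range(m):
            if a1_dev == a1:
                continue
            gain = np.zeros((m, n))
            gain[a1, :] = Q1[a1, :] - Q1[a1_dev, :]
            A_ub.append(-gain.ravel())   # gain >= 0 rewritten as -gain <= 0
            b_ub.append(0.0)

    # Column player: the symmetric constraints over its recommendations.
    for a2 in range(n):
        for a2_dev in range(n):
            if a2_dev == a2:
                continue
            gain = np.zeros((m, n))
            gain[:, a2] = Q2[:, a2] - Q2[:, a2_dev]
            A_ub.append(-gain.ravel())
            b_ub.append(0.0)

    # Probabilities sum to one; nonnegativity is linprog's default bound.
    res = linprog(c=-(Q1 + Q2).ravel(),
                  A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.ones((1, m * n)), b_eq=np.array([1.0]))
    assert res.success
    return res.x.reshape(m, n)


def equilibrium_backup(Q1, Q2, s, a1, a2, r1, r2, s_next,
                       alpha=0.1, gamma=0.9):
    """One equilibrium-backed Q-update: value the next state by a correlated
    equilibrium of the stage game defined by the current Q-values, then move
    each player's Q-value toward the usual one-step target."""
    pi = utilitarian_ce(Q1[s_next], Q2[s_next])
    v1 = float((pi * Q1[s_next]).sum())
    v2 = float((pi * Q2[s_next]).sum())
    Q1[s, a1, a2] += alpha * (r1 + gamma * v1 - Q1[s, a1, a2])
    Q2[s, a1, a2] += alpha * (r2 + gamma * v2 - Q2[s, a1, a2])


if __name__ == "__main__":
    # Example: select a correlated equilibrium of the stage game "Chicken".
    Q1 = np.array([[6.0, 2.0], [7.0, 0.0]])   # row player's payoffs
    Q2 = np.array([[6.0, 7.0], [2.0, 0.0]])   # column player's payoffs
    print(utilitarian_ce(Q1, Q2))             # a 2x2 joint distribution
```

Replacing the correlated-equilibrium program with a minimax value computation at each successor state recovers the zero-sum, minimax-Q special case mentioned in the abstract.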