The Dynamics of Reinforcement Learning in

Cooperative Multiagent Systems

 

Craig Boutilier, University of British Columbia

Reinforcement learning can provide a robust and natural means for agents to learn how to coordinate their action choices in cooperative multiagent systems. We examine some of the factors that can influence the dynamics of the learning process in such a setting. We first distinguish reinforcement learners that are unaware of (or ignore) the presence of other agents from those that explicitly attempt to learn the value of joint actions and the strategies of their counterparts. We study Q-learning in cooperative multiagent systems under these two perspectives, focusing on the influence of partial action observability, game structure, and exploration strategies on convergence to (optimal and suboptimal) Nash equilibria and on learned Q-values. We also consider variants of the usual exploration strategies that can induce convergence to optimal equilibria in cases where they might not otherwise be attained.
 
Joint work with Caroline Claus
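The first of the two perspectives above, independent learners, can be illustrated with a small sketch (not code from the talk; the payoff matrix, learning rate, and temperature are hypothetical): each agent runs ordinary Q-learning over its own actions only, treating the shared reward as if it came from a stationary environment, even though it actually depends on the other agent's concurrent choice.

```python
import math
import random

# Hypothetical cooperative 2x2 matrix game: both agents receive the same
# reward PAYOFF[a1][a2]. Joint action (0,0) is the optimal equilibrium,
# (1,1) a suboptimal one; miscoordination is penalized.
PAYOFF = [[10, -5],
          [-5,  5]]

ALPHA = 0.1  # learning rate (illustrative value)
TEMP = 0.5   # Boltzmann exploration temperature (illustrative value)

def boltzmann(q, temp):
    """Sample an action from a Boltzmann (softmax) distribution over Q-values."""
    weights = [math.exp(v / temp) for v in q]
    r = random.random() * sum(weights)
    for action, w in enumerate(weights):
        r -= w
        if r <= 0:
            return action
    return len(q) - 1

def run(episodes=5000, seed=0):
    """Two independent Q-learners repeatedly play the stage game."""
    random.seed(seed)
    q1 = [0.0, 0.0]  # agent 1's Q-values, indexed by its own action only
    q2 = [0.0, 0.0]  # agent 2's Q-values
    for _ in range(episodes):
        a1 = boltzmann(q1, TEMP)
        a2 = boltzmann(q2, TEMP)
        reward = PAYOFF[a1][a2]  # shared reward in the cooperative game
        # Each agent updates as if the reward depended on its action alone.
        q1[a1] += ALPHA * (reward - q1[a1])
        q2[a2] += ALPHA * (reward - q2[a2])
    return q1, q2

q1, q2 = run()
```

Because each agent's Q-value averages over the other agent's (changing) policy, which equilibrium the pair settles into depends on the payoff structure and the exploration schedule; a joint-action learner would instead maintain Q-values over all four joint actions and reason about its counterpart's strategy.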
 

