Tech Report CS-06-10
Bounds for Regret-Matching Algorithms
Amy Greenwald, Zheng Li and Casey Marks
We introduce a general class of learning algorithms, which we call regret-matching algorithms, along with a general framework for analyzing their performance in online decision problems. In an online decision problem, an agent repeatedly chooses from a set of actions and receives a reward that depends on its choice of action. Our analytical framework is based on a set Φ of transformations over the agent's set of actions. We calculate a Φ-regret vector by comparing the average reward obtained by an agent over some finite sequence of rounds to the average reward that could have been obtained had the agent instead played each transformation φ ∈ Φ of its sequence of actions. Regret-matching algorithms select the agent's next action based on the vector of Φ-regrets and a link function f. In this paper, we derive bounds on the regret experienced by (Φ, f)-regret-matching algorithms for polynomial and exponential link functions. In special cases, we improve upon the bounds reported in past work. Of greater interest, though, is the fact that our analysis is more general than others because it does not rely on Taylor's theorem. Hence, we can analyze algorithms based on a wider variety of link functions, in particular non-differentiable link functions, such as the class of polynomial link functions.
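To make the abstract's setup concrete, the following is a minimal sketch (not the authors' implementation) of external regret matching with a polynomial link function f(x) = max(x, 0)^(p-1), where Φ is taken to be the set of constant transformations so that Φ-regret reduces to ordinary external regret. The reward model and function names are illustrative assumptions.

```python
import numpy as np

def regret_matching_policy(regrets, p=2):
    """Map a vector of external regrets to a mixed strategy via the
    polynomial link function f(x) = max(x, 0)**(p - 1).
    With p = 2 this is Hart and Mas-Colell style regret matching:
    play each action with probability proportional to its positive regret."""
    link = np.maximum(regrets, 0.0) ** (p - 1)
    total = link.sum()
    if total <= 0:
        # No action has positive regret: fall back to the uniform strategy.
        return np.full(len(regrets), 1.0 / len(regrets))
    return link / total

# Illustrative online loop under a hypothetical full-information reward model:
# after each round the agent observes the reward of every action, and the
# external regret of action a accumulates reward[a] - reward[action played].
rng = np.random.default_rng(0)
n_actions = 3
regrets = np.zeros(n_actions)
for t in range(1000):
    policy = regret_matching_policy(regrets)
    action = rng.choice(n_actions, p=policy)
    rewards = rng.random(n_actions)  # stand-in for the environment's rewards
    regrets += rewards - rewards[action]
```

With p = 2 and regrets (2, -1, 2), for example, the policy puts probability 1/2 on each action with positive regret and 0 on the other.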
(Complete text in PDF.)