NeurIPS, the Conference on Neural Information Processing Systems, is a multi-track interdisciplinary annual meeting that includes invited talks, demonstrations, symposia, and oral and poster presentations of refereed papers. This year, new research (“On the Expressivity of Markov Reward”) by Brown CS alums David Abel and Mark K. Ho (now at DeepMind and Princeton University, respectively), Professor Michael Littman, and their collaborators, Will Dabney, Anna Harutyunyan, Doina Precup, and Satinder Singh (all at DeepMind) has earned one of the event’s highest honors, the Outstanding Paper Award.
“Reward,” the authors explain, “is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform.”
They frame their study around three abstract notions of "task" that a designer might want an agent to carry out:
a set of acceptable behaviors,
a partial ordering over behaviors, or
a partial ordering over trajectories.
Their main results prove that while reward can express many of these tasks, there exist instances of each task type that no Markov reward function can capture. The researchers then provide polynomial-time algorithms that, for each of the three task types, construct a Markov reward function realizing the task when one exists, and that correctly report when no such reward function exists. They conclude with an empirical study that corroborates and illustrates their theoretical findings.
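The flavor of these results can be illustrated with a toy sketch (not the paper's actual algorithm, which handles general discounted MDPs). In the "set of acceptable behaviors" setting, deciding whether some Markov reward function realizes the task reduces to a linear-program feasibility check: we ask whether a reward vector exists under which every acceptable policy strictly outperforms every unacceptable one. The two-state, two-action environment, the one-step value model, and the `realizable` helper below are all illustrative assumptions, not taken from the paper:

```python
from itertools import product
from scipy.optimize import linprog

STATES, ACTIONS = [0, 1], [0, 1]
# Simplified one-step episodic setting with a uniform start state:
# V(pi) = sum over states s of r(s, pi(s)).
# Reward variables are flattened as r[s * 2 + a].

def value_vector(policy):
    """Coefficient vector c such that c @ r = V(policy)."""
    c = [0.0] * 4
    for s in STATES:
        c[s * 2 + policy[s]] += 1.0
    return c

def realizable(acceptable):
    """LP feasibility: does some Markov reward r(s, a) make every
    acceptable policy strictly outperform every unacceptable one?"""
    policies = list(product(ACTIONS, repeat=len(STATES)))
    A_ub, b_ub = [], []
    for good in acceptable:
        for bad in policies:
            if bad in acceptable:
                continue
            # Encode V(bad) - V(good) <= -1 (a strictness gap; any
            # positive gap works since rewards can be rescaled).
            row = [b - g for g, b in zip(value_vector(good),
                                         value_vector(bad))]
            A_ub.append(row)
            b_ub.append(-1.0)
    res = linprog(c=[0.0] * 4, A_ub=A_ub, b_ub=b_ub, bounds=(-10, 10))
    return bool(res.success)

# "Be consistent" (pick the same action in both states): no Markov
# reward can separate these policies from the inconsistent ones.
print(realizable({(0, 0), (1, 1)}))   # False
# "Always pick action 0" is easily expressed by a Markov reward.
print(realizable({(0, 0)}))           # True
```

The first task fails because any reward that favors action 0 in one state and action 1 in the other necessarily also favors one of the inconsistent policies, mirroring the paper's finding that some natural task sets lie outside what Markov reward can express.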
This year, only six papers at NeurIPS were chosen as Outstanding Papers out of more than 9,000 submissions. The full list is available on the conference website.
For more information, contact Brown CS Communication and Outreach Specialist Jesse C. Polhemus.