Tech Report CS-14-01

Translating English to Reward Functions

James MacGlashan, Monica Babes-Vroman, Marie desJardins, Michael Littman, Smaranda Muresan and Shawn Squire

April 2014


For intelligent agents and robots to be useful to the general public, people will need to able to communicate the tasks they want the agents to complete without having any technical knowledge or programming ability. Communicating tasks to agents via natural language is an especially appealing way to accomplish this goal. Similarly, providing agents with demonstrations of what they should do given a natural language command is an appealing low-effort way to train agents. In this work, we present a novel generative task model, in which tasks are defined by MDP reward functions. This generative task model can be combined with different language models to produce a complete system that can learn the meaning of individual words from demonstrations. Because these meanings are grounded in reward functions, goals can be executed that require complex multiple-step behavior. We present two language models that can be used with our task model and empirically validate them on a dataset with natural language commands gathered from a user study.

(complete text in pdf)