Brown CS News

Serena Booth Receives A Reinforcement Learning Conference Outstanding Paper Award

Click the links that follow for more news about Serena Booth and other recent accomplishments by Brown CS faculty.

Now in its second year, the Reinforcement Learning Conference, which aims to allow researchers to interact and share research in a more focused setting than typical large machine learning venues, was recently held in Edmonton, Canada. New research (“Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners”) co-authored by Brown CS faculty member Serena Booth has received the conference’s Outstanding Paper Award for Emerging Topics in Reinforcement Learning. The paper builds on work originally pursued in Serena’s doctoral dissertation, and her collaborators include Calarina Muslimani and Matthew E. Taylor of the University of Alberta, Kerrick Johnstonbaugh of RL Core Technologies, Suyog Chandramouli of Princeton University, and W. Bradley Knox of the University of Texas at Austin.

“Reinforcement learning (RL) agents are fundamentally limited by the quality of the reward functions they learn from,” the researchers explain, “yet reward design is often overlooked under the assumption that a well-defined reward is readily available. However, in practice, designing rewards is difficult, and even when specified, evaluating their correctness is equally challenging: how do we know if a reward function is correctly specified? In our work, we address these challenges by focusing on reward alignment – assessing whether a reward function accurately encodes the preferences of a human stakeholder.”

As a concrete measure of reward alignment, the researchers introduce the Trajectory Alignment Coefficient, which quantifies the similarity between a human stakeholder's ranking of trajectory distributions and the ranking induced by a given reward function. In a small user study of RL practitioners, they find that access to the Trajectory Alignment Coefficient during reward selection leads to statistically significant improvements in experts' ability to select correct reward functions. Compared to relying only on reward functions, their metric reduces cognitive workload by 1.5x, is preferred by 82% of users, and increases the success rate of selecting reward functions that produce performant policies by 41%.
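To give a flavor of the idea, the comparison of rankings can be sketched with a Kendall-tau-style agreement score. This is an illustrative simplification, not the paper's exact definition: the published metric compares rankings of trajectory distributions, while the sketch below ranks individual trajectories by their summed return under a candidate reward function and counts pairwise agreements with a human's best-first ranking. All names (`alignment_score`, `trajectory_return`) are hypothetical.

```python
import itertools

def trajectory_return(trajectory, reward_fn):
    """Sum a candidate reward function over a trajectory of (state, action) pairs."""
    return sum(reward_fn(s, a) for s, a in trajectory)

def alignment_score(human_ranking, trajectories, reward_fn):
    """Kendall-tau-style agreement between a human's best-first ranking of
    trajectory indices and the ordering induced by a reward function's returns.
    Returns +1.0 for perfect agreement, -1.0 for a fully reversed ordering.
    (An illustrative sketch, not the paper's Trajectory Alignment Coefficient.)"""
    returns = {i: trajectory_return(trajectories[i], reward_fn) for i in human_ranking}
    concordant = discordant = 0
    for a, b in itertools.combinations(human_ranking, 2):
        # human_ranking is best-first, so the human prefers trajectory a over b
        if returns[a] > returns[b]:
            concordant += 1
        elif returns[a] < returns[b]:
            discordant += 1
    total = concordant + discordant
    return 0.0 if total == 0 else (concordant - discordant) / total
```

A score near +1 suggests the reward function encodes the stakeholder's preferences well; a score near -1 suggests it systematically inverts them, flagging a likely misspecified reward.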

Serena’s research explores human-robot interaction, trustworthy AI, and mechanisms to ensure that AI systems and robots align with human values. Her recent work focuses on assisting people in designing specifications for AI systems to prevent the common problems of misspecification, overspecification, and underspecification. She’s also worked on AI policy issues addressing the use of AI systems in high-stakes applications such as housing and banking.

For more information, click the link that follows to contact Brown CS Communications Manager Jesse C. Polhemus.