Smoothness in Reinforcement Learning with Large State and Action Space

PhD Thesis Proposal

Reinforcement learning (RL) is the study of the interaction between an environment and an agent that learns to achieve a goal through trial and error. Owing to its generality, RL has been applied successfully to a wide range of problems, including those with enormous state and action spaces. In light of the curse of dimensionality, a fundamental question is how to design RL algorithms that are compatible with function approximation yet capable of tackling longstanding challenges of RL, including convergence guarantees, the exploration-exploitation trade-off, and planning. In this thesis, I study RL algorithms in the presence of function approximation through the lens of smoothness, formally defined using Lipschitz continuity. These algorithms typically comprise several key ingredients, such as value functions, models, operators, and policies. I present theoretical results showing that the smoothness of these ingredients plays an essential role in the stability and convergence of RL, in effective model learning and planning, and in state-of-the-art continuous control. Through several experiments and examples, I demonstrate how to adjust the smoothness of these ingredients to improve the performance of RL in large problems.