I am a fifth-year Ph.D. candidate at Brown University, working with Professor Michael Littman and Professor George Konidaris. My career goal is to understand the computational principles underlying intelligence. More specifically, I study agents that interact with a sequential environment and improve their behavior through trial and error, a setting naturally formulated as a reinforcement learning problem. You can read my research statement here.
I am also a passionate animal advocate.
Ph.D. in Computer Science, 2015 - Present
Brown University
M.Sc. in Computer Science, 2015
University of Alberta
B.Eng. in Computer Engineering, 2013
University of Tehran
Reinforcement learning (RL) is the study of the interaction between an environment and an agent that learns to achieve a goal through trial and error. Owing to its generality, RL has been applied successfully in many domains, including those with enormous state and action spaces. In light of the curse of dimensionality, a fundamental question is how to design RL algorithms that are compatible with function approximation yet still address the longstanding challenges of RL, including convergence guarantees, the exploration–exploitation trade-off, and planning.
A core operation in reinforcement learning (RL) is finding an action that is optimal with respect to a learned state–action value function. This operation is often challenging when the learned value function takes continuous actions as input. We introduce deep RBF value functions: state–action value functions learned using a deep neural network with a radial-basis function (RBF) output layer. We show that the optimal action with respect to a deep RBF value function can be approximated easily to any desired accuracy.
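The idea can be sketched in a few lines. This is an illustrative toy, not the paper's exact architecture: it assumes a network that maps a state to a set of action centers and per-center values (the names `rbf_q`, `approx_greedy_action`, and the parameter `beta` are placeholders), and shows why evaluating Q at the centers approximates the continuous-action maximization.

```python
import numpy as np

def rbf_q(a, centers, values, beta=5.0):
    """Q(s, a) as an RBF-weighted combination of per-center values:
    w_i proportional to exp(-beta * ||a - a_i||)."""
    dists = np.linalg.norm(centers - a, axis=1)
    w = np.exp(-beta * dists)
    w /= w.sum()
    return float(w @ values)

def approx_greedy_action(centers, values, beta=5.0):
    """With a large beta, Q at each center is dominated by that center's own
    value, so the best-scoring center approximates argmax_a Q(s, a)."""
    scores = [rbf_q(c, centers, values, beta) for c in centers]
    return centers[int(np.argmax(scores))]

# Hypothetical outputs of the network for one state: action centers a_i(s)
# and their values v_i(s).
centers = np.array([[-1.0], [0.0], [1.0]])
values = np.array([0.2, 1.0, 0.5])
best = approx_greedy_action(centers, values)
```

Increasing `beta` sharpens the RBF weights, which is what drives the "any desired accuracy" claim in the abstract: in the limit, the maximum over centers coincides with the maximum of Q.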
We consider the problem of knowledge transfer when an agent faces a series of Reinforcement Learning (RL) tasks. We introduce a novel metric between Markov Decision Processes and establish that close MDPs have close optimal value functions. Formally, the optimal value functions are Lipschitz continuous with respect to the task space. These theoretical results lead us to a value-transfer method for lifelong RL, which we use to build a PAC-MDP algorithm with an improved convergence rate.
When environmental interaction is expensive, model-based reinforcement learning offers a solution by planning ahead and avoiding costly mistakes. Model-based agents typically learn a single-step transition model. In this paper, we propose a multi-step model that predicts the outcome of an action sequence of variable length. We show that this model is easy to learn and that it can make policy-conditional predictions. We report preliminary results that show a clear advantage for the multi-step model over its one-step counterpart.
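The interface difference between the two model types can be shown with a toy sketch. The deterministic 1-D chain below is a hypothetical example, and both "models" use the exact dynamics; in practice each would be learned, and the point of the multi-step model is that a single k-step query avoids compounding the errors of k chained one-step predictions.

```python
def one_step(s, a):
    """One-step model f(s, a) -> s' for a toy 1-D chain."""
    return s + a

def multi_step(s, actions):
    """Multi-step model g(s, a_1..a_k) -> s_k: predicts the outcome of a whole
    action sequence in one query (here, exact; in the paper, learned)."""
    return s + sum(actions)

seq = [1, -1, 1, 1]
s0 = 0

# Rolling the one-step model forward k times...
s_rollout = s0
for a in seq:
    s_rollout = one_step(s_rollout, a)

# ...reaches the same state as a single multi-step query on this exact model.
s_direct = multi_step(s0, seq)
```

With learned models the two predictions would differ: the rollout accumulates one-step error at every composition, while the multi-step model incurs a single prediction error for the whole sequence.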