Project Groups
-  Stephen, Mark:
  
	Apprenticeship Learning via Inverse Reinforcement Learning
	by Abbeel and Ng (ICML 2004).
-  Nakul, David, Thomas: 
  
	Generalizing Plans to New Environments in Relational MDPs
	by Guestrin, Koller, Gearhart, and Kanodia (IJCAI 2003).
-  Spandan, Anubhav, Daniel:
  
	Regularization and Feature Selection in Least-Squares Temporal
	Difference Learning by Kolter and Ng (ICML 2009).
-  Esha, John, Michail: 
  
	Coco-Q: Learning in Stochastic Games with Side Payments by
	Sodomka, Hilliard, Littman, and Greenwald (ICML 2013).
-  Chau, Takehiro:
  
	Potential-based Shaping in Model-based Reinforcement
	Learning by Asmuth, Littman, and Zinkov (AAAI 2008).
-  Michael:
  
	Dopamine-dependent Prediction Errors Underpin Reward-seeking
	Behaviour in Humans by Pessiglione, Seymour, Flandin, Dolan,
	and Frith (Nature 2006).
CSCI2951-F Projects
Description: Our projects this semester have dual goals.
First, as mentioned in the course description, the class will form
small groups of two to four students, and each group will select a
relevant paper from the literature.  The group will choose a graph
from the paper and create an independent implementation that
replicates the result.  Grades are based on the fidelity of the
replication (25%), a demonstration of understanding of the original
paper (25%), the quality of the presentation itself in terms of
clarity and creativity (25%), and the short written report (25%).
This project will account for 50% of the final grade in the class.
Second, we will use this opportunity to extend BURLAP, the Brown-UMBC
Reinforcement Learning and Planning system, and to get it ready for a
more public release.
Here are papers that describe functionality we'd really like to see in
BURLAP.  I'd like to see all these papers covered by groups in the
class.
BURLAP is a Java code library for the use and development of single-
and multi-agent planning and learning algorithms, along with domains
to accompany them.  At the core of the library is a rich state and
domain representation framework based on the OO-MDP paradigm, which
facilitates the creation of discrete, continuous, or relational
domains that can consist of any number of different "objects" in the
world.  Planning and learning algorithms range from classic
forward-search planners to value-function-based stochastic planning
and learning algorithms.  Also included is a set of analysis tools,
such as a common framework for visualizing domains and agent
performance across domains.
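To make the OO-MDP idea concrete, here is a minimal, self-contained
Java sketch of a state built from typed objects with named attributes.
The class names (ObjectInstance, State) and the "agent"/"x"/"y" names
are hypothetical and chosen only for illustration; BURLAP's actual API
differs in its details.

```java
import java.util.HashMap;
import java.util.Map;

// A toy illustration of an OO-MDP state: the world is a collection of
// typed "objects," each holding named attribute values.
public class OOMDPSketch {

    // One object instance in the world, e.g. an agent at (x, y).
    static class ObjectInstance {
        final String className;
        final Map<String, Integer> attributes = new HashMap<>();

        ObjectInstance(String className) { this.className = className; }

        void set(String attr, int value) { attributes.put(attr, value); }
        int get(String attr) { return attributes.get(attr); }
    }

    // A state is simply a named collection of object instances; any
    // number of objects of any class can coexist in one state.
    static class State {
        final Map<String, ObjectInstance> objects = new HashMap<>();

        void add(String name, ObjectInstance o) { objects.put(name, o); }
        ObjectInstance get(String name) { return objects.get(name); }
    }

    public static void main(String[] args) {
        ObjectInstance agent = new ObjectInstance("agent");
        agent.set("x", 0);
        agent.set("y", 0);

        State s = new State();
        s.add("agent0", agent);

        // A deterministic "north" action, modeled as an attribute update.
        s.get("agent0").set("y", s.get("agent0").get("y") + 1);
        System.out.println(s.get("agent0").get("y")); // prints 1
    }
}
```

Because the representation is just objects and attributes, the same
state machinery can back a discrete grid world, a continuous domain
(with real-valued attributes), or a relational one (with attributes
that refer to other objects).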