Project Groups
-  Stephen, Mark:
  
	Apprenticeship Learning via Inverse Reinforcement Learning
	by Abbeel and Ng (ICML 2004).
-  Nakul, David, Thomas: 
  
	Generalizing Plans to New Environments in Relational MDPs
	by Guestrin, Koller, Gearhart, and Kanodia (IJCAI 2003).
-  Spandan, Anubhav, Daniel:
  
	Regularization and Feature Selection in Least-Squares Temporal
	Difference Learning by Kolter and Ng (ICML 2009).
-  Esha, John, Michail: 
  
	Coco-Q: Learning in Stochastic Games with Side Payments by
	Sodomka, Hilliard, Littman, and Greenwald (ICML 2013).
-  Chau, Takehiro:
  
	Potential-based Shaping in Model-based Reinforcement
	Learning by Asmuth, Littman, and Zinkov (AAAI 2008).
-  Michael:
  
	Dopamine-dependent Prediction Errors Underpin Reward-seeking
	Behaviour in Humans by Pessiglione, Seymour, Flandin, Dolan,
	and Frith (Nature 2006).
CSCI2951-F Projects
Description: Our projects this semester have dual goals.
First, as mentioned in the course description, the class will form
small groups of two to four students, and each group will select a
relevant paper from the literature.  The group will choose a graph
from the paper and create an independent implementation that
replicates the result.  Grades are based on the fidelity of the
replication (25%), a demonstration of understanding of the original
paper (25%), the quality of the presentation itself in terms of
clarity and creativity (25%), and the short written report (25%).
This project will account for 50% of the final grade in the class.
Second, we will use this opportunity to extend BURLAP, the Brown-UMBC
Reinforcement Learning and Planning system, and to get it ready for a
more public release.
Here are papers that describe functionality we'd really like to see in
BURLAP.  I'd like to see all these papers covered by groups in the
class.
BURLAP is a Java code library for the use and development of single-
and multi-agent planning and learning algorithms, along with domains
to accompany them.  At the core of the library is a rich state and
domain representation framework based on the OO-MDP paradigm, which
facilitates the creation of discrete, continuous, or relational
domains that can consist of any number of different "objects" in the
world.  Planning and learning algorithms range from classic
forward-search planners to value-function-based stochastic planning
and learning algorithms.  Also included is a set of analysis tools,
such as a common framework for visualizing domains and agent
performance across domains.
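To make the OO-MDP idea concrete, here is a minimal, self-contained
Java sketch of a state built from typed objects with named attributes.
The class names (ObjectInstance, State) and the "agent"/"x"/"y" names
are hypothetical and chosen only for illustration; BURLAP's actual API
differs in its details.

```java
import java.util.HashMap;
import java.util.Map;

// A toy illustration of an OO-MDP state: the world is a collection of
// typed "objects," each holding named attribute values.
public class OOMDPSketch {

    // One object instance in the world, e.g. an agent at (x, y).
    static class ObjectInstance {
        final String className;
        final Map<String, Integer> attributes = new HashMap<>();

        ObjectInstance(String className) { this.className = className; }

        void set(String attr, int value) { attributes.put(attr, value); }
        int get(String attr) { return attributes.get(attr); }
    }

    // A state is simply a named collection of object instances; any
    // number of objects of any class can coexist in one state.
    static class State {
        final Map<String, ObjectInstance> objects = new HashMap<>();

        void add(String name, ObjectInstance o) { objects.put(name, o); }
        ObjectInstance get(String name) { return objects.get(name); }
    }

    public static void main(String[] args) {
        ObjectInstance agent = new ObjectInstance("agent");
        agent.set("x", 0);
        agent.set("y", 0);

        State s = new State();
        s.add("agent0", agent);

        // A deterministic "north" action, modeled as an attribute update.
        s.get("agent0").set("y", s.get("agent0").get("y") + 1);
        System.out.println(s.get("agent0").get("y")); // prints 1
    }
}
```

Because the representation is just objects and attributes, the same
state machinery can back a discrete grid world, a continuous domain
(with real-valued attributes), or a relational one (with attributes
that refer to other objects).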