CSCI2951-F: Learning and Sequential Decision Making

Brown University
Fall 2013
Michael L. Littman (mlittman@cs.brown.edu, CIT 301)
TA: Hannah Quay-de la Vallee (hannahqd@cs.brown.edu)
course design: James MacGlashan (jmacglashan@cs.brown.edu, CIT 460)

Time: Tuesdays and Thursdays, 3:00-4:20pm
Place: Brown CIT 506
Semester: Fall 2013

Office hours: Hannah, Tuesdays 4-5pm, and by appointment.


Description: Through a combination of classic papers and more recent work, the course explores automated decision making from a computer-science perspective. It examines efficient algorithms, where they exist, for single-agent and multiagent planning, as well as approaches to learning near-optimal decisions from experience. Topics include Markov decision processes, stochastic and repeated games, partially observable Markov decision processes, and reinforcement learning. Of particular interest are issues of generalization, exploration, and representation. Students will replicate a result from a published paper in the area. Participants should have taken a graduate-level machine-learning course and should have had some exposure to reinforcement learning from a previous computer-science class or seminar; check with the instructor if you are not sure.

Prerequisites: CSCI 1950F or CSCI 1420 or permission of the instructor.

Online quizzes: There will be one online quiz per class meeting to reinforce the concepts covered.

Result replication presentation: Students will form small groups of two to four and select a relevant paper from the literature. Each group will choose a graph in the paper and create an independent implementation that replicates this result. Students often find that important parameters needed to replicate the result are not stated in the paper and that obtaining the same pattern of results is sometimes not possible. Groups will present their work at the end of the semester. Project grades are based on the fidelity of the replication (25%), how well the group demonstrates an understanding of the original paper (25%), the quality of the presentation itself in terms of clarity and creativity (25%), and a short written report (25%). The project grade accounts for 50% of the final grade in the class.

BURLAP: We will try to use and extend BURLAP, the Brown-UMBC Reinforcement Learning and Planning system, to learn about the algorithms in the class and for the result replications.

Grading: The final grade is derived from the online quizzes (50%) and the result replication presentation (50%).

Calendar

Suggested project papers.