AUTHOR: Michael Lederman Littman

ADVISOR: Leslie Pack Kaelbling

TIME: 1pm, Thursday, February 8th (Practice Defense)

ABSTRACT:

Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question ``What should I do now?'' In this thesis, I show how this question can be answered when ``now'' is one of a finite set of states, ``do'' is one of a finite set of actions, ``should'' means maximizing a long-run measure of reward, and ``I'' is an automated planning or learning system (agent). In particular, I collect basic results concerning methods for finding optimal (or near-optimal) behavior in several different kinds of model environments: Markov decision processes, in which the agent always knows its state; partially observable Markov decision processes (POMDPs), in which the agent must piece together its state on the basis of observations it makes; and Markov games, in which the agent is in direct competition with an opponent. The thesis is written from a computer-science perspective, meaning that many mathematical details are not discussed, whereas descriptions of algorithms and the complexity of problems are emphasized. New results include an improved algorithm for solving POMDPs exactly over finite horizons, a method for learning minimax-optimal policies for Markov games, a pseudopolynomial bound for policy iteration, and a complete complexity theory for the problem of finding zero-reward POMDP policies.
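To make the Markov-decision-process setting concrete, the following is a minimal value-iteration sketch for a finite MDP. It is not an algorithm from the thesis itself, just a standard illustration of "maximizing a long-run measure of reward" over finite states and actions; the two-state transition and reward tables at the bottom are invented for the example.

```python
def value_iteration(n_states, n_actions, P, R, gamma=0.9, eps=1e-6):
    """Compute state values for a finite MDP.

    P[s][a] is a list of (next_state, probability) pairs;
    R[s][a] is the immediate reward for taking action a in state s;
    gamma is the discount factor defining the long-run reward measure.
    """
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            # Bellman backup: best one-step reward plus discounted future value.
            best = max(
                R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                for a in range(n_actions)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:  # stop once values have effectively converged
            return V

# Invented two-state example: action 0 stays put, action 1 switches states.
P = [
    [[(0, 1.0)], [(1, 1.0)]],  # transitions from state 0
    [[(1, 1.0)], [(0, 1.0)]],  # transitions from state 1
]
R = [[0.0, 0.0], [1.0, 1.0]]  # only state 1 yields reward
V = value_iteration(2, 2, P, R)
```

In this toy instance the agent always knows its state, so the values converge to the fixed point V = [9, 10]: staying in the rewarding state is worth 1/(1 - gamma) = 10, and reaching it from the other state costs one discounted step. The POMDP and Markov-game settings treated in the thesis generalize exactly this kind of computation.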