Algorithms for Sequential Decision Making
AUTHOR: Michael Lederman Littman
ADVISOR: Leslie Pack Kaelbling
TIME: 1pm, Practice Defense Date: Thursday, February 8th
ABSTRACT:
Sequential decision making is a fundamental task faced by any
intelligent agent in an extended interaction with its environment; it
is the act of answering the question ``What should I do now?'' In
this thesis, I show how this question can be answered when ``now'' is
one of a finite set of states, ``do'' is one of a finite set of
actions, ``should'' is to maximize a long-run measure of reward, and
``I'' is an automated planning or learning system (agent). In
particular, I collect basic results concerning methods for finding
optimal (or near-optimal) behavior in several different kinds of model
environments: Markov decision processes, in which the agent always
knows its state; partially observable Markov decision processes
(POMDPs), in which the agent must piece together its state on the
basis of observations it makes; and Markov games, in which the agent
is in direct competition with an opponent. The thesis is written from
a computer-science perspective, meaning that many mathematical details
are not discussed, whereas descriptions of algorithms and the
complexity of problems are emphasized. New results include an
improved algorithm for solving POMDPs exactly over finite horizons, a
method for learning minimax-optimal policies for Markov games, a
pseudopolynomial bound for policy iteration, and a complete complexity
theory for the problem of finding zero-reward POMDP policies.
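As a minimal illustration of the fully observable setting the abstract describes (finite states, finite actions, maximizing long-run discounted reward), the sketch below runs value iteration on a small made-up two-state MDP. It is not code from the thesis; the transition table, rewards, and discount factor are invented for the example.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """P[s][a] is a list of (probability, next_state) pairs; R[s][a] is the
    immediate reward for taking action a in state s. Returns the optimal
    value function and a greedy policy (one action index per state)."""
    n_states = len(P)
    V = [0.0] * n_states
    while True:
        # Bellman optimality backup: best one-step reward plus discounted
        # expected value of the successor state, over all actions.
        V_new = [
            max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                for a in range(len(P[s])))
            for s in range(n_states)
        ]
        converged = max(abs(x - y) for x, y in zip(V, V_new)) < tol
        V = V_new
        if converged:
            break
    # Extract a policy greedy with respect to the converged values.
    policy = [
        max(range(len(P[s])),
            key=lambda a: R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a]))
        for s in range(n_states)
    ]
    return V, policy

# Hypothetical two-state, two-action MDP: in state 0, action 0 stays put
# for reward 1; action 1 pays 2 but risks falling into low-value state 1.
P = [
    [[(1.0, 0)], [(0.5, 0), (0.5, 1)]],   # state 0: action 0, action 1
    [[(1.0, 1)], [(0.9, 1), (0.1, 0)]],   # state 1: action 0, action 1
]
R = [
    [1.0, 2.0],   # rewards in state 0 for actions 0, 1
    [0.0, 0.0],   # rewards in state 1 for actions 0, 1
]
V, pi = value_iteration(P, R)
print(V, pi)
```

With a discount factor of 0.9, the safe action in state 0 wins out (staying for reward 1 forever is worth 10), which is the kind of long-run trade-off the algorithms in the thesis are designed to compute.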