# Tech Report CS-96-09

## Algorithms for Sequential Decision Making

### Michael Lederman Littman

#### March 1996

### Abstract:

Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one of a finite set of actions, "should" means maximizing a long-run measure of reward, and "I" is an automated planning or learning system (agent). In particular, I collect basic results concerning methods for finding optimal (or near-optimal) behavior in several different kinds of model environments: Markov decision processes, in which the agent always knows its state; partially observable Markov decision processes (POMDPs), in which the agent must piece together its state on the basis of the observations it makes; and Markov games, in which the agent is in direct competition with an opponent. The thesis is written from a computer-science perspective: many mathematical details are omitted, and descriptions of algorithms and the complexity of problems are emphasized. New results include an improved algorithm for solving POMDPs exactly over finite horizons, a method for learning minimax-optimal policies for Markov games, a pseudopolynomial bound for policy iteration, and a complete complexity theory for finding zero-reward POMDP policies.
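To make the framing concrete, here is a minimal sketch of value iteration on a toy two-state Markov decision process, the first and simplest of the model environments listed above. The states, actions, transition probabilities, rewards, and discount factor below are invented for illustration and are not taken from the thesis.

```python
# Toy MDP: P[s][a] = list of (next_state, probability); R[s][a] = immediate reward.
# All numbers here are made up for illustration.
P = {
    0: {"stay": [(0, 1.0)], "go": [(1, 0.9), (0, 0.1)]},
    1: {"stay": [(1, 1.0)], "go": [(0, 1.0)]},
}
R = {
    0: {"stay": 0.0, "go": 1.0},
    1: {"stay": 2.0, "go": 0.0},
}
gamma = 0.9  # discount factor defining the long-run measure of reward

# Repeatedly apply the Bellman optimality update until the values settle.
V = {s: 0.0 for s in P}
for _ in range(200):
    V = {
        s: max(
            R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
            for a in P[s]
        )
        for s in P
    }

# The greedy policy answers "What should I do now?" in each state.
policy = {
    s: max(
        P[s],
        key=lambda a: R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a]),
    )
    for s in P
}
print(policy)  # prints {0: 'go', 1: 'stay'}
```

Because the agent always knows its current state, the optimal behavior is simply a table mapping states to actions; the POMDP and Markov-game settings treated later in the thesis relax exactly this assumption.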
