The purpose of this tutorial is to familiarize you with some of the planning and learning algorithms in the OO-MDP Toolbox. Specifically, it will cover instantiating a GridWorld domain from the existing implementation in the domains package (which you should download first if you have not already), creating a task for it, and solving that task with Q-learning, Sarsa, BFS, DFS, A*, and Value Iteration. The tutorial will also show you how to visualize the results using existing tools in the OO-MDP Toolbox package.
For this tutorial we will start by making a class that has data members for all the domain- and task-relevant properties. In the tutorial we will call this class "BasicBehavior", but feel free to name it whatever you like. Since we will also be running the examples from this class, we'll include a main method.
import domain.gridworld.*;
import oomdptb.oomdp.Domain;
import oomdptb.oomdp.ObjectInstance;
import oomdptb.oomdp.RewardFunction;
import oomdptb.oomdp.State;
import oomdptb.oomdp.StateParser;
import oomdptb.oomdp.TerminalFunction;
import oomdptb.behavior.planning.StateConditionTest; //needed for the goalCondition member; the import path may vary with your toolbox version
import oomdptb.behavior.statehashing.DiscreteStateHashFactory;

public class BasicBehavior {

    GridWorldDomain             gwdg;
    Domain                      domain;
    StateParser                 sp;
    RewardFunction              rf;
    TerminalFunction            tf;
    StateConditionTest          goalCondition;
    State                       initialState;
    DiscreteStateHashFactory    hashingFactory;

    public static void main(String[] args) {
        //we'll fill this in later
    }

}
If you're already familiar with MDPs in general, the importance of most of these data members will be pretty obvious. However, we will walk through what each one is and why we're going to need it.
GridWorldDomain gwdg
This object is a DomainGenerator provided in the domains package. We will use it to create a basic grid world domain for our demonstration.
Domain domain
The domain object is a fundamental OO-MDP object. It defines a set of attributes, object classes, propositional functions, and actions (along with the actions' transition dynamics).
StateParser sp
A StateParser object is used to convert OO-MDP states to and from strings. This is useful if you want to be able to record planning and learning results to files.
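To make that contract concrete, here is a minimal sketch of the idea. It uses a simplified stand-in class rather than the toolbox's actual StateParser interface, and reduces a grid world "state" to just the agent's (x, y) position:

```java
// Simplified stand-in for a state parser (not the toolbox's StateParser API):
// serializes an agent position to a string and parses it back, so that the
// round trip recovers the original state.
class SimpleGridStateParser {

    //write the state (here just the agent's position) as "x,y"
    String stateToString(int[] agentPos) {
        return agentPos[0] + "," + agentPos[1];
    }

    //recover the position from its string form
    int[] stringToState(String str) {
        String[] parts = str.split(",");
        return new int[]{Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }
}
```

The real toolbox parser does the same kind of round trip, but over full OO-MDP State objects with all of their object instances and attributes.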
RewardFunction rf
A RewardFunction is an object that returns a double-valued reward for any given state-action-state transition. It is a fundamental component of every MDP, and it is what the agent's behavior seeks to maximize.
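As an illustration of the concept, here is a uniform-cost reward function written against a simplified stand-in interface (the toolbox's actual RewardFunction operates on State and action objects rather than strings). Returning -1 for every transition makes maximizing reward equivalent to reaching the goal in as few steps as possible:

```java
// Simplified stand-in for a reward function interface (not the toolbox's
// exact RewardFunction signature).
interface SimpleRewardFunction {
    double reward(String s, String a, String sPrime);
}

// A common choice for goal-directed tasks: every transition costs -1, so
// the reward-maximizing behavior is the shortest path to termination.
class UniformCostRF implements SimpleRewardFunction {
    public double reward(String s, String a, String sPrime) {
        return -1.0;
    }
}
```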
TerminalFunction tf
A common form of MDP is the episodic MDP: an MDP that ends in some specific state or set of states. A typical reason to define an episodic MDP is when there is a goal state the agent is trying to reach; in such cases, the goal state is a terminal state. There may be other reasons to define terminal states as well, but either way, the TerminalFunction object defines which states are terminal.
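The shape of such an object can be sketched as follows, again with a simplified stand-in interface rather than the toolbox's actual TerminalFunction (which takes a full OO-MDP State), and with states encoded as "x,y" strings purely for illustration:

```java
// Simplified stand-in for a terminal function interface.
interface SimpleTerminalFunction {
    boolean isTerminal(String s);
}

// Terminal when the agent reaches a designated goal cell, with the state
// reduced to an "x,y" position string for this sketch.
class AtGoalTF implements SimpleTerminalFunction {
    private final String goal;

    AtGoalTF(String goal) {
        this.goal = goal;
    }

    public boolean isTerminal(String s) {
        return s.equals(this.goal);
    }
}
```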
StateConditionTest goalCondition
Not all planning algorithms are designed to maximize reward functions. Many are instead defined as search algorithms that seek specific goal states. A StateConditionTest object operates much like a TerminalFunction, except that it can be used to specify any kind of state check. In this tutorial we will use it to specify goal states.
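A goal condition of this kind might look like the following sketch. As before, this is a simplified stand-in (the toolbox's actual StateConditionTest operates on full OO-MDP State objects), with the state reduced to an "x,y" string:

```java
// Simplified stand-in for a state condition test interface. Unlike a
// terminal function, a condition test can encode any state check, not
// just termination.
interface SimpleStateConditionTest {
    boolean satisfies(String s);
}

// Satisfied when the agent's position matches a given goal cell.
class AtCellGC implements SimpleStateConditionTest {
    private final int gx, gy;

    AtCellGC(int gx, int gy) {
        this.gx = gx;
        this.gy = gy;
    }

    public boolean satisfies(String s) {
        String[] xy = s.split(",");
        return Integer.parseInt(xy[0]) == gx && Integer.parseInt(xy[1]) == gy;
    }
}
```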
State initialState
To perform any planning or learning, we naturally need to specify an initial state from which to perform it! An OO-MDP state consists of an arbitrary set of instantiated object classes from a given domain. An instantiated object of an object class means that a value is defined for each attribute of the object class. A state may also consist of an arbitrary number of object instances for any given class, but in some domains you may typically have only one instance of each. In the GridWorld domain, for instance, there will be one instance of the agent object (which specifies the agent's position) and one instance of a location object, which will be used to specify a goal location to which the agent should go.
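The object-instance idea can be made concrete with a minimal sketch (these are illustrative classes, not the toolbox's ObjectInstance and State types): an object instance pairs an object class name with a value for each attribute, and a state is just a collection of such instances.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of an OO-MDP object instance (not the toolbox's
// ObjectInstance class): a class name plus a value for each attribute.
class SketchObjectInstance {
    final String objectClass;
    final Map<String, Integer> attributes = new HashMap<>();

    SketchObjectInstance(String objectClass) {
        this.objectClass = objectClass;
    }

    SketchObjectInstance set(String attribute, int value) {
        this.attributes.put(attribute, value);
        return this;
    }
}

// Illustrative sketch of an OO-MDP state: an arbitrary set of instances.
class SketchState {
    final List<SketchObjectInstance> objects = new ArrayList<>();

    SketchState add(SketchObjectInstance o) {
        this.objects.add(o);
        return this;
    }
}
```

Under this sketch, a grid world initial state would hold one agent instance with its x and y position set, and one location instance marking the goal cell.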
DiscreteStateHashFactory hashingFactory
To perform planning and learning, there needs to be some way to look up previously examined states in data structures, and to do so efficiently requires some way to compute hash codes for states. The DiscreteStateHashFactory provides a general means to do this for any discrete OO-MDP domain. A nice property of the DiscreteStateHashFactory object is that hashing results are invariant to specific object references. For instance, if a state contained two otherwise identical block objects that were in different positions in the world and you merely swapped their positions, the DiscreteStateHashFactory would compute the same hash code for each state, since they represent the same OO-MDP state.
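One way this invariance can be achieved, shown here as an illustrative sketch rather than the DiscreteStateHashFactory's actual implementation, is to hash a state as the sorted multiset of its objects' attribute signatures, so that it no longer matters which object instance carries which values:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of object-identity-invariant state hashing (not the toolbox's
// DiscreteStateHashFactory implementation).
class InvariantStateHash {

    static int hash(List<Map<String, Integer>> objects) {
        List<String> signatures = new ArrayList<>();
        for (Map<String, Integer> o : objects) {
            //canonical per-object form: attributes in sorted key order
            signatures.add(new TreeMap<>(o).toString());
        }
        //sorting the signatures makes the order of the objects irrelevant
        Collections.sort(signatures);
        return signatures.hashCode();
    }

    //two blocks with swapped positions hash to the same value, since the
    //two states represent the same OO-MDP state
    static boolean demo() {
        Map<String, Integer> block1 = new TreeMap<>();
        block1.put("x", 1); block1.put("y", 2);
        Map<String, Integer> block2 = new TreeMap<>();
        block2.put("x", 5); block2.put("y", 6);
        return hash(Arrays.asList(block1, block2)) == hash(Arrays.asList(block2, block1));
    }
}
```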