Tutorial: Basic Planning and Learning

Initializing the data members

Now that we have the structure of our class, we'll need to initialize our data members to instances that will create our domain and define our task.

First, add the following line to your imports:

import oomdptb.oomdp.common.*;

Next, create a default constructor. The first thing we'll do in the constructor is create our domain.

gwdg = new GridWorldDomain(11, 11);
gwdg.setMapToFourRooms(); 
domain = gwdg.generateDomain();

The first line creates an 11x11 deterministic GridWorld. The second line sets the map to a pre-canned layout: the four rooms layout. This layout was used in the options work of Sutton, Precup, and Singh (1999), and it presents a simple environment for us to run some tests. Alternatively, you could define your own map layout, either by passing the constructor a 2D integer array (with 1s specifying wall cells and 0s specifying open cells), or by specifying the size of the domain as we did and then placing walls with the GridWorldDomain object's horizontalWall and verticalWall methods. For simplicity, we'll stick with the four rooms layout.
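
For illustration, below is a minimal sketch of the custom-map alternative. The wall-placement calls use the horizontalWall and verticalWall methods named above, but the constructor overload and the exact wall-method signatures are assumptions here and should be checked against the class definition.

//a sketch of a custom layout; the map constructor overload and the
//horizontalWall/verticalWall signatures below are assumptions
int [][] map = new int[11][11]; //all 0s: an 11x11 grid with no walls
map[5][5] = 1; //mark a single wall cell by hand
GridWorldDomain customGwdg = new GridWorldDomain(map);

//or: start from an open grid and place wall segments
GridWorldDomain customGwdg2 = new GridWorldDomain(11, 11);
customGwdg2.horizontalWall(0, 4, 5); //assumed: wall spanning x=0..4 at y=5
customGwdg2.verticalWall(6, 10, 5); //assumed: wall spanning y=6..10 at x=5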

The third line produces the Domain object for the GridWorld. Recall from the previous part of the tutorial that Domain objects hold references to all the attributes, object classes, propositional functions, and actions (along with the actions' transition dynamics). In our GridWorld's case, there are two attributes, an X attribute and a Y attribute. There are also two classes: an AGENT class and a LOCATION class, each of which is defined by the X and Y attributes. While there could potentially be any number of AGENT object instantiations in a state, in this domain we expect only one to ever be defined. The LOCATION objects will be used for points of interest; specifically, we will use a single LOCATION object to represent a goal location.

The GridWorld domain also defines five propositional functions: atLocation(AGENT, LOCATION), wallToNorth(AGENT), wallToSouth(AGENT), wallToEast(AGENT), and wallToWest(AGENT). The first of these returns true when the specified AGENT object is at the same position as the specified LOCATION object. The latter four return true when there is a wall in the cell immediately adjacent to the specified AGENT object in the named direction.
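
To make this concrete, here is a hedged sketch of querying the atLocation function against a state. The isTrue(State, String[]) signature and the default object names "agent0" and "location0" are assumptions, as is someState, which stands in for any State object (we create one below).

//a sketch of evaluating a propositional function; the isTrue signature and
//the object names "agent0"/"location0" are assumptions
PropositionalFunction atLoc = domain.getPropFunction(GridWorldDomain.PFATLOCATION);
boolean atGoal = atLoc.isTrue(someState, new String[]{"agent0", "location0"});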

Finally, the GridWorld domain defines four actions to move north, south, east, or west of the agent's current position. Although we could have told the GridWorldDomain generator to make these movements stochastic (that is, specify a probability with which the agent moves in an unintended direction), in our specific example we have left them as the default deterministic actions. If an agent moves into a wall, its position does not change.
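
If you did want stochastic movement, the generator would be configured before generateDomain() is called, along the lines of the sketch below. The method name setProbSucceedTransitionDynamics is an assumption about the generator's API and should be verified.

//an assumed configuration call for stochastic transitions: the agent moves
//in the intended direction with probability 0.8 and otherwise slips
gwdg.setProbSucceedTransitionDynamics(0.8);
domain = gwdg.generateDomain(); //generate only after configuring the dynamics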

Next we will create a state parser:

sp = new GridWorldStateParser(domain);

This state parser is a custom state parser that is part of the GridWorld package. Technically, we could also have used the UniversalStateParser class (part of the oomdptb.oomdp.common package), which can provide state parsing for any possible OO-MDP domain. However, the UniversalStateParser is verbose in its String output, which can be undesirable if you are recording thousands of learning results; in such cases, a custom parser like the one we use here is more compact.
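
To make the parser's role concrete, here is a minimal sketch of the round trip it supports. The stateToString and stringToState method names are assumed from the common StateParser interface, and someState again stands in for any State object.

//a sketch of the parser round trip; stateToString/stringToState are
//assumed from the common StateParser interface
String stateString = sp.stateToString(someState); //compact string encoding
State recovered = sp.stringToState(stateString); //parse back into a State object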

The next thing we will want to do is define the actual task to be solved in this domain. We will do this by specifying a reward function, a termination function, and a goal condition, the last of which is used exclusively by search-based deterministic planners that ignore costs and rewards.

rf = new UniformCostRF(); 
tf = new SinglePFTF(domain.getPropFunction(GridWorldDomain.PFATLOCATION)); 
goalCondition = new TFGoalCondition(tf);

The first line creates a reward function that returns -1 for every state-action-state transition. This might seem like a problem: shouldn't we define a reward function that returns a greater reward when the agent reaches the goal (and there are existing reward function classes in the oomdptb.oomdp.common package to do so)? However, the next line, which defines the termination function, makes a special goal reward unnecessary: because every transition costs -1 and episodes end at the goal, the policy that maximizes return is exactly the one that reaches the goal in the fewest steps.

The TerminalFunction is specified as an instance of the SinglePFTF class, which is provided a single propositional function of the domain; any state for which any object binding of that propositional function is true will be marked as a terminal state. In the constructor parameters, the propositional function is retrieved by querying the domain for the propositional function with the name GridWorldDomain.PFATLOCATION, a constant field of the GridWorldDomain class referencing the name used for the atLocation propositional function. Note that if there were multiple LOCATION objects in the task, this TerminalFunction would mark any state in which the agent was at any of the locations as a terminal state.
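
As a minimal sketch of how these two components behave, assuming the usual reward(State, GroundedAction, State) and isTerminal(State) interface methods, where s, groundedAction, and sPrime stand in for an arbitrary transition:

//a sketch of querying the task definition; the reward and isTerminal
//signatures are assumed from the common oomdptb interfaces
double r = rf.reward(s, groundedAction, sPrime); //always -1 under UniformCostRF
boolean done = tf.isTerminal(s); //true only when the agent is at a LOCATION object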

The final line sets up the goal condition to be synonymous with the termination function.

The next step will be to define the initial state of this task. We could either do this by creating an empty State object and then manually adding object instantiations for each object class, or we could use some methods of the GridWorldDomain class to facilitate the process. We will do the latter for brevity.

initialState = GridWorldDomain.getOneAgentOneLocationState(domain);
GridWorldDomain.setAgent(initialState, 0, 0);
GridWorldDomain.setLocation(initialState, 0, 10, 10);

The first line returns a state object with a single instance of the AGENT class and a single instance of the LOCATION class. The second line sets the agent to be at position 0,0. The third line sets the location to be at position 10,10; the 0 that follows initialState in the setLocation parameters indicates which LOCATION object index to set, and since there is only one LOCATION object, we are setting the position of the 0th indexed LOCATION object.

The last part of the constructor is to define a method for computing state hash codes that can be used to efficiently look up state objects in the data structures of various planning and learning algorithms. Since this domain is discrete, we will use the common DiscreteStateHashFactory class.

hashingFactory = new DiscreteStateHashFactory();
hashingFactory.setAttributesForClass(GridWorldDomain.CLASSAGENT, 
			domain.getObjectClass(GridWorldDomain.CLASSAGENT).attributeList);

Note that the second line is optional. If we did not include it, state hash codes would be computed with respect to all attributes of all objects. In this task, however, the location object's position is constant, so there is no reason to use the location object's attributes when computing hash codes; the hashing only needs to consider the X and Y attributes of the AGENT object, which are what vary between states. The second line of code tells the hashing factory that we will manually define the set of attributes used for computing hash codes, and that it should use all of the attributes of the AGENT class. We could also have told it to use only the X attribute, or specified attributes for other classes with additional calls to the setAttributesForClass method. Note that GridWorldDomain.CLASSAGENT is a constant field of the GridWorldDomain class that holds the name of the AGENT class, the domain.getObjectClass method returns the object class with the specified name, and the attributeList field holds the list of attributes associated with that object class.
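
For example, the sketch below restricts hashing to only the agent's X attribute, as described above. It assumes that attributeList is a java.util.List of Attribute objects and that the X attribute sits at index 0 of that list, both of which are worth verifying.

//a sketch of hashing on a single attribute; assumes X sits at index 0 of
//the AGENT class's attribute list
List<Attribute> xOnly = new ArrayList<Attribute>();
xOnly.add(domain.getObjectClass(GridWorldDomain.CLASSAGENT).attributeList.get(0));
hashingFactory.setAttributesForClass(GridWorldDomain.CLASSAGENT, xOnly);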

At this point you should have initialized all of the data members for the class, and the final constructor will look something like the code below.

public BasicBehavior(){
	
	//create the domain
	gwdg = new GridWorldDomain(11, 11);
	gwdg.setMapToFourRooms(); 
	domain = gwdg.generateDomain();
	
	//create the state parser
	sp = new GridWorldStateParser(domain); 
	
	//define the task
	rf = new UniformCostRF(); 
	tf = new SinglePFTF(domain.getPropFunction(GridWorldDomain.PFATLOCATION)); 
	goalCondition = new TFGoalCondition(tf);
	
	//set up the initial state of the task
	initialState = GridWorldDomain.getOneAgentOneLocationState(domain);
	GridWorldDomain.setAgent(initialState, 0, 0);
	GridWorldDomain.setLocation(initialState, 0, 10, 10);
	
	//set up the state hashing system
	hashingFactory = new DiscreteStateHashFactory();
	hashingFactory.setAttributesForClass(GridWorldDomain.CLASSAGENT, 
				domain.getObjectClass(GridWorldDomain.CLASSAGENT).attributeList); 
	
		
}

Setting up a result visualizer

Before we get to actually running planning and learning algorithms, we're going to want a way to visualize the results they generate. To do that, we will define a Visualizer and pass it to an EpisodeSequenceVisualizer. Create the method below.

public void visualize(String outputPath){
	Visualizer v = GridWorldVisualizer.getVisualizer(domain, gwdg.getMap());
	EpisodeSequenceVisualizer evis = new EpisodeSequenceVisualizer(v, domain, sp, outputPath);
}

Note that the outputPath parameter specifies the directory where our planning/learning results were stored (we'll get to this when we actually apply a planning/learning algorithm).

In order to visualize results for any domain, the domain needs a Visualizer defined for it, because there is no way to tell how a domain should be rendered from its attributes alone! In this case, a class that generates a GridWorldDomain Visualizer, called GridWorldVisualizer, already exists as part of the domains package. This specific Visualizer class requires both the domain object and the 2D int array specifying the layout of the map (since different grid worlds can use different layouts), which is simply retrieved from our GridWorldDomain generator object using the getMap() method. The final line constructs and loads the EpisodeSequenceVisualizer. Note that in order to visualize episode results with this class, you will need to pass it a visualizer, the domain, the state parser (which will be used to extract the states out of stored files), and the path where the episodes you wish to visualize are located. Later in this tutorial, we will examine how to use the GUI of the EpisodeSequenceVisualizer.

Before moving on to the next part of the tutorial, let's also hook up our class constructor and visualizer method in the main method.

public static void main(String[] args) {


	BasicBehavior example = new BasicBehavior();
	String outputPath = "output/"; //directory to record results
	
	//we will call planning and learning algorithms here
	
	
	//run the visualizer
	example.visualize(outputPath);

}

Note that you can set the output path to whatever you want; if the directory doesn't already exist, the code that follows will create it for you automatically.