In this tutorial we will create the classic Four Rooms options (Sutton, Precup, and Singh, 1999), extending the BasicBehavior class from the previous Basic Planning and Learning tutorial. If you have not already followed that tutorial, either do so now or at least copy and paste its final code so that you can extend it in this example. Start by creating the shell class below.
import java.util.Iterator;

import oomdptb.behavior.*;
import oomdptb.behavior.learning.LearningAgent;
import oomdptb.behavior.learning.tdmethods.*;
import oomdptb.behavior.options.*;
import oomdptb.behavior.planning.*;
import oomdptb.behavior.planning.commonpolicies.*;
import oomdptb.behavior.planning.stochastic.valueiteration.ValueIteration;
import oomdptb.behavior.statehashing.DiscreteMaskHashingFactory;
import oomdptb.oomdp.ObjectInstance;
import oomdptb.oomdp.State;
import domain.gridworld.GridWorldDomain;

public class OptionsExample extends BasicBehavior{

    public static void main(String[] args) {
        OptionsExample example = new OptionsExample();
    }

    public OptionsExample(){
        super();
        //override initial state goal location to be on a hallway where options can be most exploited
        GridWorldDomain.setLocation(initialState, 0, 5, 8);
    }

}
Note that we overrode the constructor to make one small change from the previous tutorial: we moved the goal location into one of the hallways of the four rooms. The options we are going to make are subgoal options that take the agent from within a room to one of the two hallways connected to it, so placing the goal location in a hallway makes for a task that should benefit greatly from options.
Since the Four Rooms domain has four rooms, each connected to two other rooms by a hallway, we will create eight options for this domain: two per room. Each option will have an initiation state set defined by a room and a deterministic termination condition that ends the option when the agent exits the room. The policy for each option will take the agent to an assigned hallway connected to its room.
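To make the "two options per room" count concrete, here is a minimal sketch that lists the eight (room, hallway) pairs as plain data and checks that every hallway sits just outside the room it serves. The room bounds and hallway cells are our assumption about the standard 11x11 Four Rooms map (hallways at (1,5), (5,1), (5,8), and (8,4), which agrees with the goal at (5,8) set in the constructor above); check them against your GridWorldDomain map. RoomOptionSpec and FourRoomsOptionSpecs are hypothetical helper names, not toolbox classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical description of one room option: the room it starts in and the hallway it targets.
class RoomOptionSpec {
    final String name;
    final int left, right, bottom, top; // inclusive room bounds
    final int hallX, hallY;             // hallway cell used as the subgoal

    RoomOptionSpec(String name, int left, int right, int bottom, int top, int hallX, int hallY) {
        this.name = name;
        this.left = left;
        this.right = right;
        this.bottom = bottom;
        this.top = top;
        this.hallX = hallX;
        this.hallY = hallY;
    }
}

class FourRoomsOptionSpecs {
    // ASSUMED layout: 11x11 grid with hallways at (1,5), (5,1), (5,8), and (8,4).
    static List<RoomOptionSpec> all() {
        List<RoomOptionSpec> specs = new ArrayList<>();
        specs.add(new RoomOptionSpec("bottomLeft->east", 0, 4, 0, 4, 5, 1));
        specs.add(new RoomOptionSpec("bottomLeft->north", 0, 4, 0, 4, 1, 5));
        specs.add(new RoomOptionSpec("topLeft->south", 0, 4, 6, 10, 1, 5));
        specs.add(new RoomOptionSpec("topLeft->east", 0, 4, 6, 10, 5, 8));
        specs.add(new RoomOptionSpec("topRight->west", 6, 10, 5, 10, 5, 8));
        specs.add(new RoomOptionSpec("topRight->south", 6, 10, 5, 10, 8, 4));
        specs.add(new RoomOptionSpec("bottomRight->north", 6, 10, 0, 3, 8, 4));
        specs.add(new RoomOptionSpec("bottomRight->west", 6, 10, 0, 3, 5, 1));
        return specs;
    }

    // True if the hallway lies outside the room but directly next to one of its cells.
    static boolean hallwayBordersRoom(RoomOptionSpec s) {
        boolean inside = s.hallX >= s.left && s.hallX <= s.right
                && s.hallY >= s.bottom && s.hallY <= s.top;
        if (inside) {
            return false;
        }
        for (int x = s.left; x <= s.right; x++) {
            for (int y = s.bottom; y <= s.top; y++) {
                if (Math.abs(x - s.hallX) + Math.abs(y - s.hallY) == 1) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Each of the four hallways appears in exactly two specs, once for each room it joins, which is where the total of eight options comes from.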
To define these options, we could hardcode the policies, one for each option. However, doing so is time consuming, and while it may not be so bad for a simple domain like Four Rooms, it becomes more problematic in larger, more complex domains. Fortunately, since subgoal options have defined goal conditions and defined sets of states in which they can be applied, we can use existing planning algorithms to compute the policy for each of the states in which the option can be applied. Furthermore, the OO-MDP Toolbox already provides a SubgoalOption class, which makes creating such options fairly trivial.
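To illustrate why a planner can replace hand-coded option policies, the sketch below computes a subgoal policy for every cell of a rectangular room. It is not the toolbox's approach: it swaps in a plain breadth-first search from the hallway cell (valid here because grid moves are deterministic and reversible, and we assume no walls inside the room), then picks, in each cell, a move that reduces the distance to the hallway. SubgoalPolicySketch and its methods are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

class SubgoalPolicySketch {
    // Action names and their (dx, dy) offsets.
    static final String[] NAMES = {"north", "south", "east", "west"};
    static final int[][] DELTAS = {{0, 1}, {0, -1}, {1, 0}, {-1, 0}};

    // Maps "x,y" for each room cell to an action that moves one step closer to hallway (hx, hy).
    static Map<String, String> plan(int left, int right, int bottom, int top, int hx, int hy) {
        Map<String, Integer> dist = new HashMap<>();
        ArrayDeque<int[]> frontier = new ArrayDeque<>();
        dist.put(key(hx, hy), 0);
        frontier.add(new int[]{hx, hy});
        // Breadth-first search outward from the hallway, restricted to the room rectangle.
        while (!frontier.isEmpty()) {
            int[] c = frontier.poll();
            for (int[] d : DELTAS) {
                int nx = c[0] + d[0];
                int ny = c[1] + d[1];
                if (nx < left || nx > right || ny < bottom || ny > top) continue;
                String k = key(nx, ny);
                if (dist.containsKey(k)) continue;
                dist.put(k, dist.get(key(c[0], c[1])) + 1);
                frontier.add(new int[]{nx, ny});
            }
        }
        // Greedy policy: in each reachable cell, take the first action whose successor is one step closer.
        Map<String, String> policy = new HashMap<>();
        for (int x = left; x <= right; x++) {
            for (int y = bottom; y <= top; y++) {
                Integer here = dist.get(key(x, y));
                if (here == null) continue; // unreachable cell: policy left undefined
                for (int a = 0; a < DELTAS.length; a++) {
                    Integer next = dist.get(key(x + DELTAS[a][0], y + DELTAS[a][1]));
                    if (next != null && next == here - 1) {
                        policy.put(key(x, y), NAMES[a]);
                        break;
                    }
                }
            }
        }
        return policy;
    }

    static String key(int x, int y) {
        return x + "," + y;
    }
}
```

For example, `SubgoalPolicySketch.plan(0, 4, 0, 4, 5, 1)` would give "east" at cell (4,1) and "south" at (4,4) for a 5x5 room whose hallway is at (5,1). In stochastic domains a planner such as value iteration would be the natural replacement for the search.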
To begin creating the options, we first need a way to identify the initiation states. To fully exploit the tools for automatically creating subgoal options and their policies, we also want to be able to iterate over all of the initiation states so that a planner can compute the policy for each of them. To accomplish this, we will implement the StateConditionTestIterable interface. Specifically, our class will test whether a given state has the agent within a specific room. Since rooms are rectangular regions, we can do this by defining a rectangle and testing whether the agent's position lies within it. We also need to iterate through all of the possible states in the room, which we can do by stepping across the rows and columns of the rectangle and placing the agent at each cell. The implemented class, which we will walk through, is defined below.
//nested inside OptionsExample so that it can use the domain field inherited from BasicBehavior
class InRoomStateCheck implements StateConditionTestIterable{

    //inclusive room boundaries
    int leftBound;
    int rightBound;
    int bottomBound;
    int topBound;

    public InRoomStateCheck(int leftBound, int rightBound, int bottomBound, int topBound){
        this.leftBound = leftBound;
        this.rightBound = rightBound;
        this.bottomBound = bottomBound;
        this.topBound = topBound;
    }

    @Override
    public boolean satisfies(State s) {
        ObjectInstance agent = s.getObjectsOfTrueClass(GridWorldDomain.CLASSAGENT).get(0);
        int ax = agent.getDiscValForAttribute(GridWorldDomain.ATTX);
        int ay = agent.getDiscValForAttribute(GridWorldDomain.ATTY);
        if(ax >= this.leftBound && ax <= this.rightBound && ay >= this.bottomBound && ay <= this.topBound){
            return true;
        }
        return false;
    }

    @Override
    public Iterator<State> iterator() {
        return new Iterator<State>() {

            //start in the bottom-left corner of the room
            int ax=leftBound;
            int ay=bottomBound;

            @Override
            public boolean hasNext() {
                if(ay <= topBound){
                    return true;
                }
                return false;
            }

            @Override
            public State next() {
                //a state with one agent and no location objects, with the agent at the current cell
                State s = GridWorldDomain.getOneAgentNLocationState(domain, 0);
                GridWorldDomain.setAgent(s, ax, ay);
                //advance along x; wrap to the next row after passing the right boundary
                ax++;
                if(ax > rightBound){
                    ax = leftBound;
                    ay++;
                }
                return s;
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    @Override
    public void setStateContext(State s) {
        //do not need to do anything here
    }

}
Note that the constructor of this class takes as parameters the left, right, bottom, and top boundaries of the room (inclusive), in that order, and stores them for later use. The satisfies(State s) method is called to check whether the state passed to it satisfies the state condition test. In our case, that means returning true if the state has the agent positioned within the room boundaries. We test this by first extracting the agent object from the provided state using the getObjectsOfTrueClass method, which returns a list of the object instances of the designated class. Since we only ever expect there to be one agent object, we simply take the first agent object in the state. After the agent object is extracted, the values of its x and y attributes are retrieved and compared against the room boundaries. If the agent is within the boundaries, we return true; otherwise, we return false.
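The bounds test in satisfies does not depend on the toolbox at all, so it can be tried on raw coordinates. The sketch below is a hypothetical stand-alone version (RoomBounds is our name, not a toolbox class) that keeps the same constructor order (left, right, bottom, top) and the same inclusive comparisons.

```java
// Hypothetical stand-alone version of InRoomStateCheck's bounds test, using raw (x, y) coordinates.
class RoomBounds {
    final int leftBound, rightBound, bottomBound, topBound;

    // Same argument order as InRoomStateCheck: left, right, bottom, top (all inclusive).
    RoomBounds(int leftBound, int rightBound, int bottomBound, int topBound) {
        this.leftBound = leftBound;
        this.rightBound = rightBound;
        this.bottomBound = bottomBound;
        this.topBound = topBound;
    }

    // Every comparison is inclusive, so cells on the room's edges count as inside.
    boolean contains(int ax, int ay) {
        return ax >= leftBound && ax <= rightBound && ay >= bottomBound && ay <= topBound;
    }
}
```

With `new RoomBounds(0, 4, 0, 4)`, the corner cells (0,0) and (4,4) are inside, while the neighboring hallway cells (5,1) and (1,5) are outside, which is exactly what lets the hallway act as an exit from the room.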
The iterator method returns an iterator over all of the OO-MDP states that satisfy our state condition test (that is, states in which the agent is within the room boundaries). To achieve this, we create an anonymous iterator object that starts in the bottom-left corner of the room. Whenever its next method is called, it creates a GridWorld state containing only an agent object at the iterator's current coordinate and returns that state. The iterator then advances along the x-axis; when the x-coordinate moves past the right boundary of the room, it returns to the left-most position of the room on the next y-coordinate. Iteration ends once the y-coordinate moves above the top of the room.
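The traversal order is easier to see without the OO-MDP state construction. The sketch below is a hypothetical coordinate-only version of the anonymous iterator (RoomCellIterator is our name): it yields {x, y} pairs instead of states but advances in the same x-first, row-by-row order. One small design difference: it throws NoSuchElementException when called after the last cell, as the java.util.Iterator contract asks, whereas the tutorial's iterator would simply keep producing cells above the room.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical coordinate-only version of InRoomStateCheck's iterator: yields {x, y} pairs.
class RoomCellIterator implements Iterator<int[]> {
    private final int leftBound, rightBound, topBound;
    private int ax, ay;

    RoomCellIterator(int leftBound, int rightBound, int bottomBound, int topBound) {
        this.leftBound = leftBound;
        this.rightBound = rightBound;
        this.topBound = topBound;
        this.ax = leftBound;   // start in the bottom-left corner
        this.ay = bottomBound;
    }

    @Override
    public boolean hasNext() {
        return ay <= topBound; // done once we move above the top row
    }

    @Override
    public int[] next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        int[] cell = {ax, ay};
        // Move right along the row; after the right boundary, wrap to the next row up.
        ax++;
        if (ax > rightBound) {
            ax = leftBound;
            ay++;
        }
        return cell;
    }
}
```

For a 5x5 room with bounds (0, 4, 0, 4) this produces (0,0), (1,0), ..., (4,0), then (0,1), and so on up to (4,4): 25 cells in total.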
Note that technically many more OO-MDP states could satisfy our test, because there could be states with different numbers of location objects at different positions. The option we create, however, will abstract away information about other objects, so when iterating through the states we only need to consider states with a single agent object.
The StateConditionTestIterable interface also requires us to implement the setStateContext(State s) method. We don't actually need this method in this example; it provides auxiliary support for other contexts that we will not discuss at length in this tutorial. In brief, this method is used to manage how state abstractions affect the state iterator. For instance, in this example we abstracted away any information about location objects, and as a result the iterator creates states without any location objects. In other scenarios, we might want the iterator to iterate over states that do contain specific location objects. If so, we could indicate this to the class by first calling setStateContext(State s) with a state containing the desired location objects. The iterator could then make copies of this state and modify only the agent position, leaving other objects, such as location objects, alone. In this options tutorial, however, we have no need to use the StateConditionTestIterable interface in this way, so we leave the method body empty.
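The context-copying idea described above can be sketched without the toolbox. In the hypothetical example below, SimpleState stands in for an OO-MDP state (an agent position plus a list of location positions) and ContextualRoomStates plays the role of a context-aware iterator: each generated state is a copy of the context with only the agent moved. None of these names come from the OO-MDP Toolbox, and a real implementation would copy State objects instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an OO-MDP state: one agent position plus location-object positions.
class SimpleState {
    int agentX, agentY;
    final List<int[]> locations = new ArrayList<>();

    // Deep copy, so changing the copy never touches the original.
    SimpleState copy() {
        SimpleState c = new SimpleState();
        c.agentX = agentX;
        c.agentY = agentY;
        for (int[] l : locations) {
            c.locations.add(l.clone());
        }
        return c;
    }
}

// Hypothetical context-aware generator of room states.
class ContextualRoomStates {
    private SimpleState context = new SimpleState(); // default context: no location objects

    void setStateContext(SimpleState s) {
        this.context = s;
    }

    // One state per room cell: a copy of the context with only the agent position changed.
    List<SimpleState> states(int left, int right, int bottom, int top) {
        List<SimpleState> out = new ArrayList<>();
        for (int y = bottom; y <= top; y++) {
            for (int x = left; x <= right; x++) {
                SimpleState s = context.copy();
                s.agentX = x;
                s.agentY = y;
                out.add(s);
            }
        }
        return out;
    }
}
```

Without a context, the generated states contain no location objects, matching InRoomStateCheck; after setStateContext is given a state with a location at (5,8), every generated state keeps that location.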
We also need to define the termination conditions for the option, which for a subgoal option can be defined as the subgoal the option achieves (a subgoal option is also typically defined to terminate anywhere its policy is not defined, but our SubgoalOption class handles this implicitly). To create the subgoal definition, we will implement the StateConditionTest interface. Note that this interface is the parent of the StateConditionTestIterable interface we used to define the initiation states; the difference is that StateConditionTest does not require the iterator or setStateContext methods.
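The combined termination rule described above (stop at the subgoal, or anywhere the policy is undefined) can be written as a single function. The sketch below is our own illustration over raw (x, y) positions, not the SubgoalOption implementation; SubgoalTermination and beta are hypothetical names, with beta following the usual options notation for the termination probability.

```java
import java.util.function.BiPredicate;

// Hypothetical sketch of subgoal-option termination over agent (x, y) positions.
class SubgoalTermination {
    private final BiPredicate<Integer, Integer> initiation; // where the option's policy is defined
    private final BiPredicate<Integer, Integer> subgoal;    // the states the option tries to reach

    SubgoalTermination(BiPredicate<Integer, Integer> initiation, BiPredicate<Integer, Integer> subgoal) {
        this.initiation = initiation;
        this.subgoal = subgoal;
    }

    // Termination probability: 1 at the subgoal or wherever the policy is undefined, otherwise 0.
    double beta(int x, int y) {
        return (subgoal.test(x, y) || !initiation.test(x, y)) ? 1.0 : 0.0;
    }
}
```

For an option that starts in the room with bounds (0, 4, 0, 4) and targets the hallway at (5,1), beta is 0 at (2,2) inside the room and 1 both at the hallway and at a cell such as (7,7) in another room.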
As stated previously, the subgoal of a room option is to take the agent to one of the two hallways connected to the room for which the option is defined. Since hallways only take up a single cell of the grid world, we can define a StateConditionTest class that takes an x and y position and returns true when the agent is at that position. The below class does so.
class AtPositionStateCheck implements StateConditionTest{

    int x;
    int y;

    public AtPositionStateCheck(int x, int y){
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean satisfies(State s) {
        ObjectInstance agent = s.getObjectsOfTrueClass(GridWorldDomain.CLASSAGENT).get(0);
        int ax = agent.getDiscValForAttribute(GridWorldDomain.ATTX);
        int ay = agent.getDiscValForAttribute(GridWorldDomain.ATTY);
        if(ax == this.x && ay == this.y){
            return true;
        }
        return false;
    }

}
This class works very similarly to our initiation state class, so we will not walk through it in detail.
At this point we have classes for defining our subgoal options' initiation and termination conditions. The next step is to create the policy for each option, along with the option itself.