The next algorithm we discuss was actually the first exact POMDP algorithm proposed and is due to Sondik (1971). This algorithm starts with an arbitrary belief point, constructs the vector for that point, and then defines a set of constraints over the belief space within which this vector is guaranteed to be dominant.
The region defined is actually the intersection of three easier-to-describe regions. When we construct a vector from a belief point b, we know what strategy that vector represents. This strategy is the best one for that belief point and some nearby belief points, but it might not be the best strategy for all belief points. There are two ways this strategy can stop being the best: the immediate action stays the same but the future strategy changes, or another immediate action becomes better. The first region assures that even when the best action doesn't change, the future course of action doesn't change either.
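The first region amounts to a set of linear constraints over the belief simplex: the chosen vector must score at least as high as every other vector built from the same immediate action. A minimal sketch of that membership test, with hypothetical names (`region1_holds`, `alpha_star`, `other_alphas_for_a` are illustrative, not from the source):

```python
import numpy as np

def region1_holds(bp, alpha_star, other_alphas_for_a):
    """Check whether belief point bp lies in the first region: the
    future strategy chosen at b (vector alpha_star) is still at least
    as good at bp as every other future strategy available for the
    same immediate action."""
    v = bp @ alpha_star  # value of the chosen strategy at bp
    return all(v >= bp @ alpha for alpha in other_alphas_for_a)

# Two-state example: alpha_star is best near belief (1, 0).
alpha_star = np.array([1.0, 0.0])
others = [np.array([0.0, 1.0])]
print(region1_holds(np.array([0.8, 0.2]), alpha_star, others))  # True
print(region1_holds(np.array([0.2, 0.8]), alpha_star, others))  # False
```

Because each constraint is linear in the belief, the region is a convex polytope inside the belief simplex.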
The figure below shows the value function for action a and the first region we defined above.
The dashed red line represents a particular future strategy for the non-optimal action: the best strategy to follow if we were forced to take that action first. This region makes sure we restrict ourselves to belief points where that strategy is no better than what we have at belief point b. However, the final region is for the case where not only does the best action change, but the best strategy for that other action also changes from the current one. The dashed red line tells only part of the story for the other action: we know it represents the best future strategy for that action at point b, but we know nothing about the other future strategies for that action.
To ensure that we restrict ourselves to belief points where neither the action nor the future strategy changes, we need to include the region where the dashed red line is the best strategy for the other action. This figure shows all the different strategies for the other action, though there are only two in this example.
The following figure shows the complete region that is imposed by Sondik's one-pass algorithm. It is the intersection of the three regions previously described.
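Putting the three regions together, membership in the one-pass region is just the conjunction of three families of linear tests. A sketch, assuming two states and hypothetical argument names (`same_action`, `best_other`, `rest_other` are illustrative groupings, not the source's notation):

```python
import numpy as np

def in_one_pass_region(bp, alpha_b, same_action, best_other, rest_other):
    """Test whether belief point bp lies in the intersection of the
    three regions described above.

    bp          -- belief point to test (1-D array summing to 1)
    alpha_b     -- the vector constructed at the original belief b
    same_action -- other future strategies for the same immediate action
    best_other  -- for each other action, its best-at-b strategy vector
    rest_other  -- for each other action, its remaining strategy vectors
    """
    v = bp @ alpha_b
    # Region 1: the future strategy for the chosen action does not change.
    if any(bp @ a > v for a in same_action):
        return False
    for best, rest in zip(best_other, rest_other):
        # Region 2: the other action's best-at-b strategy is no better here.
        if bp @ best > v:
            return False
        # Region 3: that strategy is still the best one for the other action.
        if any(bp @ a > bp @ best for a in rest):
            return False
    return True

# Toy two-state example with one other action.
alpha_b = np.array([1.0, 0.2])
same_action = [np.array([0.2, 1.0])]
best_other = [np.array([0.5, 0.5])]
rest_other = [[np.array([0.1, 0.9])]]
print(in_one_pass_region(np.array([0.9, 0.1]), alpha_b,
                         same_action, best_other, rest_other))  # True
print(in_one_pass_region(np.array([0.1, 0.9]), alpha_b,
                         same_action, best_other, rest_other))  # False
```

Since each region is an intersection of linear half-spaces, so is their intersection; in practice these constraints can be handed to a linear program to search for a belief point outside the region.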
Unfortunately, the regions defined by Sondik's algorithm are extremely conservative: they are not guaranteed to cover the complete region where the vector is dominant, as we saw in the last figure. Thus we might find that we generate the same vector from many belief points.