With `MDP`s we have a set of states, a set of actions to choose
from, an immediate reward function, and a probabilistic transition
matrix. Our goal is to derive a mapping from states to actions, which
represents the best actions to take for each state, for a given
horizon length. This is easy if we know the value of each state for
this horizon length. The value iteration algorithm computes this
value function by finding a sequence of value functions, each one
derived from the previous one.

The value iteration algorithm starts by trying to find the value
function for a horizon length of `1`. This will be the value
of each state given that we only need to make a single decision.
There isn't much to do to find this in an `MDP`. Recall that
we have the immediate rewards, which specify how good each action is
in each state. Since our horizon length is `1`, we do not need
to consider any future effects (there is no future). Thus, we can
simply look at the immediate rewards and choose the action with the
highest immediate value for each state.
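For the curious, the horizon-`1` step can be sketched in a few lines of Python. The two-state MDP here, with its state names, action names, and reward numbers, is entirely made up for illustration:

```python
# Hypothetical tiny MDP: R[s][a] is the immediate reward for
# taking action a in state s.
R = {
    "s0": {"left": 1.0, "right": 0.0},
    "s1": {"left": 0.0, "right": 2.0},
}

# Horizon-1 value of each state: the best immediate reward available.
V1 = {s: max(R[s].values()) for s in R}

# Horizon-1 policy: the action achieving that best immediate reward.
policy1 = {s: max(R[s], key=R[s].get) for s in R}
```

No transition probabilities are needed yet, since with one decision left there is no future to account for.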

The next step, which is the second iteration of the algorithm, is to
determine the value function for a horizon length of `2`. The
value of acting when there are two steps to go is the immediate
reward for the action you take now, plus the value of the next
action you choose. Conveniently, we have already computed the
values of each state for a horizon length of `1`. So to find
the value for horizon `2`, we can just add the immediate
effects of each of the possible actions to the already computed value
function to find the action with the best value given that there will
be two decisions to be made.
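A single horizon-`2` backup can be sketched as follows. Again the two-state MDP, its transition probabilities, and the helper name `q_value` are all made-up illustrations; `T[s][a]` maps each possible next state to its probability:

```python
# Hypothetical tiny MDP: immediate rewards and transition probabilities.
R = {"s0": {"a": 1.0, "b": 0.0}, "s1": {"a": 0.0, "b": 2.0}}
T = {
    "s0": {"a": {"s0": 0.5, "s1": 0.5}, "b": {"s1": 1.0}},
    "s1": {"a": {"s0": 1.0}, "b": {"s1": 1.0}},
}

# Horizon-1 values, computed as in the previous step.
V1 = {s: max(R[s].values()) for s in R}

def q_value(s, a, V):
    # Immediate reward for acting now, plus the probability-weighted
    # value of whichever state we land in next.
    return R[s][a] + sum(p * V[s2] for s2, p in T[s][a].items())

# Horizon-2 value: the best action given that one more decision follows.
V2 = {s: max(q_value(s, a, V1) for a in R[s]) for s in R}
```

Notice that the horizon-`1` values are simply reused; nothing about horizon `1` needs to be recomputed.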

Now the algorithm iterates again; it finds the horizon `3`
value function using the horizon `2` value function. The process
repeats until we have found the value function for the desired
horizon.
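The whole loop can be sketched in one short function; the MDP it runs on is the same made-up two-state example as before, and `value_iteration` is just a name chosen here, not anything standard:

```python
def value_iteration(R, T, horizon):
    """Finite-horizon value iteration: derive each value function
    from the previous one, starting from horizon 0."""
    V = {s: 0.0 for s in R}  # horizon 0: no decisions left, so value 0
    for _ in range(horizon):
        V = {
            s: max(
                # immediate reward plus probability-weighted future value
                R[s][a] + sum(p * V[s2] for s2, p in T[s][a].items())
                for a in R[s]
            )
            for s in R
        }
    return V

# The same hypothetical two-state MDP used above.
R = {"s0": {"a": 1.0, "b": 0.0}, "s1": {"a": 0.0, "b": 2.0}}
T = {
    "s0": {"a": {"s0": 0.5, "s1": 0.5}, "b": {"s1": 1.0}},
    "s1": {"a": {"s0": 1.0}, "b": {"s1": 1.0}},
}

V3 = value_iteration(R, T, 3)
```

Each pass of the loop is exactly the backup described above, so running it `horizon` times yields the desired value function.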

We could give the formula for how to calculate one value function from
another, but we promised there would be no formulas. All we will say
is that it involves iterating over all the states and using the
transition probabilities to weight the values. Anyway, the formulas
can be found most anywhere people talk about `MDP`s, so just be
content that it is easy to derive one value function from another by
iterating over the states.