Given an image and a vocabulary, build a histogram that counts how often each vocabulary word appears in the image. For each feature in the image, find the word in the vocabulary that is closest to it (minimum distance). Once that minimum-distance word is found, increment the bin at that word's index in the histogram. Then divide the histogram by its length. This gives a representation of the distribution of words in the image.
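The histogram step can be sketched roughly as follows. This is only an illustration, not the project's actual code: the function name build_histogram, the plain-list representation of features, and the use of Euclidean distance are all assumptions made for the example.

```python
import math


def build_histogram(features, vocabulary):
    """Return a bag-of-words histogram for one image.

    features   -- feature vectors extracted from the image (assumed input)
    vocabulary -- visual-word vectors with the same dimension as the features
    """
    histogram = [0.0] * len(vocabulary)
    for feature in features:
        # Index of the vocabulary word with the smallest Euclidean distance
        # to this feature (the distance metric is an assumption).
        nearest = min(
            range(len(vocabulary)),
            key=lambda i: math.dist(feature, vocabulary[i]),
        )
        histogram[nearest] += 1
    # Scale by the histogram length, as described in the text above.
    return [count / len(histogram) for count in histogram]


# Toy example: 3 vocabulary words and 4 image features in 2-D.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
features = [(0.1, 0.2), (0.9, 1.1), (1.2, 0.8), (4.8, 5.1)]
histogram = build_histogram(features, vocabulary)
```

In the toy example, one feature falls nearest the first word, two nearest the second, and one nearest the third. Dividing by the histogram length scales every bin by the same constant, so the relative shape is preserved; dividing by the number of features instead would make the bins sum to 1.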
The histogram function is used in two places: first to build histograms for the training data, and then to build them for the test images. The training histograms are used to train an SVM, and the testing histograms are then fed into the same SVM to be categorized.
The results of my scene detector were pretty good. Here is the graph:
The accuracy tended to fluctuate between 0.65 and 0.68. The accuracy for this particular graph is 0.6653.