Visual Vocabulary



This project had two main steps. The first was building a vocabulary of visual words that could then be used to identify images. For each image img, I used vl_dsift(img, 'size', 4, 'step', 8) to gather dense SIFT features. I then randomly sampled 50 of these features per image so that kmeans would run in a reasonable amount of time, so each image contributes 50 features. Once the features were gathered for every image, I ran kmeans on the resulting 128 x (50 x 1500) matrix with k set to vocab_size. The cluster centers it returns are the vocabulary of visual words.
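The sampling and clustering steps above can be sketched as follows. This is a minimal illustration in plain Python, not the MATLAB/VLFeat code used in the project: the function names (sample_descriptors, kmeans, build_vocabulary), the fixed iteration count, and the random seed are all assumptions made for the example, and the k-means shown is a basic Lloyd's algorithm rather than whatever kmeans implementation the project called.

```python
import random


def sample_descriptors(descriptors, n_samples, rng):
    """Keep at most n_samples randomly chosen descriptors from one image."""
    if len(descriptors) <= n_samples:
        return list(descriptors)
    return rng.sample(descriptors, n_samples)


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iters=20, rng=None):
    """Basic Lloyd's algorithm (illustrative); returns k cluster centers."""
    if rng is None:
        rng = random.Random(0)
    # Initialise the centers with k randomly chosen points.
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iters):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def build_vocabulary(per_image_descriptors, vocab_size, n_samples=50, seed=0):
    """Pool n_samples descriptors from each image, then cluster the pool."""
    rng = random.Random(seed)
    pooled = []
    for descriptors in per_image_descriptors:
        pooled.extend(sample_descriptors(descriptors, n_samples, rng))
    return kmeans(pooled, vocab_size, rng=rng)
```

Here each descriptor is a plain list of numbers (128 values for SIFT), and per_image_descriptors is a list with one list of descriptors per image. Sampling before clustering keeps the pooled set small, which is what makes the clustering step affordable.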

Histogram



Given an image and a vocabulary, build a histogram that records how often each visual word occurs in the image. For each feature in the image, find the word in the vocabulary that is closest to it (minimum distance). Once that minimum-distance word is found, increment the bin for that word in the histogram. Then divide the histogram by its length. This gives a representation of the distribution of words in the image.
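A minimal sketch of this histogram step, again in plain Python for illustration rather than the project's MATLAB. The name build_histogram and the use of squared Euclidean distance are assumptions; the final division by the number of bins follows the description above.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_histogram(features, vocabulary):
    """Count, for each visual word, how many features are nearest to it."""
    counts = [0.0] * len(vocabulary)
    for feature in features:
        # Index of the vocabulary word with the minimum distance to this feature.
        nearest = min(
            range(len(vocabulary)),
            key=lambda j: squared_distance(feature, vocabulary[j]),
        )
        counts[nearest] += 1
    # Divide by the histogram length (number of bins), as described in the text.
    return [c / len(counts) for c in counts]


# Toy example: a two-word vocabulary and three 2-D features.
vocabulary = [[0.0, 0.0], [10.0, 10.0]]
features = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
histogram = build_histogram(features, vocabulary)
```

In the toy example two features fall nearest to the first word and one to the second. Dividing by the number of features instead of the number of bins would make the bins sum to 1; the sketch keeps the normalization the text describes.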

Everything else



The histogram function is used in two places: first on the training images, and then on the test images. The training histograms are used to train an SVM, and the test histograms are then fed into the same SVM to be categorized.
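The train-then-test flow can be illustrated with a small sketch. The text does not say which SVM formulation or library was used, so everything here is an assumption: a linear SVM trained by subgradient descent on the hinge loss, combined one-vs-rest across categories, with made-up hyperparameters, toy three-bin histograms, and hypothetical category names ("forest", "street").

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_binary_svm(histograms, targets, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM via subgradient descent on the regularized hinge loss.

    targets must be +1.0 or -1.0. Hyperparameters are illustrative only.
    """
    w = [0.0] * len(histograms[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(histograms, targets):
            margin = y * (dot(w, x) + b)
            # Shrink the weights (L2 regularization step).
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:  # inside the margin or misclassified
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b


def train_one_vs_rest(histograms, labels):
    """Train one binary SVM per category (that category vs. all others)."""
    models = {}
    for category in sorted(set(labels)):
        targets = [1.0 if label == category else -1.0 for label in labels]
        models[category] = train_binary_svm(histograms, targets)
    return models


def classify(models, histogram):
    """Return the category whose SVM gives the highest score."""
    def score(category):
        w, b = models[category]
        return dot(w, histogram) + b
    return max(models, key=score)


# Training histograms go in first; the test histograms reuse the same models.
train_hists = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.7, 0.1]]
train_labels = ["forest", "forest", "street", "street"]
models = train_one_vs_rest(train_hists, train_labels)

test_hists = [[0.75, 0.15, 0.1], [0.15, 0.75, 0.1]]
predictions = [classify(models, h) for h in test_hists]
```

The point of the sketch is the data flow: the same histogram representation is used for both sets, the models see only the training histograms during training, and the test histograms are scored by those already-trained models.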

Results



The results of my scene detector were pretty good. Here is the graph:

The accuracy value tended to fluctuate between 0.65 and 0.68 from run to run. The accuracy value represented by this particular graph is 0.6653.
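Assuming the accuracy reported here is the fraction of test images assigned their correct category, it can be computed as in the short sketch below. The function name and the example labels are hypothetical and are not the project's data.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Toy example: three of four predictions are correct.
example = accuracy(
    ["forest", "street", "forest", "kitchen"],
    ["forest", "street", "kitchen", "kitchen"],
)
```

Because the vocabulary is built from a random sample of features and k-means starts from a random initialization, some variation in this number between runs is expected.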