In order to have a vocabulary, we must first have "words". In the computer vision sense, "words" are common features that appear frequently across images. To find these words, I used vl_dsift to extract a dense set of 128-dimensional SIFT (scale-invariant feature transform) descriptors from each image in my training set (100 images from each of 15 categories). I then pooled 600 randomly chosen features from each training image into one large collection of features and ran vl_kmeans on that collection, asking it to find 200 clusters. The centers of these 200 clusters formed my vocabulary of "words".
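The vocabulary step above can be sketched in a few lines. This is a minimal illustration in plain Python, not the vl_dsift / vl_kmeans calls used in the project: the function names (`build_vocabulary`, `kmeans`) are hypothetical, the clustering is a basic Lloyd's-style k-means, and the toy data uses 2-dimensional features and a 2-word vocabulary in place of 128-dimensional descriptors and 200 words.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20, seed=0):
    """Basic Lloyd's k-means (a stand-in for vl_kmeans, not its algorithm)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        # Assign every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        # Move each center to the mean of its members (empty clusters stay put).
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                centers[j] = [sum(m[d] for m in members) / len(members)
                              for d in range(dim)]
    return centers

def build_vocabulary(features_per_image, samples_per_image, vocab_size, seed=0):
    """Pool a random sample of features from each image, then cluster the pool."""
    rng = random.Random(seed)
    pool = []
    for feats in features_per_image:
        n = min(samples_per_image, len(feats))
        pool.extend(rng.sample(feats, n))
    return kmeans(pool, vocab_size, seed=seed)

# Toy data: two "images" whose features sit in two well-separated groups.
toy_features = [
    [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]],
    [[10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]],
]
vocabulary = build_vocabulary(toy_features, samples_per_image=4, vocab_size=2)
```

With the numbers from the text, the call would use `samples_per_image=600` and `vocab_size=200` on the real descriptors.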
The next step was to build a histogram for each training image showing how frequently each word appears in that image. This was done by again extracting features from each training image and then, for each feature of that image, finding the closest word in the vocabulary using vl_alldist2. Here are some example histograms:
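As a rough sketch of the histogram step, the snippet below assigns each feature to its nearest vocabulary word and counts the assignments. It is plain Python rather than the vl_alldist2 call from the text, the function names are hypothetical, and the optional normalization is an assumption of this sketch (the write-up does not say whether the histograms were normalized).

```python
def nearest_word(feature, vocabulary):
    """Index of the vocabulary word closest to the feature (squared L2 distance)."""
    return min(
        range(len(vocabulary)),
        key=lambda j: sum((f - w) ** 2 for f, w in zip(feature, vocabulary[j])),
    )

def word_histogram(features, vocabulary, normalize=False):
    """Count how many of an image's features fall on each vocabulary word."""
    counts = [0] * len(vocabulary)
    for feature in features:
        counts[nearest_word(feature, vocabulary)] += 1
    if normalize and features:
        # Assumption: divide by the feature count so images of different
        # sizes give comparable histograms.
        return [c / len(features) for c in counts]
    return counts

# Toy example with a two-word vocabulary and four features.
toy_vocabulary = [[0.0, 0.0], [10.0, 10.0]]
toy_image_features = [[1.0, 1.0], [9.0, 9.0], [0.0, 2.0], [11.0, 10.0]]
toy_histogram = word_histogram(toy_image_features, toy_vocabulary)
```

In the real pipeline the vocabulary has 200 words, so each image yields a 200-bin histogram.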
Now that we have histograms for the training images, we need to use them to define what each of the image categories looks like in terms of our vocabulary. In other words, what does a typical Forest or typical Office image histogram look like? To do that, I used the default primal_svm function with a linear kernel. The result is 15 different classifiers, one for each image category. Each classifier assigns a positive or negative weight to each of the 200 words in the vocabulary, plus a bias term. Positive weights correspond to words that typify that classifier's category, and negative weights correspond to words that are rare for that category.
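To make the one-classifier-per-category idea concrete, here is a small sketch that trains one linear classifier per category against all the others. It does not reproduce primal_svm: it swaps in plain batch subgradient descent on a regularized hinge loss, and the function names, hyperparameters, and toy histograms are all assumptions chosen for illustration.

```python
def train_linear_svm(histograms, labels, reg=0.01, learning_rate=0.1, epochs=200):
    """Batch subgradient descent on (reg/2)*||w||^2 + mean hinge loss.

    labels must be +1 (in the category) or -1 (not in the category).
    Returns (weights, bias).
    """
    dim = len(histograms[0])
    n = len(histograms)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [reg * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(histograms, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Only examples inside the margin contribute to the hinge term.
                for d in range(dim):
                    grad_w[d] -= y * x[d] / n
                grad_b -= y / n
        w = [wi - learning_rate * g for wi, g in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b

def train_one_vs_all(histograms, categories):
    """Train one (weights, bias) pair per category name."""
    classifiers = {}
    for name in sorted(set(categories)):
        labels = [1 if c == name else -1 for c in categories]
        classifiers[name] = train_linear_svm(histograms, labels)
    return classifiers

# Toy training set: two-word histograms for two categories.
toy_histograms = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_categories = ["Forest", "Forest", "Office", "Office"]
toy_classifiers = train_one_vs_all(toy_histograms, toy_categories)
```

On this toy data the Forest classifier ends up with a positive weight on the first word and a negative weight on the second, which mirrors the interpretation of positive and negative weights given above.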
Now we have everything we need to start classifying test images. For each test image, compute its histogram. Then take the dot product of that histogram with each classifier's weights and add that classifier's bias. The classifier that produces the highest confidence value determines the predicted category. Hopefully, the classifier for the correct image category produced the highest confidence!
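The scoring rule just described is small enough to show directly. In this sketch the classifier weights and biases are made-up numbers for a two-word vocabulary, and `classify` is a hypothetical helper name; the real classifiers have 200 weights each and there are 15 of them.

```python
def classify(histogram, classifiers):
    """Return the category whose classifier gives the highest confidence.

    classifiers maps a category name to a (weights, bias) pair;
    confidence = dot(weights, histogram) + bias.
    """
    best_name, best_score = None, float("-inf")
    for name, (weights, bias) in classifiers.items():
        score = sum(w * h for w, h in zip(weights, histogram)) + bias
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score

# Hypothetical classifiers over a two-word vocabulary.
example_classifiers = {
    "Forest": ([1.0, -1.0], 0.0),
    "Office": ([-1.0, 1.0], 0.1),
}
forest_like = classify([3, 1], example_classifiers)   # Forest scores 2.0, Office -1.9
office_like = classify([1, 2], example_classifiers)   # Forest scores -1.0, Office 1.1
```

Because only the largest score matters, the confidences do not need to be calibrated probabilities; they only need to be comparable across the classifiers.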
93  1  0  1  1  0  0  1  1  0  1  0  0  1  0
 4 78  0  4  0  2 12  0  0  0  0  0  0  0  0
 0  0 94  0  0  3  1  2  0  0  0  0  0  0  0
 0  7  1 81  4  2  3  0  0  0  1  0  1  0  0
 4  4  0  1 60  0  1  5  5  0  1  0 10  1  8
 8  0  3  2  0 81  2  2  1  0  1  0  0  0  0
 5 23  9  9  0  9 43  1  0  0  1  0  0  0  0
 1  0  0  9 21  3  0 52  4  0  0  3  0  2  5
 0  3  2  0  2  4  0  7 73  2  1  3  2  0  1
 0  0  0  0  0  0  0  0  0 89  3  0  8  0  0
 1  0  0  0  1  2  2  0  3 14 42  2 23  6  4
 7  5  1  9  8  5  4  3  9  2  1 28  5  3 10
 1  0  0  0 12  1  0  1  1 20  7  1 51  2  3
 2  0  0  1  4  3  0  1  3 22 26  2 18 12  6
 1  0  3  3 18  7  0  3  6  2  4  1  4  1 47
Category | Accuracy | Sample Training Images | Correct Classifications | Incorrect Classifications
Suburb | 93% | | |
Coast | 78% | | |
Forest | 94% | | |
Highway | 81% | | |
City | 60% | | |
Mountain | 81% | | |
Open Country | 43% | | |
Street | 52% | | |
Tall Building | 73% | | |
Office | 89% | | |
Bedroom | 42% | | |
Industrial | 28% | | |
Kitchen | 51% | | |
Living Room | 12% | | |
Store | 47% | | |
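The per-category accuracies in the table can be recomputed from the 15 by 15 matrix of counts given earlier. The sketch below assumes that each row of that matrix holds the test images of one true category (in the same order as the table) and each column a predicted category; this reading is an inference from the fact that every row sums to 100 and the diagonal matches the table, not something the write-up states. The variable and function names are hypothetical.

```python
CATEGORIES = [
    "Suburb", "Coast", "Forest", "Highway", "City", "Mountain",
    "Open Country", "Street", "Tall Building", "Office", "Bedroom",
    "Industrial", "Kitchen", "Living Room", "Store",
]

# Counts copied from the matrix in the text (assumed: row = true category).
CONFUSION = [
    [93, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0],
    [4, 78, 0, 4, 0, 2, 12, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 94, 0, 0, 3, 1, 2, 0, 0, 0, 0, 0, 0, 0],
    [0, 7, 1, 81, 4, 2, 3, 0, 0, 0, 1, 0, 1, 0, 0],
    [4, 4, 0, 1, 60, 0, 1, 5, 5, 0, 1, 0, 10, 1, 8],
    [8, 0, 3, 2, 0, 81, 2, 2, 1, 0, 1, 0, 0, 0, 0],
    [5, 23, 9, 9, 0, 9, 43, 1, 0, 0, 1, 0, 0, 0, 0],
    [1, 0, 0, 9, 21, 3, 0, 52, 4, 0, 0, 3, 0, 2, 5],
    [0, 3, 2, 0, 2, 4, 0, 7, 73, 2, 1, 3, 2, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 89, 3, 0, 8, 0, 0],
    [1, 0, 0, 0, 1, 2, 2, 0, 3, 14, 42, 2, 23, 6, 4],
    [7, 5, 1, 9, 8, 5, 4, 3, 9, 2, 1, 28, 5, 3, 10],
    [1, 0, 0, 0, 12, 1, 0, 1, 1, 20, 7, 1, 51, 2, 3],
    [2, 0, 0, 1, 4, 3, 0, 1, 3, 22, 26, 2, 18, 12, 6],
    [1, 0, 3, 3, 18, 7, 0, 3, 6, 2, 4, 1, 4, 1, 47],
]

def per_category_accuracy(matrix, names):
    """Diagonal count divided by the row total, for each category."""
    return {name: matrix[i][i] / sum(matrix[i]) for i, name in enumerate(names)}

def overall_accuracy(matrix):
    """Correctly classified images divided by all images."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

accuracies = per_category_accuracy(CONFUSION, CATEGORIES)
overall = overall_accuracy(CONFUSION)
```

Reading along a row also shows where a category's errors go: in the Living Room row, for example, the largest counts are 26, 22, and 18, which sit in the Bedroom, Office, and Kitchen columns rather than on the diagonal.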