Ben Freudberg
CSCI1430
Project 3
Introduction
The goal of this project is to define a vocabulary of image features that, when paired with an SVM classifier and a labeled training set, can sort test images into the same categories as the training images. The general steps are: extract a random sample of interest-point pixels from each training image and describe each one as a point in 128-dimensional space using vl_dsift; run a k-means function on all the sampled descriptors to define a set of "words" (approx. 200) that represent different types of pixels in that space; and train an SVM so that the program learns to associate certain image features with certain categories. Finally, the test images are classified, the results are tallied, and a final accuracy score for the sorting method is computed.
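The project itself is written in MATLAB with VLFeat, but the core quantization step, turning a bag of 128-dimensional descriptors into the histogram the SVM actually sees, can be sketched in Python. This is an illustrative sketch on made-up data, not the project code:

```python
import numpy as np

def bag_of_words(descriptors, vocab):
    """Quantize 128-D descriptors against a learned vocabulary and
    return a normalized word-count histogram. Each descriptor is
    assigned to its nearest vocabulary "word" by Euclidean distance."""
    # squared distance from every descriptor to every vocab word
    d2 = ((descriptors[:, None, :] - vocab[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(1)  # index of the nearest word per descriptor
    hist = np.bincount(words, minlength=len(vocab)).astype(float)
    return hist / hist.sum()

# toy data standing in for real SIFT descriptors and a real vocabulary
rng = np.random.default_rng(1)
vocab = rng.standard_normal((10, 128))   # 10 "words"
desc = rng.standard_normal((50, 128))    # 50 sampled descriptors
h = bag_of_words(desc, vocab)
```

The resulting histogram has one bin per vocabulary word and sums to 1, so images with different numbers of sampled pixels remain comparable.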
Running vl_dsift
When running vl_dsift, there are several parameters that may be tuned. First, we can adjust the number of random pixels sampled from each image for later use with k-means. This works together with the 'size' and 'step' parameters of vl_dsift, which control how distinctive the candidate pixels must be. If these options are not set, the dsift function simply returns the 128-dimensional descriptor of every pixel. We set them to reduce the number of descriptors returned: only the most interesting pixels from a select number of patches of a certain size are kept. By reducing the number of pixels returned, we raise the threshold on how distinctive they must be. If raised too far, however, we lose representation of pixels that the program considers less distinctive but that are still useful for describing the content of different images. Leaving everything else constant and changing the step size from 4 to 8 to 16 to 32, I got the following accuracies:
Step Size | Accuracy
4 | 0.644
8 | 0.6487
16 | 0.666
32 | 0.6687
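To see why the step size cuts down the work so sharply, consider the dense sampling grid it defines. The sketch below mimics that geometry in Python; it is only illustrative (the real vl_dsift border handling differs in detail), with made-up image dimensions:

```python
import numpy as np

def dense_grid(height, width, step, size):
    """Centers of a dense sampling grid. 'step' is the spacing between
    sampled pixels; 'size' keeps patch centers away from the border."""
    xs = np.arange(size, width - size, step)
    ys = np.arange(size, height - size, step)
    return [(x, y) for y in ys for x in xs]

# Doubling the step roughly quarters the number of sampled pixels,
# since the thinning applies in both image dimensions.
n4 = len(dense_grid(240, 320, step=4, size=4))
n8 = len(dense_grid(240, 320, step=8, size=4))
```

This quadratic drop-off is why the larger step sizes in the table above run much faster while, in this experiment, costing nothing in accuracy.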
Running kmeans
The next step of optimization was to run the kmeans function with different vocabulary sizes. As expected, increasing the vocabulary size provides diminishing improvements in accuracy at vastly increasing computation time. The results are shown below:
Vocab Size | Accuracy
10 | 0.448
100 | 0.6507
200 | 0.6647
1000 | 0.684
5000 | 0.7127
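The vocabulary itself comes from clustering all the sampled descriptors. As a sketch of what vl_kmeans is doing under the hood, here is plain Lloyd's k-means in Python on toy data (a stand-in, not the VLFeat implementation, which is considerably faster at vocabulary sizes like 5000):

```python
import numpy as np

def build_vocab(descriptors, vocab_size, iters=20, seed=0):
    """Cluster (n, 128) descriptors into vocab_size centroids ("words")
    using plain Lloyd's k-means with random initialization."""
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(descriptors), vocab_size, replace=False)
    centers = descriptors[picks].astype(float).copy()
    for _ in range(iters):
        # assign each descriptor to its nearest centroid
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        # move each centroid to the mean of its assigned descriptors
        for k in range(vocab_size):
            pts = descriptors[labels == k]
            if len(pts):
                centers[k] = pts.mean(0)
    return centers

# two well-separated toy clusters of fake 128-D descriptors
rng = np.random.default_rng(0)
data = np.vstack([rng.standard_normal((40, 128)) + 10.0,
                  rng.standard_normal((40, 128)) - 10.0])
C = build_vocab(data, 2)
```

The cost of the assignment step grows linearly with the vocabulary size, which is one source of the rapidly increasing computation time noted above.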
Scoring
When test images are run through the program, it attempts to classify each image into one of the categories the SVM learned from the training set. A square matrix, with one row and one column per image category, is created. First, a set of 100 images from one category (one row of the results matrix) is run through the program; each image is classified, and a +1 is added to the entry in the column of the predicted category. The program then moves on to the next category of test images and the next row of the matrix. The final matrix represents the score of the sorting algorithm: a perfect score would be the identity matrix multiplied by the number of test images per category. An overall score is computed by taking the sum of the diagonal and dividing by the total number of test images. My best score was 0.7127 (out of 1) when I ran the program using a vocabulary of 5000 "words". The graphical representation of the scoring matrix from that run is shown below:
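The scoring scheme described above is a standard confusion matrix. A minimal Python sketch of the tally and the diagonal-over-total score, using a tiny made-up example rather than the project's 100-images-per-category data:

```python
import numpy as np

def confusion_matrix(true_labels, pred_labels, n_classes):
    """Rows index the true category, columns the predicted category."""
    M = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, pred_labels):
        M[t, p] += 1  # one vote per classified test image
    return M

def accuracy(M):
    # sum of the diagonal (correct classifications) over all test images
    return np.trace(M) / M.sum()

# toy example: 3 categories, 2 test images each
true = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
M = confusion_matrix(true, pred, 3)
```

A perfect classifier would put every count on the diagonal, giving an accuracy of 1, matching the identity-matrix ideal described above.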