Ben Freudberg

CSCI1430

Project 3

Introduction

            The goal of this project is to define a vocabulary of image features that, combined with an SVM learning algorithm and a set of labeled training images, can sort test images into the same categories used to label the training images. The general steps are as follows. First, use vl_dsift to extract SIFT descriptors from each training image, each describing a sampled pixel location as a point in 128-dimensional space, and keep a random sample of them. Next, run k-means on all the sampled descriptors to define a fixed number of “words” (approximately 200) that represent different types of local image patches in that 128-dimensional space. Then train an SVM so that the program learns to associate certain image features with certain categories of images. Finally, sort the test images, tally the results, and calculate an overall score for the sorting method.
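
The steps above do not say exactly how each image is turned into an input for the SVM. A common choice in bag-of-words pipelines, assumed here rather than taken from this project, is to assign every descriptor in an image to its nearest vocabulary word and use the normalized word counts as that image's feature vector. The following minimal NumPy sketch shows that encoding. The function name bag_of_words_histogram is hypothetical.

```python
import numpy as np


def bag_of_words_histogram(descriptors: np.ndarray, vocab: np.ndarray) -> np.ndarray:
    """Encode one image as a normalized histogram of visual-word counts.

    descriptors: (n, d) descriptors from a single image (d = 128 for SIFT).
    vocab:       (k, d) cluster centers ("words") produced by k-means.
    Hypothetical helper: the project's own encoding step is not shown.
    """
    # Squared Euclidean distances, shape (n, k), via |a|^2 - 2ab + |b|^2.
    d2 = ((descriptors ** 2).sum(axis=1)[:, None]
          - 2.0 * descriptors @ vocab.T
          + (vocab ** 2).sum(axis=1)[None, :])
    # Each descriptor votes for its nearest word.
    nearest = d2.argmin(axis=1)
    counts = np.bincount(nearest, minlength=len(vocab)).astype(float)
    # Normalize so images with different numbers of descriptors are comparable.
    return counts / max(counts.sum(), 1.0)
```

With this encoding, every image becomes a k-dimensional vector regardless of how many descriptors it produced. That fixed length is what allows a standard SVM to be trained on the result.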

Running vl_dsift

            When running vl_dsift, there are several parameters that may be tuned. First, one can adjust the number of descriptors randomly sampled from each image for later use with k-means. Second, the ‘size’ and ‘step’ parameters of vl_dsift control where and at what scale descriptors are computed. vl_dsift samples the image on a regular grid, extracting one descriptor every ‘step’ pixels, and ‘size’ sets the width in pixels of each of the descriptor's spatial bins. If these options are not set, the step defaults to one pixel, so vl_dsift returns a 128-dimensional descriptor at essentially every pixel. Increasing the step reduces the number of descriptors returned. Dense SIFT does not select the most distinctive pixels, however. A larger step simply samples the image more sparsely. If the grid becomes too sparse, it can miss local structure that is still useful for describing the content of different images. Leaving everything else constant and changing the step size from 4 to 8 to 16 to 32, I obtained the following accuracies:

Step Size    Accuracy
4            0.644
8            0.6487
16           0.666
32           0.6687
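
To make the effect of ‘step’ concrete, here is a minimal Python sketch of a dense sampling grid, together with the per-image random subsampling described in the vl_dsift discussion. It approximates the idea behind vl_dsift and is not its implementation. The border handling in particular is an assumption and will not match vl_dsift pixel for pixel. The function names dense_grid and sample_descriptors are hypothetical.

```python
import numpy as np


def dense_grid(height: int, width: int, step: int, bin_size: int) -> np.ndarray:
    """Approximate the (y, x) keypoint grid of a dense SIFT extractor.

    A SIFT descriptor spans 4 x 4 spatial bins, so its footprint is
    4 * bin_size pixels wide. Centers are kept far enough from the border
    for that footprint to fit (an assumed, simplified border rule).
    """
    half = 2 * bin_size  # half of the 4 * bin_size footprint
    ys = np.arange(half, height - half + 1, step)
    xs = np.arange(half, width - half + 1, step)
    # Every grid point gets a descriptor; no interest-point test is applied.
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([yy.ravel(), xx.ravel()], axis=1)


def sample_descriptors(descriptors: np.ndarray, n_samples: int,
                       rng: np.random.Generator) -> np.ndarray:
    """Randomly keep at most n_samples distinct descriptors from one image."""
    n = len(descriptors)
    idx = rng.choice(n, size=min(n_samples, n), replace=False)
    return descriptors[idx]
```

For an illustrative 256 by 256 image with a bin size of 8, this grid has 3249, 841, 225, and 64 points for steps of 4, 8, 16, and 32. The number of descriptors falls roughly by a factor of four each time the step doubles.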

Running kmeans

            The next step of optimization was to run k-means with different vocabulary sizes. As expected, larger vocabularies give diminishing improvements in accuracy at the cost of rapidly increasing computation time. The results are shown below:

Vocabulary Size (words)    Accuracy
10                         0.448
100                        0.6507
200                        0.6647
1000                       0.684
5000                       0.7127
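
The project's kmeans call is not shown here. The following NumPy sketch therefore uses plain Lloyd's-algorithm k-means to illustrate how a vocabulary of k words is built from the sampled descriptors. It is not necessarily the same implementation, initialization, or stopping rule the project used, and the function name kmeans_vocabulary is hypothetical.

```python
import numpy as np


def kmeans_vocabulary(X: np.ndarray, k: int, n_iter: int = 50, seed: int = 0) -> np.ndarray:
    """Cluster descriptors X (n, d) into k visual words with Lloyd's algorithm."""
    rng = np.random.default_rng(seed)
    # Initialize the centers with k distinct randomly chosen descriptors.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for every descriptor.
        d2 = ((X ** 2).sum(axis=1)[:, None]
              - 2.0 * X @ centers.T
              + (centers ** 2).sum(axis=1)[None, :])
        labels = d2.argmin(axis=1)
        # Update step: move each center to the mean of its assigned descriptors.
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # leave an empty cluster's center where it is
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break  # assignments have stabilized
        centers = new_centers
    return centers
```

Each iteration computes a distance from every descriptor to every center, which is O(n·k·d) work. The per-iteration cost therefore grows linearly with the vocabulary size k, which is consistent with the growing computation time observed for larger vocabularies.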

Scoring

            When test images are run through the program, it classifies each image into one of the categories the SVM learned from the training set. The results are recorded in a square matrix (a confusion matrix) with one row and one column per image category. The 100 test images of the first category (one row of the matrix) are run through the program first. Each image is classified, and the entry in the column of its predicted category is incremented by one. The program then moves on to the next category of test images and the next row of the matrix. The final matrix summarizes how well the sorting algorithm performs: a perfect result would be the identity matrix multiplied by the number of test images in each category. The overall score is the sum of the diagonal (the number of correctly classified images) divided by the total number of test images. My best score was 0.7127 (out of 1), obtained with a vocabulary of 5000 “words”. The graphical representation of the scoring matrix from that run is shown below:
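
The tallying and scoring procedure described in the Scoring section can be sketched in a few lines of Python. Here, rows index the true category, columns index the predicted category, and the overall score is the trace divided by the total count. The function names confusion_matrix and overall_accuracy are hypothetical, and category labels are assumed to be integers from 0 to n_categories - 1.

```python
import numpy as np


def confusion_matrix(true_labels, predicted_labels, n_categories: int) -> np.ndarray:
    """Row = true category, column = predicted category, entry = image count."""
    M = np.zeros((n_categories, n_categories), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        M[t, p] += 1  # one vote per classified test image
    return M


def overall_accuracy(M: np.ndarray) -> float:
    """Fraction of test images on the diagonal, i.e. correctly classified."""
    return float(np.trace(M) / M.sum())
```

When every category has the same number of test images (100 each here), this overall score equals the average of the per-category accuracies along the diagonal.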