This project is aimed at image classification: given a tagged set of input images, train classifiers that can assign the correct tag to a new input image. For instance, given many pictures of forests, the computer should be able to tag a completely new forest image as a forest.
The first step was finding a way to train the classifiers in the first place. For this project, we extracted features from the images (a variant of SIFT features). After this feature extraction, we clustered the features using k-means to produce centroids, giving us a visual vocabulary through which we could describe which feature clusters make up a specific image.
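The vocabulary-building step above can be sketched as follows. This is a minimal numpy-only k-means loop, not the project's actual implementation; the function name `build_vocabulary` and all parameters are illustrative assumptions.

```python
import numpy as np

def build_vocabulary(features, k, iters=20, seed=0):
    """Cluster descriptors with k-means; the k centroids form the visual vocabulary.

    features: (n, d) array of descriptors sampled from the training images.
    Hypothetical sketch: a real pipeline would use a library k-means.
    """
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct random descriptors.
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # Assign each descriptor to its nearest centroid (squared Euclidean distance).
        d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of the descriptors assigned to it.
        for j in range(k):
            mask = labels == j
            if mask.any():
                centroids[j] = features[mask].mean(axis=0)
    return centroids
```

In practice the descriptors come from many images pooled together, so the centroids summarize feature patterns shared across the whole training set.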
We then took every image used to generate the vocabulary and built a histogram of its features, assigning each feature to the nearest centroid in the vocabulary.
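That histogram step amounts to a nearest-centroid lookup followed by a count. A small sketch, assuming the vocabulary from the previous step; the helper name is hypothetical:

```python
import numpy as np

def bag_of_words_histogram(descriptors, vocab):
    """Map each descriptor to its nearest vocabulary centroid and count occurrences,
    yielding one normalized histogram per image."""
    d2 = ((descriptors[:, None, :] - vocab[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # index of the nearest centroid for each descriptor
    hist = np.bincount(words, minlength=len(vocab)).astype(float)
    # Normalize so images with different numbers of sampled features are comparable.
    return hist / hist.sum()
```

Normalizing matters because the sampling rate varies between experiments; without it, an image with more samples would dominate the SVM training simply by having larger counts.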
The final step was to train SVMs and use them to classify new images. Each SVM is a one-vs-all classifier trained on the histograms. To classify an image, we extract its features, build a histogram the same way, and score it against every trained SVM; the label is chosen by the SVM that responds with the highest confidence.
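The one-vs-all scheme and the max-confidence decision rule can be sketched like this. As an assumption, the linear SVMs here are trained with a simple Pegasos-style subgradient loop rather than whatever solver the project actually used, and all function names are illustrative:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Binary linear SVM via hinge-loss subgradient descent (Pegasos-style).
    y must be in {-1, +1}. Returns a weight vector with the bias folded in."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a constant bias column
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)          # decaying step size
            margin = y[i] * (Xb[i] @ w)
            w *= (1 - eta * lam)           # regularization shrinkage
            if margin < 1:                 # hinge-loss subgradient step
                w += eta * y[i] * Xb[i]
    return w

def train_one_vs_all(histograms, labels, classes):
    """One SVM per class: that class's histograms are +1, everything else is -1."""
    return {c: train_linear_svm(histograms, np.where(labels == c, 1, -1))
            for c in classes}

def classify(hist, svms):
    """Score the histogram with every SVM; the largest margin (confidence) wins."""
    x = np.append(hist, 1.0)
    return max(svms, key=lambda c: svms[c] @ x)
```

The signed distance to each SVM's hyperplane serves as the confidence here, which is the usual way to break ties in a one-vs-all setup.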
After some tweaking, I also ended up comparing features from three levels of a Gaussian pyramid. This increased my scores by about 5%, regardless of how many samples I took or the vocabulary size.
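A Gaussian pyramid just repeats blur-then-downsample, so features can be extracted at three scales of the same image. A numpy-only sketch under that assumption (the project's actual filter sizes and sigma are unknown; the ones below are illustrative):

```python
import numpy as np

def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian smoothing: 1-D convolution along rows, then columns."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    # mode='same' keeps each row/column at its original length.
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode='same'), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode='same'), 0, rows)

def gaussian_pyramid(img, levels=3):
    """Blur, then downsample by 2, at each level; features are then
    extracted from every level and pooled into one histogram."""
    pyramid = [img]
    for _ in range(levels - 1):
        img = gaussian_blur(img)[::2, ::2]
        pyramid.append(img)
    return pyramid
```

Pooling features from all three levels makes the histogram less sensitive to the scale at which a texture appears, which is one plausible explanation for the consistent score bump.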
I also tested a number of different sampling rates and vocabulary sizes. Here are some scores (all use multiple levels of Gaussian pyramids and pre-smoothing unless otherwise specified):
Vocab 200, 2000 samples per image: 31% (this was the best)
Vocab 200, 500 samples per image: 27%
Vocab 100, 500 samples per image: 22%
Vocab 200, 1500 samples per image, no pre-smoothing: 26.8% (this showed how important the pre-smoothing was)
Vocab 200, 500 samples per image, pre-smoothed, no Gaussian pyramid: 20%
Vocab 100, 500 samples per image, pre-smoothed, no Gaussian pyramid: 16%
Vocab 50, 500 samples per image, pre-smoothed, no Gaussian pyramid: 9%