Project 3 Report

By Andersen Chen

Algorithm: Scene Recognition with Bag of Words

To collect features from our training pictures, we extract a dense set of SIFT features from each image and then randomly sample 620 features per image to reduce runtime and memory. We cluster the pooled features with k-means into a vocabulary of 300 visual words. For each training image, we assign each observed feature to the nearest visual word in the vocabulary and build a normalized histogram of visual word occurrences. We learn a set of one-vs-all classifiers from the training histograms, then classify each test image and build a confusion matrix.
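
The histogram step described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the project's actual code: the function name build_histogram is hypothetical, the random arrays stand in for real SIFT descriptors and a real k-means vocabulary, and L1 normalization (bins summing to 1) is an assumption, since the report only says the histogram is normalized.

```python
import numpy as np

def build_histogram(features, vocab):
    """Assign each feature to its nearest visual word; return a normalized histogram.

    features: (N, D) array of descriptors for one image.
    vocab:    (K, D) array of visual words (cluster centers).
    """
    # Squared Euclidean distance between every feature and every word,
    # expanded as |f|^2 - 2 f.w + |w|^2 to avoid an (N, K, D) temporary.
    d2 = (
        (features ** 2).sum(axis=1)[:, None]
        - 2.0 * features @ vocab.T
        + (vocab ** 2).sum(axis=1)[None, :]
    )
    nearest = d2.argmin(axis=1)  # index of the closest word for each feature

    # Count how often each word was chosen, including words never chosen.
    counts = np.bincount(nearest, minlength=vocab.shape[0]).astype(float)

    # Assumption: L1 normalization, so the bins sum to 1.
    total = counts.sum()
    return counts / total if total > 0 else counts

# Placeholder data with the sizes from the report: 620 sampled features,
# 300 visual words, 128-dimensional SIFT descriptors.
rng = np.random.default_rng(0)
vocab = rng.normal(size=(300, 128))
features = rng.normal(size=(620, 128))
hist = build_histogram(features, vocab)
```

Each training image yields one 300-bin vector of this kind, and those vectors are what the one-vs-all classifiers are trained on.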

Results

The classifier reaches an accuracy of 0.6680 (66.8%) on 100 test images per scene category. Suburban scenes were classified most accurately, and industrial scenes least accurately.
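
As an illustration of how an accuracy figure and the best and worst categories can be read off a confusion matrix, here is a small sketch. The function names are hypothetical and the labels are made-up toy data with three classes, not the project's results. It assumes rows index the true class and columns the predicted class.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Count matrix with rows = true class, columns = predicted class."""
    true_labels = np.asarray(true_labels, dtype=int)
    predicted_labels = np.asarray(predicted_labels, dtype=int)
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly even when index pairs repeat.
    np.add.at(cm, (true_labels, predicted_labels), 1)
    return cm

def overall_accuracy(cm):
    """Fraction of all test images on the diagonal (correctly classified)."""
    return np.trace(cm) / cm.sum()

def per_class_accuracy(cm):
    """Fraction of each true class that was classified correctly.

    Assumes every class has at least one test image (no empty rows).
    """
    return np.diag(cm) / cm.sum(axis=1)

# Toy labels for illustration only (three classes, two test images each).
true_labels = [0, 0, 1, 1, 2, 2]
predicted_labels = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(true_labels, predicted_labels, num_classes=3)
acc = overall_accuracy(cm)
per_class = per_class_accuracy(cm)
best_class = int(per_class.argmax())
```

When every category has the same number of test images, as with 100 per scene here, the overall accuracy equals the mean of the per-class accuracies.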