The objective of this project was to build a 15-category scene recognition pipeline following Lazebnik et al. (2006). A tiny-image pipeline was also developed as a starting point.

A 400-word vocabulary was built from the training images by extracting dense SIFT features (using 'fast' vl_dsift with a step size of 20 or 30) and clustering them with k-means. Then, for each test image, a bag of SIFTs was extracted (again using 'fast' vl_dsift, with a step of 10) and binned into a histogram over the k-means clusters. A kNN classifier and a linear SVM classifier (trained on 100 test images) were developed to assign test images to categories. This pipeline classified images with about 64% accuracy.

From this basic pipeline, other options were explored. Soft binning was attempted, but it decreased accuracy to 50% when the 3 nearest neighbors were considered and to 40% when the 15 nearest neighbors were considered. This could very well be the result of a bad implementation, but the approach was abandoned. Using vl_dsift without the 'fast' parameter was also attempted, but it gave no appreciable improvement (in fact, a slight decrease), so this too was abandoned.

Next, a 512-dimensional GIST vector was computed for each image (using LMgist by Aude Oliva and Antonio Torralba) and appended to the image's 400-dimensional histogram, which raised accuracy from 0.64 to 0.726. Finally, the bag of SIFTs was binned spatially, giving a vector of 400*bins + 512 dimensions in total, which increased accuracy further. However, these gains are not very elegant: they come at the cost of increased computation time and memory use.
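The feature construction described above can be illustrated with a minimal sketch. It is written in plain Python rather than the MATLAB/VLFeat code used in the project, and it is not the project's implementation. The function names (`nearest_word`, `spatial_bow`, `image_feature`), the square `grid x grid` layout of spatial bins, the L1 normalisation of the histogram, and the toy data are all assumptions made for illustration.

```python
from math import dist

def nearest_word(descriptor, vocab):
    """Index of the vocabulary centre closest (Euclidean) to `descriptor`."""
    return min(range(len(vocab)), key=lambda k: dist(descriptor, vocab[k]))

def spatial_bow(descriptors, positions, width, height, vocab, grid=1):
    """Hard-assignment bag-of-words histogram with grid x grid spatial bins.

    Returns a flat list of length len(vocab) * grid * grid.
    The histogram is L1-normalised (an assumption, not stated in the report).
    """
    n_words = len(vocab)
    hist = [0.0] * (n_words * grid * grid)
    for desc, (x, y) in zip(descriptors, positions):
        # Spatial cell containing the feature location.
        col = min(int(x * grid / width), grid - 1)
        row = min(int(y * grid / height), grid - 1)
        cell = row * grid + col
        hist[cell * n_words + nearest_word(desc, vocab)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def image_feature(descriptors, positions, width, height, vocab, gist, grid=1):
    """Spatially binned histogram with a GIST-like vector appended."""
    return spatial_bow(descriptors, positions, width, height, vocab, grid) + list(gist)

# Toy data: 3 "visual words" in a 2-D descriptor space, 4 features in a 20x20 image.
toy_vocab = [(0, 0), (10, 0), (0, 10)]
toy_descriptors = [(1, 1), (9, 1), (0, 8), (1, 0)]
toy_positions = [(5, 5), (15, 5), (5, 15), (15, 15)]
toy_gist = [0.5, 0.5]  # stand-in for the 512-dimensional GIST vector

toy_global = spatial_bow(toy_descriptors, toy_positions, 20, 20, toy_vocab, grid=1)
toy_feature = image_feature(toy_descriptors, toy_positions, 20, 20,
                            toy_vocab, toy_gist, grid=2)
print(toy_global)        # histogram over 3 words, no spatial binning
print(len(toy_feature))  # 3 words * 4 cells + 2 GIST entries
```

With a real 400-word vocabulary, 16 spatial bins, and a 512-dimensional GIST vector, the same concatenation gives the 400*bins + 512 layout described above.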
GIST features | Spatial binning | Number of spatial bins | Accuracy |
---|---|---|---|
No | No | 1 | 0.64 |
Yes | No | 1 | 0.726 |
Yes | Yes | 4 | 0.761 |
Yes | Yes | 16 | 0.777 |
No | Yes | 16 | 0.724 |
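To make the memory cost of each configuration concrete, the feature length can be worked out from the 400*bins + 512 formula. The sketch below does this for the five configurations in the table; `feature_dim` is a hypothetical helper, and the assumption is that the GIST term is simply dropped when GIST features are not used.

```python
VOCAB_SIZE = 400  # visual words in the k-means vocabulary
GIST_DIM = 512    # length of the GIST vector

def feature_dim(use_gist, num_spatial_bins):
    """Length of the per-image feature vector: 400 * bins (+ 512 with GIST)."""
    return VOCAB_SIZE * num_spatial_bins + (GIST_DIM if use_gist else 0)

# (use_gist, num_spatial_bins, reported accuracy) for each row of the table.
configs = [
    (False, 1, 0.64),
    (True, 1, 0.726),
    (True, 4, 0.761),
    (True, 16, 0.777),
    (False, 16, 0.724),
]

for use_gist, bins, accuracy in configs:
    print(f"gist={use_gist!s:5} bins={bins:2d} "
          f"dims={feature_dim(use_gist, bins):5d} accuracy={accuracy}")
```

The best configuration uses a vector roughly 17 times longer than the basic 400-dimensional histogram.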
The highest performance, 78%, was seen when binning each image into 16 spatial bins and including the GIST features. The confusion matrix and example classifications are below.
Category name | Accuracy | False positives with true label | False negatives with wrong predicted label |
---|---|---|---|
Kitchen | 0.620 | Bedroom, LivingRoom | Bedroom, Bedroom |
Store | 0.640 | TallBuilding, InsideCity | TallBuilding, InsideCity |
Bedroom | 0.610 | LivingRoom, LivingRoom | LivingRoom, LivingRoom |
LivingRoom | 0.590 | Industrial, Suburb | Office, Store |
Office | 0.950 | LivingRoom, Store | Bedroom, Store |
Industrial | 0.690 | Store, TallBuilding | Store, TallBuilding |
Suburb | 0.990 | Industrial, Industrial | LivingRoom |
InsideCity | 0.770 | Highway, Industrial | Kitchen, Kitchen |
TallBuilding | 0.830 | InsideCity, InsideCity | Store, Store |
Street | 0.910 | OpenCountry, InsideCity | Highway, InsideCity |
Highway | 0.870 | OpenCountry, Kitchen | Street, Coast |
OpenCountry | 0.680 | Coast, Coast | Highway, Coast |
Coast | 0.750 | OpenCountry, OpenCountry | OpenCountry, OpenCountry |
Mountain | 0.830 | OpenCountry, OpenCountry | Forest, Street |
Forest | 0.930 | OpenCountry, OpenCountry | Mountain, Mountain |
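As a consistency check, the per-category accuracies in the table can be averaged and compared with the overall figure of 0.777. The sketch below assumes every category contributes the same number of test images (not stated in the text), so that the overall accuracy is the plain mean of the per-category values. The variable names are illustrative.

```python
from statistics import mean

# Per-category accuracies copied from the table above.
per_category = {
    "Kitchen": 0.620, "Store": 0.640, "Bedroom": 0.610, "LivingRoom": 0.590,
    "Office": 0.950, "Industrial": 0.690, "Suburb": 0.990, "InsideCity": 0.770,
    "TallBuilding": 0.830, "Street": 0.910, "Highway": 0.870,
    "OpenCountry": 0.680, "Coast": 0.750, "Mountain": 0.830, "Forest": 0.930,
}

# Unweighted mean, valid as overall accuracy only under the equal-size assumption.
mean_accuracy = mean(per_category.values())
hardest = min(per_category, key=per_category.get)
easiest = max(per_category, key=per_category.get)

print(f"mean accuracy: {mean_accuracy:.3f}")
print(f"hardest: {hardest} ({per_category[hardest]:.3f})")
print(f"easiest: {easiest} ({per_category[easiest]:.3f})")
```

The mean comes out at about 0.777, which agrees with the best configuration reported earlier. The indoor categories (LivingRoom, Bedroom, Kitchen, Store) account for most of the errors.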