The goal of this project was to implement several image recognition techniques. We implemented two image representation (feature extraction) techniques and two classification techniques, and evaluated them on scene recognition across 15 scene categories and 1500 total images. The specific methods, with the parameters that worked best, are as follows:
Tiny Image: 16 x 16 pixel tiny images worked best in my tests.
Bag of SIFT: A vocabulary size of 400, built from features sampled from 800 random images. A SIFT step size of 8 worked best for both vocabulary building and bag of SIFT generation.
SVM: The most successful LAMBDA value for SVM classification was 0.00009.
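As a rough illustration of the Tiny Image representation paired with a 1-nearest-neighbor classifier, here is a minimal pure-Python sketch. The project itself appears to use VLFeat's MATLAB interface, and its code is not shown here, so everything below is an assumption made for illustration: the function names `tiny_image` and `nearest_neighbor_label`, the block-averaging resize, and the zero-mean, unit-length normalization.

```python
import math


def tiny_image(image, size=16):
    """Shrink a grayscale image (a list of pixel rows) to size x size by
    block averaging, then flatten it and normalize to zero mean, unit length.

    Assumes the image is at least size x size pixels.
    """
    height, width = len(image), len(image[0])
    pixels = []
    for r in range(size):
        r0, r1 = r * height // size, (r + 1) * height // size
        for c in range(size):
            c0, c1 = c * width // size, (c + 1) * width // size
            block = [image[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            pixels.append(sum(block) / len(block))
    mean = sum(pixels) / len(pixels)
    centered = [p - mean for p in pixels]
    # Guard against a constant image, whose centered vector has zero length.
    norm = math.sqrt(sum(p * p for p in centered)) or 1.0
    return [p / norm for p in centered]


def nearest_neighbor_label(train_features, train_labels, query):
    """Return the label of the training feature closest to the query
    (squared Euclidean distance)."""
    best = min(
        range(len(train_features)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_features[i], query)),
    )
    return train_labels[best]
```

Because the feature is mean-centered, a uniformly brighter copy of a training image maps to the same feature vector, which is one reason this normalization is a common choice for tiny images.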
While testing various parameters, I found that passing the 'fast' option to the vl_dsift call sped things up but also reduced accuracy (this is probably expected). For example, with LAMBDA set to 0.00009 for the SVM step, bag of SIFT features computed with 'fast' yielded 63.9% accuracy, whereas without 'fast' they yielded 68.2%.
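To make the role of LAMBDA more concrete, the sketch below evaluates the standard regularized hinge-loss objective that a linear SVM minimizes, in which LAMBDA weights the regularization term. It also shows a one-vs-all prediction step. This is only a hedged sketch: the write-up does not say how the 15 categories were handled, so the one-vs-all scheme is an assumption, and the names `hinge_objective` and `predict_one_vs_all` are hypothetical.

```python
def hinge_objective(w, b, features, labels, lam):
    """Regularized hinge loss for a linear SVM:
    lam / 2 * ||w||^2 + mean(max(0, 1 - y * (w . x + b))).

    labels must be +1 or -1.
    """
    regularizer = 0.5 * lam * sum(wj * wj for wj in w)
    losses = []
    for x, y in zip(features, labels):
        score = sum(wj * xj for wj, xj in zip(w, x)) + b
        losses.append(max(0.0, 1.0 - y * score))
    return regularizer + sum(losses) / len(losses)


def predict_one_vs_all(classifiers, x):
    """classifiers maps a category name to its (w, b) pair.
    Returns the category whose classifier gives x the highest score.
    (One-vs-all is an assumption, not confirmed by the write-up.)
    """
    def score(name):
        w, b = classifiers[name]
        return sum(wj * xj for wj, xj in zip(w, x)) + b

    return max(classifiers, key=score)
```

With a very small LAMBDA such as 0.00009, the regularization term contributes little and the objective is dominated by the hinge loss on the training set, so the classifier fits the training data more closely.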
Figures for the four pipelines evaluated: Tiny Image + 1-Nearest Neighbor, Tiny Image + SVM, SIFT + 1-Nearest Neighbor, SIFT + SVM.
Category name | Accuracy | False positives with true label | False negatives with wrong predicted label
---|---|---|---
Kitchen | 0.610 | Store, Store | Office, Office
Store | 0.460 | InsideCity, LivingRoom | Kitchen, InsideCity
Bedroom | 0.570 | LivingRoom, Kitchen | Street, LivingRoom
LivingRoom | 0.410 | Store, Bedroom | Bedroom, Bedroom
Office | 0.910 | Kitchen, Bedroom | Kitchen, Bedroom
Industrial | 0.580 | InsideCity, Store | LivingRoom, Street
Suburb | 0.940 | Mountain, Coast | LivingRoom, Store
InsideCity | 0.590 | Suburb, Store | Store, Store
TallBuilding | 0.810 | Store, Street | Bedroom, InsideCity
Street | 0.630 | Forest, Bedroom | Store, InsideCity
Highway | 0.850 | OpenCountry, Industrial | Coast, Mountain
OpenCountry | 0.410 | Coast, Coast | Mountain, TallBuilding
Coast | 0.710 | OpenCountry, OpenCountry | Mountain, OpenCountry
Mountain | 0.820 | OpenCountry, Bedroom | Highway, OpenCountry
Forest | 0.930 | OpenCountry, Store | Mountain, Mountain
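
As a quick consistency check, the per-category accuracies in the table average to the 68.2% reported earlier for LAMBDA = 0.00009 without 'fast', which suggests the table describes that bag of SIFT + SVM configuration. The short Python snippet below reproduces the arithmetic. It assumes every category has the same number of test images, so the unweighted mean equals the overall accuracy.

```python
# Per-category accuracies copied from the table above.
per_category_accuracy = {
    "Kitchen": 0.610,
    "Store": 0.460,
    "Bedroom": 0.570,
    "LivingRoom": 0.410,
    "Office": 0.910,
    "Industrial": 0.580,
    "Suburb": 0.940,
    "InsideCity": 0.590,
    "TallBuilding": 0.810,
    "Street": 0.630,
    "Highway": 0.850,
    "OpenCountry": 0.410,
    "Coast": 0.710,
    "Mountain": 0.820,
    "Forest": 0.930,
}

# Unweighted mean over the 15 categories (assumes equal test-set sizes).
mean_accuracy = sum(per_category_accuracy.values()) / len(per_category_accuracy)
print(f"{mean_accuracy:.3f}")  # prints 0.682
```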