This project implements scene recognition with several approaches and evaluates them using confusion matrices and accuracy scores. Specifically, tiny image features and bag of SIFT features are used for image representation, while support vector machine (SVM) and k-nearest neighbor (k-NN) classifiers are used for image classification. All four combinations of these approaches are evaluated, and the results are shown below.

Parameters

There are many tunable parameters in this project. I experimented with a variety of them and settled on values that seem to produce the best results for this particular implementation. For tiny image features, images are resized to 16 x 16 resolution, vectorized, and then normalized. For k-NN, k = 1 performs very well (often better than values of k greater than 1), and k values around 10 also do well. For bag of SIFT features, I use a step size of 8 and a bin size of 4. For SVM, performance differs depending on the image representation used. Lambda values around 10^-4 seem to do well with bag of SIFT features.
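The tiny image pipeline described above (resize, vectorize, normalize) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `tiny_image_feature` is hypothetical, block averaging stands in for whatever resize routine was actually used, and "normalized" is assumed to mean zero mean and unit length, which the text above does not spell out.

```python
import math

def tiny_image_feature(image, size=16):
    """Turn a grayscale image (a list of pixel rows) into a tiny image feature.

    Assumptions: resizing is done by block averaging, and normalization
    means zero mean followed by scaling to unit length.
    """
    height, width = len(image), len(image[0])
    if height < size or width < size:
        raise ValueError("image must be at least size x size pixels")

    feature = []
    for r in range(size):
        # Rows of the source image that map to output row r.
        r0, r1 = r * height // size, (r + 1) * height // size
        for c in range(size):
            # Columns of the source image that map to output column c.
            c0, c1 = c * width // size, (c + 1) * width // size
            block = [image[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            feature.append(sum(block) / len(block))

    # Shift to zero mean, then scale to unit length.
    mean = sum(feature) / len(feature)
    centered = [v - mean for v in feature]
    norm = math.sqrt(sum(v * v for v in centered))
    # A constant image has zero norm after centering; return zeros then.
    return [v / norm for v in centered] if norm > 0 else centered
```

With the default `size=16`, every image is reduced to a 256-dimensional vector regardless of its original resolution.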

Tiny Image with k-NN

Accuracy varied between 0.18 and 0.23. At k = 1 the accuracy is 0.225, at k = 20 it is 0.215, and at k = 40 it is 0.213.
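For reference, the role of k can be illustrated with a small k-NN sketch. It assumes squared Euclidean distance and a majority vote with ties broken in favor of the closest neighbor; neither detail is stated above, and the name `knn_predict` is hypothetical.

```python
from collections import Counter

def knn_predict(train_features, train_labels, query, k=1):
    """Label a query vector by majority vote among its k nearest neighbors."""
    # Squared Euclidean distance to every training vector, sorted ascending.
    distances = sorted(
        (sum((a - b) ** 2 for a, b in zip(feature, query)), index)
        for index, feature in enumerate(train_features)
    )
    nearest_labels = [train_labels[index] for _, index in distances[:k]]
    votes = Counter(nearest_labels)
    top_count = max(votes.values())
    # Among the labels with the most votes, return the one seen first,
    # which is the one belonging to the closest neighbor.
    for label in nearest_labels:
        if votes[label] == top_count:
            return label
```

With k = 1 the prediction is simply the label of the single closest training image, so the result is sensitive to individual training samples; larger k smooths this out at the cost of letting more distant images vote.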

Tiny Image with SVM

The best accuracy I could get with a linear SVM was 0.186, at lambda = 1e-4. I also tried a radial kernel SVM, but the result was worse.
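To make the role of lambda concrete, here is a minimal sketch of a regularized linear SVM objective and a one-vs-all prediction rule. Both are assumptions: the objective shown is a common formulation (L2 penalty weighted by lambda plus mean hinge loss), and one-vs-all is a usual way to extend a binary SVM to many scene categories, but neither is confirmed above. The function names are hypothetical.

```python
def svm_objective(w, b, features, labels, lam):
    """Regularized hinge loss for a binary linear SVM (labels are +1 or -1).

    Assumed form: lam / 2 * ||w||^2 + mean(max(0, 1 - y * (w . x + b))).
    """
    regularizer = 0.5 * lam * sum(wi * wi for wi in w)
    hinge_total = 0.0
    for x, y in zip(features, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        hinge_total += max(0.0, 1.0 - y * score)
    return regularizer + hinge_total / len(features)

def one_vs_all_predict(classifiers, x):
    """Pick the category whose binary classifier gives the highest score.

    classifiers maps a category name to its (w, b) pair.
    """
    def score(item):
        w, b = item[1]
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(classifiers.items(), key=score)[0]
```

Under this formulation a small lambda such as 1e-4 puts little weight on the penalty term, so the classifier is driven mostly by fitting the training data.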

Bag of Sift with k-NN

With k = 1 the accuracy is 0.524. At k = 10 it improves to 0.531, which is also the best I could get with this configuration. The confusion matrix and table of classifier results are included here:

Bag of Sift with k-NN, k = 10

Accuracy (mean of diagonal of confusion matrix) is 0.531

Category name	Accuracy	False positives with true label	False negatives with wrong predicted label
Kitchen	0.460	Bedroom, InsideCity	Office, Store
Store	0.520	LivingRoom, InsideCity	Office, Office
Bedroom	0.330	Office, Kitchen	Kitchen, Office
LivingRoom	0.150	Office, Bedroom	Office, Office
Office	0.780	Bedroom, Bedroom	InsideCity, Bedroom
Industrial	0.300	Store, LivingRoom	Highway, Office
Suburb	0.910	Mountain, OpenCountry	InsideCity, Bedroom
InsideCity	0.420	Suburb, TallBuilding	Coast, Forest
TallBuilding	0.330	LivingRoom, Bedroom	Store, Street
Street	0.560	TallBuilding, Kitchen	Store, InsideCity
Highway	0.770	Coast, Coast	Coast, Coast
OpenCountry	0.260	Bedroom, Industrial	Mountain, Coast
Coast	0.670	Industrial, Suburb	Forest, Highway
Mountain	0.550	Kitchen, OpenCountry	Forest, Forest
Forest	0.960	Bedroom, OpenCountry	Mountain, Suburb
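The reported accuracy is the mean of the diagonal of the confusion matrix, which is the same as the average of the per-category accuracies in the table above. A small sketch of that computation, assuming a row-normalized confusion matrix whose rows are true categories and columns are predicted categories (the function names are hypothetical):

```python
def confusion_matrix(true_labels, predicted_labels, categories):
    """Row-normalized confusion matrix: rows are true, columns are predicted."""
    index = {category: i for i, category in enumerate(categories)}
    n = len(categories)
    counts = [[0] * n for _ in range(n)]
    for truth, prediction in zip(true_labels, predicted_labels):
        counts[index[truth]][index[prediction]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Each row sums to 1, so the diagonal holds per-category accuracy.
        matrix.append([value / total if total else 0.0 for value in row])
    return matrix

def mean_diagonal_accuracy(matrix):
    """Average of the per-category accuracies on the diagonal."""
    return sum(matrix[i][i] for i in range(len(matrix))) / len(matrix)

# Per-category accuracies from the k-NN (k = 10) table above.
per_category = [0.460, 0.520, 0.330, 0.150, 0.780, 0.300, 0.910, 0.420,
                0.330, 0.560, 0.770, 0.260, 0.670, 0.550, 0.960]
print(round(sum(per_category) / len(per_category), 3))  # 0.531
```

Averaging the fifteen per-category values reproduces the overall figure of 0.531.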

Bag of Sift with SVM

I was able to get an accuracy of 0.678 with this combination, achieved at lambda = 10^-4 (0.0001). Detailed results are shown below:

Bag of Sift with Linear SVM, lambda = 0.0001

Accuracy (mean of diagonal of confusion matrix) is 0.678

Category name	Accuracy	False positives with true label	False negatives with wrong predicted label
Kitchen	0.600	Office, Bedroom	Office, LivingRoom
Store	0.580	Bedroom, InsideCity	Kitchen, LivingRoom
Bedroom	0.440	Office, OpenCountry	LivingRoom, LivingRoom
LivingRoom	0.330	Industrial, Bedroom	Office, Store
Office	0.840	Kitchen, Bedroom	Bedroom, Store
Industrial	0.490	Kitchen, InsideCity	LivingRoom, Highway
Suburb	0.940	Industrial, Kitchen	TallBuilding, Coast
InsideCity	0.500	Industrial, Store	Store, Kitchen
TallBuilding	0.760	Office, InsideCity	Forest, OpenCountry
Street	0.730	TallBuilding, Industrial	Industrial, Highway
Highway	0.840	OpenCountry, Street	Industrial, Coast
OpenCountry	0.540	Coast, TallBuilding	Suburb, Highway
Coast	0.770	OpenCountry, Industrial	Highway, OpenCountry
Mountain	0.870	LivingRoom, OpenCountry	OpenCountry, Forest
Forest	0.940	Store, Store	Mountain, Street

Besides linear SVM, I also tried a radial kernel SVM for classifying the features. However, radial kernels do not seem to work well for this task. After trying about a dozen different parameter settings, I was only able to reach an accuracy of about 0.55. This can be achieved under several different settings, for example sigma = 2 and lambda = 1, or sigma = 2^5 and lambda = 1e-2 (sigma is the radial kernel parameter). The confusion matrix for sigma = 2^5 and lambda = 1e-2 is shown below (Store seems to be predicted much more often than the other scenes):

Bag of Sift with Radial Kernel, sigma = 32 and lambda = 0.01

Accuracy (mean of diagonal of confusion matrix) is 0.554
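As a reminder of what sigma controls in the radial kernel experiments above, here is a minimal sketch of a Gaussian radial kernel. The exact parameterization is an assumption: the form exp(-||x - y||^2 / (2 sigma^2)) is a common one, but the text above only says that sigma is the radial kernel parameter. The name `rbf_kernel` is hypothetical.

```python
import math

def rbf_kernel(x, y, sigma):
    """Gaussian radial kernel, assuming exp(-||x - y||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))
```

Under this form, a larger sigma makes the kernel flatter: two distinct feature vectors look more similar at sigma = 32 than at sigma = 2.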

Jincheng Li (jl253)

CS 143 / Project 3 / Scene Recognition with Bag of Words