Algorithm
- Utilize a strong feature such as SIFT or HOG to dramatically improve detection accuracy over the baseline of raw image patches.
- Implement a strategy for iteratively re-training a classifier with mined hard negatives and compare this to a baseline with random negatives.
- Utilize and compare linear and non-linear classifiers. Linear classifiers can be trained with large amounts of data, but they may not be expressive enough to actually benefit from that training data. Non-linear classifiers can represent complex decision boundaries, but are limited in the amount of training data they can use at once.
get_hard_negatives.m
Here, we iteratively bootstrap a database and a classifier. Each iteration's database contains the crops that are most difficult for that iteration's particular classifier. We build it by running each iteration's classifier on the previous iteration's database: since every database contains only non-face crops, anything the classifier accepts is a false positive, i.e., a hard negative (the base case is a database of random crops).
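A minimal sketch of that loop follows; the helpers random_negative_crops, train_svm, and score_crops are hypothetical stand-ins for this project's actual functions.

    % Bootstrap: each stage keeps only the crops the current classifier
    % gets wrong, then retrains on them.
    neg_db = random_negative_crops(scenes, n_neg);  % base case: random crops
    for stage = 1:num_stages                        % hardcoded stopping criterion
        svm = train_svm(crops2features(pos_db), crops2features(neg_db));
        scores = score_crops(svm, neg_db);
        % the database holds only non-faces, so every positive score is a
        % false positive, i.e. a hard negative; keep those for the next stage
        neg_db = neg_db(:, :, scores > 0);
    end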
crops2features.m
Converts crops to a better image feature. I normalized the image (divided by 255) and used two gradient-based descriptors, SIFT and HOG.
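A sketch of this step, assuming VLFeat's vl_hog is available on the path (the 10-pixel cell size here is illustrative, not necessarily the tuned value):

    function feats = crops2features_sketch(crops)
        % crops: h x w x N stack of grayscale crops (uint8)
        n = size(crops, 3);
        feats = cell(n, 1);
        for i = 1:n
            img = single(crops(:, :, i)) / 255;  % normalize to [0, 1]
            hog = vl_hog(img, 10);               % VLFeat HOG, 10-pixel cells
            feats{i} = hog(:)';                  % flatten to one row per crop
        end
        feats = cell2mat(feats);                 % N x d feature matrix
    end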
primal_svm.m
Trains a linear or non-linear SVM classifier from the positive and negative examples. The linear SVM uses a linear kernel, a simple dot product, while the non-linear SVM uses an RBF kernel. So the linear SVM's decision boundary is a hyperplane, while the non-linear SVM's decision boundary can be 'arbitrarily' complex.
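To make the contrast concrete, here are the two decision rules in sketch form (w, b, beta, gamma, and the stored training matrix X are illustrative names, not necessarily primal_svm.m's actual interface):

    % Linear SVM: the score is a dot product, so the boundary is a hyperplane.
    score_lin = w' * x + b;                    % x is a d x 1 test feature

    % RBF SVM: the score is a weighted sum of kernel responses against every
    % training point, so the boundary can bend wherever the data demands.
    K = exp(-gamma * sum((X - x').^2, 2));     % X is n x d (needs R2016b+)
    score_rbf = beta' * K + b;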
- As a stopping criterion, I simply hardcoded the number of iterations. More than 2 stages may train an overly conservative classifier.
- Qualitatively, non-maximum suppression removes off-center heads (in which some neck/hair/cheek/etc. contains enough 'face features' to trigger a positive classification).
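For reference, a greedy sketch of that suppression (the [x1 y1 x2 y2] box format and the overlap threshold are assumptions on my part):

    function keep = nms_sketch(boxes, confidences, overlap_thresh)
        % boxes: N x 4 rows of [x1 y1 x2 y2]; returns indices of kept boxes
        x1 = boxes(:,1); y1 = boxes(:,2); x2 = boxes(:,3); y2 = boxes(:,4);
        area = (x2 - x1 + 1) .* (y2 - y1 + 1);
        [~, order] = sort(confidences, 'descend');
        keep = [];
        while ~isempty(order)
            i = order(1);
            keep(end + 1) = i;                % keep the most confident box
            rest = order(2:end);
            % overlap of box i with every remaining box
            w = max(0, min(x2(i), x2(rest)) - max(x1(i), x1(rest)) + 1);
            h = max(0, min(y2(i), y2(rest)) - max(y1(i), y1(rest)) + 1);
            iou = (w .* h) ./ (area(i) + area(rest) - w .* h);
            order = rest(iou <= overlap_thresh);  % drop the heavy overlaps
        end
    end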
Results
Difficult v. Random Negatives
"Implement a strategy for iteratively re-training a classifier with mined hard negatives and compare this to a baseline with random negatives."
Shared Params
- max training = 100
- max crops = 5
- lambda = 10.0
- start scale = 2
Evaluation
Stages=1 (linear HOG)
Stages=2 (linear HOG)
Linear v. Nonlinear kernels
"Utilize and compare linear and non-linear classifiers. Linear classifiers can be trained with large amounts of data, but they may not be expressive enough to actually benefit from that training data. Non-linear classifiers can represent complex decision boundaries, but are limited in the amount of training data they can use at once."
linear kernel, HOG
Params
- training data = 1000
- 2 stages
- lambda = 10.0
- start scale = 1
Evaluation
nonlinear kernel, HOG
Params
- training data = 1000
- 2 stages
- lambda = 0.01
- start scale = 1
Evaluation
Start Scale (a parameter tuning of mine own)
Shared Params
- 2 stages
- bin size = 10
- step size = 36
- lambda = 10.0
- max training = 1000
- max crops = 10
Evaluation
Start Scale=3 (SIFT, linear kernel)
Start Scale=2 (SIFT, linear kernel)
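For context, this is roughly how a start-scale parameter enters a multi-scale sliding-window detector; the pyramid loop and the 1.5x scale factor below are illustrative assumptions, not this project's exact code.

    scale_factor = 1.5;
    for s = start_scale:num_scales
        scaled = imresize(img, 1 / scale_factor^(s - 1));
        % slide the fixed-size detection window over 'scaled' here; a larger
        % start_scale skips the finest pyramid levels, which runs faster but
        % can miss the smallest faces
    end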
Lambda, SVM with an RBF kernel (a parameter tuning of mine own)
Shared Params
- 2 stages
- max training = 1000
Evaluation
Lambda=1 (HOG, nonlinear kernel)
Lambda=0.01 (HOG, nonlinear kernel)
Discussion
Mining Hard Negatives
We see a threefold increase in accuracy from just a single round of hard-negative mining.
Sophisticated Features
After getting dismal results from normalized but otherwise raw image patches, I switched to SIFT and HOG representations. This alone dramatically improved performance.
Accuracy
I got accuracies between 23% and 63%. With an AP over 63% (seen above), my final classifier was a nonlinear SVM with an RBF kernel, a HOG feature representation, and a lambda of 0.01. It was trained on 1000 positive and negative training examples.
lambda and stages
Here there is an interplay between two parameters, 'lambda' and 'stages': lambda determines whether the SVM overfits or underfits, while stages determines how difficult the mined training data is. They are strongly coupled, and thus should not be tuned independently.
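For reference, assuming the standard L2-regularized hinge-loss primal objective (the exact loss primal_svm.m minimizes may differ):

    \min_{w,b} \; \lambda \lVert w \rVert^2 + \sum_i \max\left(0,\, 1 - y_i (w^\top x_i + b)\right)

A large lambda weights the regularizer and pushes toward underfitting; a small lambda weights the data term and pushes toward overfitting. Since later stages supply harder, less separable negatives, the best trade-off shifts as stages changes, which is why the two should be tuned together.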
nonlinear SVMs
The nonlinear SVM is best tuned with a small (<1) lambda.
Musing