Algorithm
- Utilize a strong feature such as SIFT or HOG to dramatically improve detection accuracy over the baseline of raw image patches.
- Implement a strategy for iteratively re-training a classifier with mined hard negatives and compare this to a baseline with random negatives.
- Utilize and compare linear and non-linear classifiers. Linear classifiers can be trained with large amounts of data, but they may not be expressive enough to actually benefit from that training data. Non-linear classifiers can represent complex decision boundaries, but are limited in the amount of training data they can use at once.
get_hard_negatives.m
Here, we iteratively bootstrap a database and a classifier. Each iteration's database contains the crops that are most difficult for that iteration's particular classifier. We build it by running each iteration's classifier on the previous iteration's database: since every database contains only non-face crops, anything the classifier accepts is a false positive, i.e., a hard negative (the base case is a database of random crops).
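A minimal sketch of that loop follows; the helpers random_negative_crops, train_svm, and score_crops are hypothetical stand-ins for this project's actual functions.

    % Bootstrap: each stage keeps only the crops the current classifier
    % gets wrong, then retrains on them.
    neg_db = random_negative_crops(scenes, n_neg);  % base case: random crops
    for stage = 1:num_stages                        % hardcoded stopping criterion
        svm = train_svm(crops2features(pos_db), crops2features(neg_db));
        scores = score_crops(svm, neg_db);
        % the database holds only non-faces, so every positive score is a
        % false positive, i.e. a hard negative; keep those for the next stage
        neg_db = neg_db(:, :, scores > 0);
    end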
crops2features.m
Converts crops to a better image feature. I normalized the image (divided by 255) and used two gradient-based descriptors, SIFT and HOG.
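A sketch of this step, assuming VLFeat's vl_hog is available on the path (the 10-pixel cell size here is illustrative, not necessarily the tuned value):

    function feats = crops2features_sketch(crops)
        % crops: h x w x N stack of grayscale crops (uint8)
        n = size(crops, 3);
        feats = cell(n, 1);
        for i = 1:n
            img = single(crops(:, :, i)) / 255;  % normalize to [0, 1]
            hog = vl_hog(img, 10);               % VLFeat HOG, 10-pixel cells
            feats{i} = hog(:)';                  % flatten to one row per crop
        end
        feats = cell2mat(feats);                 % N x d feature matrix
    end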
primal_svm.m
Trains a linear or non-linear SVM classifier from the positive and negative examples. The linear SVM uses a linear kernel, a simple dot product, while the non-linear SVM uses an RBF kernel. So the linear SVM's decision boundary is a hyperplane, while the non-linear SVM's decision boundary can be 'arbitrarily' complex.
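To make the contrast concrete, here are the two decision rules in sketch form (w, b, beta, gamma, and the stored training matrix X are illustrative names, not necessarily primal_svm.m's actual interface):

    % Linear SVM: the score is a dot product, so the boundary is a hyperplane.
    score_lin = w' * x + b;                    % x is a d x 1 test feature

    % RBF SVM: the score is a weighted sum of kernel responses against every
    % training point, so the boundary can bend wherever the data demands.
    K = exp(-gamma * sum((X - x').^2, 2));     % X is n x d (needs R2016b+)
    score_rbf = beta' * K + b;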
- As a stopping criterion, I simply hardcoded the number of iterations. More than 2 stages may train an overly conservative classifier.
- Qualitatively, non-maximum suppression removes off-center heads (in which some neck/hair/cheek/etc. contains enough 'face features' to trigger a positive classification).
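For reference, a greedy sketch of that suppression (the [x1 y1 x2 y2] box format and the overlap threshold are assumptions on my part):

    function keep = nms_sketch(boxes, confidences, overlap_thresh)
        % boxes: N x 4 rows of [x1 y1 x2 y2]; returns indices of kept boxes
        x1 = boxes(:,1); y1 = boxes(:,2); x2 = boxes(:,3); y2 = boxes(:,4);
        area = (x2 - x1 + 1) .* (y2 - y1 + 1);
        [~, order] = sort(confidences, 'descend');
        keep = [];
        while ~isempty(order)
            i = order(1);
            keep(end + 1) = i;                % keep the most confident box
            rest = order(2:end);
            % overlap of box i with every remaining box
            w = max(0, min(x2(i), x2(rest)) - max(x1(i), x1(rest)) + 1);
            h = max(0, min(y2(i), y2(rest)) - max(y1(i), y1(rest)) + 1);
            iou = (w .* h) ./ (area(i) + area(rest) - w .* h);
            order = rest(iou <= overlap_thresh);  % drop the heavy overlaps
        end
    end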
Results
Difficult v. Random Negatives
"Implement a strategy for iteratively re-training a classifier with mined hard negatives and compare this to a baseline with random negatives."
Shared Params
- max training = 100
- max crops = 5
- lambda = 10.0
- start scale = 2
Evaluation
Stages=1 (linear HOG)
Stages=2 (linear HOG)
Linear v. Nonlinear kernels
"Utilize and compare linear and non-linear classifiers. Linear classifiers can be trained with large amounts of data, but they may not be expressive enough to actually benefit from that training data. Non-linear classifiers can represent complex decision boundaries, but are limited in the amount of training data they can use at once."
linear kernel, HOG
Params
- training data = 1000
- 2 stages
- lambda = 10.0
- start scale = 1
Evaluation
nonlinear kernel, HOG
Params
- training data = 1000
- 2 stages
- lambda = 0.01
- start scale = 1
Evaluation
Start Scale (a parameter tuning of mine own)
Shared Params
- 2 stages
- bin size = 10
- step size = 36
- lambda = 10.0
- max training = 1000
- max crops = 10
Evaluation
Start Scale=3 (SIFT, linear kernel)
Start Scale=2 (SIFT, linear kernel)
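For context, this is roughly how a start-scale parameter enters a multi-scale sliding-window detector; the pyramid loop and the 1.5x scale factor below are illustrative assumptions, not this project's exact code.

    scale_factor = 1.5;
    for s = start_scale:num_scales
        scaled = imresize(img, 1 / scale_factor^(s - 1));
        % slide the fixed-size detection window over 'scaled' here; a larger
        % start_scale skips the finest pyramid levels, which runs faster but
        % can miss the smallest faces
    end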
Lambda, SVM with an RBF kernel (a parameter tuning of mine own)
Shared Params
- 2 stages
- max training = 1000
Evaluation
Lambda=1 (HOG, nonlinear kernel)
Lambda=0.01 (HOG, nonlinear kernel)
Discussion
Mining Hard Negatives
We see a threefold increase in accuracy from just a single round of hard-negative mining.
Sophisticated Features
After getting dismal results from normalized but otherwise raw image patches, I switched to SIFT and HOG representations. This alone dramatically improved performance.
Accuracy
I got accuracies between 23% and 63%. With an AP over 63% (seen above), my final classifier was a nonlinear SVM with an RBF kernel, a HOG feature representation, and a lambda of 0.01. It was trained on 1000 positive and negative training examples.
lambda and stages
Here there is an interplay between two parameters, 'lambda' and 'stages': lambda determines whether the SVM overfits or underfits, while stages determines how difficult the mined training data is. They are strongly coupled, and thus should not be tuned independently.
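For reference, assuming the standard L2-regularized hinge-loss primal objective (the exact loss primal_svm.m minimizes may differ):

    \min_{w,b} \; \lambda \lVert w \rVert^2 + \sum_i \max\left(0,\, 1 - y_i (w^\top x_i + b)\right)

A large lambda weights the regularizer and pushes toward underfitting; a small lambda weights the data term and pushes toward overfitting. Since later stages supply harder, less separable negatives, the best trade-off shifts as stages changes, which is why the two should be tuned together.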
nonlinear SVMs
The nonlinear SVM is best tuned with a small (<1) lambda.
Musing