Face Detection with a Sliding Window
by Sam Swarr (sswarr)
CS 143 Fall 2011
[Image: Confidences of 0.3 or higher on the easy picture.]
[Image: Confidences of 0.5 or higher on the easy picture. (It even detects clock faces!)]
[Image: Confidences of 0.3 or higher on the hard picture.]
[Image: Confidences of 0.5 or higher on the hard picture.]
Training the Linear SVM Classifier
In order to train the classifier, I need to feed it features from positive face crops and features from non-face crops. I chose to use SIFT features (via vl_dsift) rather than raw image patches, since they are more robust and invariant. I computed a single SIFT feature for each of the faces in the positive training set, and then obtained an initial collection of 1,000 random negative features by running SIFT on random crops of images containing no faces. With 1,000 of the positive features and the 1,000 negative features, I trained my linear SVM using a lambda value of 50. Here is the performance of the classifier after this first stage of training:
Linear SVM Classifier; lambda = 50.0; Fed 1000 positives and 1000 random negatives
Step-size = 2; Scale-factor = 1.2; Start-scale = 2
    | Stage 1
TPR | 0.489
FPR | 0.009
TNR | 0.491
FNR | 0.011
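For reference, the first-stage pipeline looks roughly like the following MATLAB/VLFeat sketch. The 36x36 crop size, the SIFT bin size, the file and variable names, and the train_svm helper are illustrative placeholders rather than the exact code; only vl_dsift and the lambda of 50 come from the description above.

% One dense SIFT descriptor per (assumed) 36x36 grayscale crop.
im = single(imread('face_crop.jpg'));                % placeholder file name
[~, d] = vl_dsift(im, 'Size', 8, 'Step', 36, 'Fast');
feat = single(d(:, 1))';                             % 128-D SIFT feature for this crop

% Stack 1000 positive and 1000 random negative features into X, with labels
% y (+1 = face, -1 = non-face), then train the linear SVM.
lambda = 50.0;
[w, b] = train_svm(X, y, lambda);                    % placeholder linear SVM solver

% At detection time, a window whose score is positive is called a face.
score = w' * feat' + b;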
In hopes of improving on this, I ran the above SVM over a series of scenes containing no faces. 1,000 of the false positives detected there were converted into SIFT features and added to the pool of negatives, and the SVM was then retrained with these mined hard negatives. Here is the performance of the classifier after two stages of training:
Linear SVM Classifier; lambda = 50.0; Fed 2000 positives and 2000 negatives (half random negatives/half hard negatives)
Step-size = 2; Scale-factor = 1.2; Start-scale = 2
    | Stage 1 | Stage 2
TPR | 0.481   | 0.301
FPR | 0.012   | 0.021
TNR | 0.488   | 0.646
FNR | 0.018   | 0.033
As hoped, mining hard negatives improved precision by over 8%. Note also that the true-negative rate rose dramatically between the two stages, which most likely contributed to the precision improvement.
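Concretely, the mining stage just runs the stage-1 detector over scenes known to contain no faces and keeps whatever it fires on. A rough sketch is below; the 36x36 window, the grayscale conversion, the variable names, and train_svm are placeholders, while the step-size of 2 and scale-factor of 1.2 match the detector settings listed above.

% Slide an (assumed) 36x36 window over every non-face scene at multiple
% scales and collect features that the stage-1 SVM wrongly calls faces.
hard_negs = [];
for i = 1:numel(nonface_scenes)
    im = single(rgb2gray(imread(nonface_scenes{i})));    % assumes RGB input
    scale = 1.0;
    while min(size(im)) * scale >= 36
        scaled = imresize(im, scale);
        for r = 1:2:(size(scaled, 1) - 35)               % step-size = 2
            for c = 1:2:(size(scaled, 2) - 35)
                patch = scaled(r:r+35, c:c+35);
                [~, d] = vl_dsift(patch, 'Size', 8, 'Step', 36, 'Fast');
                f = single(d(:, 1))';
                if w' * f' + b > 0                       % false positive: no faces here
                    hard_negs(end+1, :) = f;             %#ok<AGROW>
                end
            end
        end
        scale = scale / 1.2;                             % scale-factor = 1.2
    end
end

% Retrain on 2000 positives plus 1000 random and 1000 mined hard negatives.
X2 = [X_pos; X_neg_rand; hard_negs(1:1000, :)];
y2 = [ones(size(X_pos, 1), 1); -ones(size(X_neg_rand, 1) + 1000, 1)];
[w, b] = train_svm(X2, y2, 50.0);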
Using a Non-Linear SVM Classifier
In an attempt to improve precision further, I experimented with a non-linear SVM classifier. I first compared the linear and non-linear SVMs using lax detection parameters to speed up testing; the non-linear SVM gave roughly a 10% increase in precision. I then ran the detector with the non-linear SVM using tighter parameters. Here are the results:
Non-Linear SVM Classifier; lambda = 1.0; sigma = 200.0; Fed 2000 positives and 2000 negatives (half random negatives/half hard negatives)
Step-size = 2; Scale-factor = 1.2; Start-scale = 3
*My Personal Best Result*
    | Stage 1 | Stage 2
TPR | 0.498   | 0.494
FPR | 0.001   | 0.001
TNR | 0.499   | 0.499
FNR | 0.002   | 0.005
As you can see, the non-linear SVM outperformed the linear SVM by nearly 13%.
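The sigma above is the kernel bandwidth; assuming a Gaussian (RBF) kernel, training and classification look roughly like the sketch below. train_kernel_svm is a placeholder for the actual solver, and the variable names are illustrative.

% Gaussian (RBF) kernel matrix over the training features (rows of X_train).
sigma  = 200.0;
lambda = 1.0;
sq = sum(X_train.^2, 2);
D2 = bsxfun(@plus, sq, sq') - 2 * (X_train * X_train');  % pairwise squared distances
K  = exp(-D2 / (2 * sigma^2));

[alpha, b] = train_kernel_svm(K, y_train, lambda);        % placeholder kernel SVM solver

% Scoring a new feature f_test (1 x 128) needs kernel values against the
% whole training set instead of a single dot product with a weight vector.
d2 = sq + sum(f_test.^2) - 2 * (X_train * f_test');
k  = exp(-d2 / (2 * sigma^2));
score = sum(alpha .* y_train .* k) + b;                   % positive => face

This is also why the non-linear detector is slower to evaluate: every window costs kernel evaluations against all of the training examples rather than a single dot product, hence the lax parameters used for the quick comparison above.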
Side-by-Side Comparisons
The detector parameters for both were: step-size = 2, scale-factor = 1.2, start-scale = 2
[Image: Linear SVM trained only on random negatives]
[Image: Linear SVM trained on random and mined hard negatives]
Both classifiers were trained on 2000 positives and 2000 negatives (half random and half hard). The detector parameters for both were: step-size = 2, scale-factor = 1.2
[Image: Linear SVM (lambda = 50)]
[Image: Non-linear SVM (lambda = 1; sigma = 200)]
Conclusion
Overall, I'm happy with my results. I was pleased to see that mined hard negatives increased precision over random negatives alone, and that a non-linear SVM increased precision over a linear one. Had I had the time to run the classifier overnight, I would have liked to use more training data and tweak the detector parameters to be even more thorough.