Face Detection with a Sliding Window
by Sam Swarr (sswarr)

CS 143 Fall 2011

[Figure: detections on the easy picture with confidences of 0.3 or higher, and with confidences of 0.5 or higher. (It even detects clock faces!)]


[Figure: detections on the hard picture with confidences of 0.3 or higher, and with confidences of 0.5 or higher.]

Training the Linear SVM Classifier

In order to train the classifier, I need to feed it features from positive face crops and features from non-face crops. I chose SIFT features (via vl_dsift) since they are more robust and invariant than raw image patches. I obtained a single SIFT feature for each face in the positive training set, then built an initial collection of 1,000 random negative features by computing SIFT descriptors on random crops of images containing no faces. With 1,000 of the positive features and the 1,000 negative features, I trained my linear SVM using a lambda value of 50. Here is the performance of the classifier after this first stage of training:
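In code, this first training stage looks roughly like the sketch below. It assumes VLFeat's vl_dsift and vl_svmtrain are on the path (vl_svmtrain may differ from the solver I actually used), that the crops are fixed-size grayscale images, and that pos_files / neg_crops are placeholder names rather than my real variables:

    % Stage-1 training sketch (placeholder names, hedged assumptions).
    lambda = 50.0;
    pos_feats = zeros(128, numel(pos_files), 'single');
    for i = 1:numel(pos_files)
        crop = single(imread(pos_files{i}));     % grayscale face crop, e.g. 36x36
        [~, d] = vl_dsift(crop, 'size', 9);      % bin size chosen so one 4x4-bin
        pos_feats(:, i) = single(d(:, 1));       % descriptor spans the whole crop
    end
    neg_feats = zeros(128, numel(neg_crops), 'single');
    for i = 1:numel(neg_crops)                   % 1,000 random non-face crops
        [~, d] = vl_dsift(single(neg_crops{i}), 'size', 9);
        neg_feats(:, i) = single(d(:, 1));
    end
    X = [pos_feats, neg_feats];                  % descriptors as columns
    Y = [ones(1, size(pos_feats, 2)), -ones(1, size(neg_feats, 2))];
    [w, b] = vl_svmtrain(X, Y, lambda);          % linear SVM, lambda = 50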

[Figure: the stage-1 classifier. Linear SVM; lambda = 50.0; fed 1,000 positives and 1,000 random negatives.]
[Figure: average precision for the stage-1 classifier. Step-size = 2; scale-factor = 1.2; start-scale = 2.]

       Stage 1
TPR    0.489
FPR    0.009
TNR    0.491
FNR    0.011
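For reference, here is a minimal sketch of the multi-scale sliding-window loop that the step-size, scale-factor, and start-scale parameters above refer to. The variable names (img, patch_size) and the exact scale bookkeeping are assumptions, not my actual detector code:

    % Multi-scale sliding-window sketch. img: grayscale single image;
    % patch_size: side length of the training crops (e.g. 36);
    % w, b: the trained linear SVM from above.
    step = 2; scale_factor = 1.2; scale = 2;     % start-scale = 2
    thresh = 0;                                  % the pictures above used 0.3 / 0.5
    detections = [];
    while min(size(img)) / scale >= patch_size
        scaled = imresize(img, 1 / scale);       % shrinking img grows the window
        for r = 1:step:(size(scaled, 1) - patch_size + 1)
            for c = 1:step:(size(scaled, 2) - patch_size + 1)
                patch = scaled(r:r+patch_size-1, c:c+patch_size-1);
                [~, d] = vl_dsift(patch, 'size', 9);
                conf = w' * single(d(:, 1)) + b; % linear SVM confidence
                if conf > thresh
                    detections(end+1, :) = [r, c, scale, conf]; %#ok<AGROW>
                end
            end
        end
        scale = scale * scale_factor;            % scale-factor = 1.2
    end

In practice it is much faster to call vl_dsift once per scaled image and score every window descriptor in a single matrix multiply; the per-window call above is only for clarity.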

Hoping to improve on this, I ran the above SVM over a series of non-face scenes. The 1,000 false positives detected there were converted into SIFT features and added to the pool of negatives, and the SVM was retrained with these mined hard negatives. Here is the performance of the classifier after two stages of training:
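The mining step looks roughly like the sketch below; detect_windows is a hypothetical helper wrapping the sliding-window loop above, and the other names carry over from the earlier sketches:

    % Hard-negative mining sketch. detect_windows (hypothetical) returns the
    % descriptors of every window the stage-1 SVM fires on in a face-free scene,
    % so every column it returns is a false positive.
    hard_negs = zeros(128, 0, 'single');
    for i = 1:numel(nonface_files)               % scenes containing no faces
        img = single(imread(nonface_files{i}));
        descrs = detect_windows(img, w, b);
        hard_negs = [hard_negs, descrs];         %#ok<AGROW>
    end
    hard_negs = hard_negs(:, 1:min(1000, end));  % keep 1,000 mined negatives
    X = [pos_feats, neg_feats, hard_negs];
    Y = [ones(1, size(pos_feats, 2)), ...
         -ones(1, size(neg_feats, 2) + size(hard_negs, 2))];
    [w, b] = vl_svmtrain(X, Y, lambda);          % stage-2 retraining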

[Figure: the stage-2 classifier. Linear SVM; lambda = 50.0; fed 2,000 positives and 2,000 negatives (half random negatives, half hard negatives).]
[Figure: average precision after one and two stages of training. Step-size = 2; scale-factor = 1.2; start-scale = 2.]

       Stage 1   Stage 2
TPR    0.481     0.301
FPR    0.012     0.021
TNR    0.488     0.646
FNR    0.018     0.033


As hoped, using mined hard negatives improved precision by over 8% compared to random negatives alone. Note also that the true-negative rate went up dramatically, which most likely contributed to the precision improvement.

Using a Non-Linear SVM Classifier

In an attempt to improve precision further, I experimented with a non-linear SVM classifier. I first compared the linear and non-linear SVMs using lax detection parameters to speed up testing; the non-linear SVM gave roughly a 10% increase in precision. I then ran the detector with the non-linear SVM using tighter parameters. Here are the results:
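Concretely, a non-linear SVM replaces the linear confidence w' * x + b with a kernel expansion over support vectors. Below is a minimal sketch of scoring one window descriptor x with an RBF kernel and sigma = 200, assuming the solver returns support vectors sv, signed coefficients alpha, and a bias b; the exact kernel normalization in my code may differ:

    % RBF-kernel confidence sketch (sigma = 200). sv: 128xN support vectors
    % as columns; alpha: Nx1 signed coefficients; x: one 128x1 descriptor.
    sigma = 200.0;
    sqdist = sum(bsxfun(@minus, sv, x).^2, 1);   % squared distance to each SV
    k = exp(-sqdist / (2 * sigma^2));            % 1xN row of kernel values
    conf = k * alpha + b;                        % non-linear SVM confidence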

[Figure: the non-linear SVM classifier. Lambda = 1.0; sigma = 200.0; fed 2,000 positives and 2,000 negatives (half random negatives, half hard negatives).]
[Figure: average precision for the non-linear SVM. Step-size = 2; scale-factor = 1.2; start-scale = 3. *My personal best result*]

       Stage 1   Stage 2
TPR    0.498     0.494
FPR    0.001     0.001
TNR    0.499     0.499
FNR    0.002     0.005

As you can see, the non-linear SVM outperformed the linear SVM by nearly 13% in precision.

Side-by-Side Comparisons

The detector parameters for both were: step = 2, scale = 1.2, start-scale = 2.
[Figure: linear SVM trained only on random negatives (left) vs. linear SVM trained on random and mined hard negatives (right).]

Both classifiers were trained on 2,000 positives and 2,000 negatives (half random, half hard). The detector parameters for both were: step = 2, scale = 1.2.
[Figure: linear SVM (lambda = 50) (left) vs. non-linear SVM (lambda = 1; sigma = 200) (right).]

Conclusion

Overall, I'm happy with my results. I was pleased to see that mined hard negatives increased precision over random negatives alone, and that a non-linear SVM increased precision over a linear one. Had I had time to run the classifier overnight, I would have liked to use more training data and to tweak the detector parameters to be even more thorough.