CS 143: Face detection with a sliding window
Algorithm
A sliding window face detector implements a brute-force method: test every patch that might contain a face and report those the classifier accepts. This is implemented by sliding a rectangular window across the image at different positions and scales. If the same face is detected at multiple scales, non-maximum suppression picks the best match.
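The project itself is MATLAB/VLFeat, but the greedy non-maximum suppression idea can be sketched in a few lines of Python. The overlap threshold of 0.3 below is illustrative, not a value taken from the project:

```python
def nms(detections, overlap_thresh=0.3):
    """Greedy non-maximum suppression.

    detections: list of (x1, y1, x2, y2, confidence) tuples.
    Returns the detections kept, highest confidence first.
    """
    # Consider detections in order of decreasing confidence.
    dets = sorted(detections, key=lambda d: d[4], reverse=True)
    kept = []
    for x1, y1, x2, y2, conf in dets:
        area = (x2 - x1) * (y2 - y1)
        suppressed = False
        for kx1, ky1, kx2, ky2, _ in kept:
            # Intersection rectangle with an already-kept detection.
            ix1, iy1 = max(x1, kx1), max(y1, ky1)
            ix2, iy2 = min(x2, kx2), min(y2, ky2)
            inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
            karea = (kx2 - kx1) * (ky2 - ky1)
            iou = inter / (area + karea - inter)
            if iou > overlap_thresh:
                suppressed = True  # same face at a nearby position/scale
                break
        if not suppressed:
            kept.append((x1, y1, x2, y2, conf))
    return kept
```

Because the list is scanned best-first, a face detected at several scales survives only as its single most confident window.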
Classifier
Each patch was represented by a single SIFT feature, calculated using the VLFeat library with a patch size of 36 and a bin size of 10. I used a linear classifier, specifically Olivier Chapelle's primal_svm with a lambda of 1. The classifier was initially trained on 5000 randomly chosen positive and negative examples. To reduce the false positive rate, I then retrained the classifier twice, each time adding the 5000 negative examples the previous classifier had scored most confidently (hard negatives). I added a constant of 2 to each confidence to improve the recall rate.
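primal_svm is MATLAB code, so as a rough Python sketch of the retraining loop, here is the hard-negative mining step with a simple hinge-loss linear classifier standing in for the SVM. The learning rate and epoch count are illustrative assumptions, not project parameters:

```python
import numpy as np

def train_linear(X, y, lam=1.0, epochs=200, lr=0.01):
    """Stand-in for primal_svm: linear model fit by sub-gradient
    descent on the L2-regularized hinge loss.  y must be +/-1."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                     # margin-violating examples
        gw = lam * w - (y[viol, None] * X[viol]).sum(axis=0) / n
        gb = -y[viol].sum() / n
        w -= lr * gw
        b -= lr * gb
    return w, b

def mine_hard_negatives(w, b, negatives, k):
    """Return the k negatives the current model scores most confidently;
    these are appended to the training set before retraining."""
    scores = negatives @ w + b
    return negatives[np.argsort(scores)[::-1][:k]]
```

The full procedure alternates: train, score a pool of negatives, keep the top-scoring ones, retrain, which is the two-round scheme described above.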
Detector
The sliding window was implemented using vl_dsift with a bin size of 10 and a step size of 2. To detect faces at multiple scales, the image was repeatedly downscaled by 2/3 and the sliding window was run at each level until the image was less than 50 pixels along one dimension.
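The scale schedule described above (shrink by 2/3 until one side drops below 50 pixels) can be written down directly; a minimal sketch:

```python
def pyramid_scales(width, height, factor=2/3, min_side=50):
    """Scales at which the sliding window is run: the image is
    repeatedly shrunk by `factor` until its smaller side would
    fall below `min_side` pixels."""
    scale = 1.0
    scales = []
    while min(width * scale, height * scale) >= min_side:
        scales.append(scale)
        scale *= factor
    return scales
```

A detection found at scale s is mapped back to the original image by dividing its coordinates by s.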
Results
My face detector achieved an accuracy of 77.2% on the test set.
The graphs below summarize the detector's performance after 1, 2, and 3 training iterations. The graph on the left shows the positive training examples (green) and the negative training examples (red) sorted by confidence. The graph on the right shows precision (percentage of positives that were correct) versus recall (percentage of positives identified correctly).
Classifications | Precision-Recall |
---|---|
![]() The classifications after 1 iteration |
![]() The precision-recall curve after 1 iteration (accuracy of 71.9%) |
![]() The classifications after 2 iterations |
![]() The precision-recall curve after 2 iterations (accuracy of 58%) |
![]() The classifications after 3 iterations |
![]() The precision-recall curve after 3 iterations (accuracy of 77.2%) |
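The precision-recall curves above come from ranking all detections by confidence and evaluating precision and recall at every cutoff. A small Python sketch of that computation (here the number of ground-truth faces is taken to be the number of correct detections, an assumption for illustration):

```python
def precision_recall(confidences, labels):
    """Precision/recall at every confidence cutoff.

    confidences: score per detection; labels: True if the
    detection is correct.  Returns (precision, recall) pairs,
    one per rank in confidence order.
    """
    order = sorted(range(len(confidences)),
                   key=lambda i: confidences[i], reverse=True)
    total_pos = sum(labels)   # assumed equal to ground-truth face count
    tp = 0
    curve = []
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            tp += 1
        curve.append((tp / rank, tp / total_pos))
    return curve
```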
One interesting way to look at the results is to sort all detections by confidence. The images below show the first ten correct and the first ten incorrect detections in that sorted order. Notice that the first 115 detections are all correct.
Positive | Negative |
---|---|
![]() First correct at #1 |
![]() First incorrect at #116 |
![]() Second correct at #2 |
![]() Second incorrect at #168 |
![]() Third correct at #3 |
![]() Third incorrect at #183 |
![]() Fourth correct at #4 |
![]() Fourth incorrect at #189 |
![]() Fifth correct at #5 |
![]() Fifth incorrect at #198 |
![]() Sixth correct at #6 |
![]() Sixth incorrect at #215 |
![]() Seventh correct at #7 |
![]() Seventh incorrect at #217 |
![]() Eighth correct at #8 |
![]() Eighth incorrect at #221 |
![]() Ninth correct at #9 |
![]() Ninth incorrect at #223 |
![]() Tenth correct at #10 |
![]() Tenth incorrect at #224 |
I also tried a non-linear classifier with an RBF kernel, but performance was sensitive to the parameter choices, and I couldn't find a setting that improved accuracy over the linear classifier. The best combination I found was a lambda of 1 and a sigma of 100:
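For reference, the RBF kernel with sigma = 100 has the standard Gaussian form; the support vectors and dual coefficients below are placeholders that would come from the trainer, so this is only a sketch of the decision function, not the project's code:

```python
import numpy as np

def rbf_kernel(A, B, sigma=100.0):
    """Gaussian RBF kernel matrix: K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def decision(x, support, alphas, b, sigma=100.0):
    """Non-linear SVM confidence for feature rows x, given support
    vectors and dual coefficients (hypothetical here)."""
    return rbf_kernel(x, support, sigma) @ alphas + b
```

A large sigma makes the kernel nearly flat over the data, which pushes the classifier back toward linear behavior, consistent with the RBF model failing to beat the linear one here.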
Classifications | Precision-Recall |
---|---|
![]() The classifications for the non-linear classifier |
![]() The precision-recall curve for the non-linear classifier |