Project 4: Face Detection with a Sliding Window

Varun Singh

For this assignment I implemented a face detector using SIFT features and both a linear and a non-linear SVM. Throughout the assignment, my SIFT parameters were a step size of 36 and a bin size of 10.

We were given crops of faces as our positive training data. First, I converted these face crops to features: each 36x36 image patch is turned into one feature using the SIFT parameters described above.
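
As a rough sketch of this step (assuming the VLFeat MATLAB toolbox is on the path; train_path_pos and the other variable names are my own placeholders, not necessarily the starter code's):

    % Build dense-SIFT features from the 36x36 face crops.
    image_files  = dir(fullfile(train_path_pos, '*.jpg'));
    features_pos = [];
    for i = 1:length(image_files)
        img = im2single(imread(fullfile(train_path_pos, image_files(i).name)));
        if size(img, 3) > 1, img = rgb2gray(img); end
        % Step of 36 and bin size of 10, as described above; per the
        % write-up each crop reduces to one descriptor. All descriptors
        % returned for the crop are stacked as rows.
        [~, d] = vl_dsift(img, 'step', 36, 'size', 10);
        features_pos = [features_pos; single(d)']; %#ok<AGROW>
    end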

Then I took 1000 random negative training images and converted them to features with VLFeat's dsift function, running it over each whole image (with the same parameters as above: bin size of 10, step of 36) rather than first cutting each image into a series of crops; this was much, much faster. I then randomly sampled 1000 of the positive features.
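
A similar sketch for the negatives, again with hypothetical path and variable names:

    % Dense SIFT over each whole negative (face-free) scene: one vl_dsift
    % call per image instead of cropping first, which is much faster.
    image_files  = dir(fullfile(train_path_neg, '*.jpg'));
    pick         = randperm(length(image_files), min(1000, length(image_files)));
    features_neg = [];
    for i = pick
        img = im2single(imread(fullfile(train_path_neg, image_files(i).name)));
        if size(img, 3) > 1, img = rgb2gray(img); end
        [~, d] = vl_dsift(img, 'step', 36, 'size', 10);  % same parameters as the positives
        features_neg = [features_neg; single(d)']; %#ok<AGROW>
    end
    % Keep 1000 randomly sampled positive features, as described above.
    keep = randperm(size(features_pos, 1), min(1000, size(features_pos, 1)));
    features_pos = features_pos(keep, :);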

Finally, I trained my linear SVM and ran the sliding-window detector. I used a default lambda of 100, and the detector parameters were a step_size of 4, a scale_factor of 1.5, and a start_scale of 3.
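
A sketch of the training call, assuming VLFeat's vl_svmtrain and the feature matrices from the sketches above (the solver in the starter code may differ):

    % Train a linear SVM on the positive and (random) negative features.
    lambda = 100;                                   % regularization value used here
    X = [features_pos; features_neg]';              % 128 x N, one column per example
    Y = [ ones(1, size(features_pos, 1)), ...
         -ones(1, size(features_neg, 1))];          % +1 = face, -1 = non-face
    [w, b] = vl_svmtrain(X, Y, lambda);
    % At detection time, a 36x36 window with feature x is scored as w' * x + b.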

After doing this, I ran through the whole process again, but instead of using random negatives I mined for hard negatives: I ran my classifier on the negative training examples and used the false positives as my new negative examples (thus getting the 'hardest' images that my classifier was failing on). I found that the best value for max_crops_per_scn, which randomly limits the number of crops kept per scene, was 10. Using all of these parameters and just one round of hard-negative mining (two total stages of training), I got an accuracy of 39.8%:
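
A sketch of one mining round under the same assumptions (the per-scene cap and the retraining step follow the description above; the helper names are still my own):

    % Mine hard negatives: score dense windows in the face-free scenes and
    % keep false positives (windows the current classifier calls faces).
    max_crops_per_scn = 10;                         % random cap on crops kept per scene
    hard_neg = [];
    for i = 1:length(image_files)                   % image_files = negative scenes, as above
        img = im2single(imread(fullfile(train_path_neg, image_files(i).name)));
        if size(img, 3) > 1, img = rgb2gray(img); end
        [~, d] = vl_dsift(img, 'step', 36, 'size', 10);
        scores = w' * single(d) + b;                % any positive score here is a false positive
        fp = find(scores > 0);
        if isempty(fp), continue; end
        fp = fp(randperm(numel(fp), min(max_crops_per_scn, numel(fp))));
        hard_neg = [hard_neg; single(d(:, fp))'];   %#ok<AGROW>
    end
    % Retrain (stage 2) on the positives plus the mined hard negatives.
    X = [features_pos; hard_neg]';
    Y = [ones(1, size(features_pos, 1)), -ones(1, size(hard_neg, 1))];
    [w, b] = vl_svmtrain(X, Y, lambda);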

Surprisingly, running the classifier for 5 stages instead made things worse, resulting in an accuracy of 31.7%:

As you can see, the TPR went down with each successive stage:

Stage 1. TPR: 0.500, FPR: 0.000, TNR: 0.500, FNR: 0.000

Stage 2. TPR: 0.325, FPR: 0.008, TNR: 0.659, FNR: 0.009

Stage 3. TPR: 0.226, FPR: 0.014, TNR: 0.736, FNR: 0.024

Stage 4. TPR: 0.194, FPR: 0.013, TNR: 0.763, FNR: 0.030

Stage 5. TPR: 0.182, FPR: 0.013, TNR: 0.774, FNR: 0.030

Running the classifier without mining for hard negatives at all (only one stage of training) was significantly worse: I got an accuracy of 24.3%.

Lowering the scaling, stepping, and starting parameters in run_detector helped immensely, but only when done right. Running with all the parameters above (including one round of hard-negative mining) and these new run_detector values:

step_size = 3 (default 4)

scale_factor = 1.5 (default 1.5)

start_scale = 2 (default 3)

I got an accuracy of 64.8%, a huge improvement over the default parameters, although it did take longer to run. (Foolishly, I had first tried a scale_factor of 1, which of course gave 0% accuracy.) A sketch of the multi-scale scanning loop these parameters control appears below.
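
The sketch below is my own reconstruction rather than the starter code's run_detector: I am assuming that start_scale is the initial downscaling factor, that scale_factor is the ratio between successive pyramid levels, and that each dense-SIFT frame centre maps back to a 36x36 window in the original image.

    % Multi-scale sliding-window detection on one test image.
    % img is assumed to be a single-precision grayscale test image;
    % w and b come from the trained linear SVM above.
    step_size    = 3;      % pixels between window positions
    scale_factor = 1.5;    % ratio between consecutive pyramid levels
    start_scale  = 2;      % first downscaling factor applied to the image

    bboxes = [];  confidences = [];
    scale  = start_scale;
    while true
        scaled = imresize(img, 1 / scale);
        if min(size(scaled)) < 40, break; end      % 4 bins x 10 px: descriptor no longer fits
        [f, d]  = vl_dsift(scaled, 'step', step_size, 'size', 10);
        scores  = w' * single(d) + b;
        for j = find(scores > 0)
            cx = f(1, j) * scale;  cy = f(2, j) * scale;  % window centre in original coords
            bboxes      = [bboxes; cx - 18*scale, cy - 18*scale, ...
                                   cx + 18*scale, cy + 18*scale];  %#ok<AGROW>
            confidences = [confidences; scores(j)];                %#ok<AGROW>
        end
        scale = scale * scale_factor;              % move to the next, coarser level
    end
    % (Non-maximum suppression over bboxes/confidences would follow here.)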

With these new parameters, increasing the number of training images to 2000 also helped significantly, resulting in an accuracy of 74.3%:

My non-linear classifier gave me the highest accuracy overall, 74.7%, using all of these parameters (a sketch of the kernel appears further below):

step_size = 2 (default 4)

scale_factor = 1.5 (default 1.5)

start_scale = 2 (default 3)

1000 negative/positive examples

One stage (no mining for hard negatives)

lambda: 1

sigma: 50

Although this accuracy is about the same as my best result with the linear classifier (74.3%), it is significantly higher than the linear classifier with the same number of training images (1000), which got 64.8%.
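
The sketch below shows the Gaussian (RBF) kernel I assume is behind the non-linear classifier, with the sigma reported above; the solver that consumes K is not shown, the feature matrices carry over from the earlier sketches, and the starter code's actual kernel may differ.

    % Build the N x N Gaussian (RBF) kernel over the training features.
    sigma = 50;
    X  = [features_pos; features_neg];              % rows = training examples
    sq = sum(X .^ 2, 2);
    D2 = bsxfun(@plus, sq, sq') - 2 * (X * X');     % pairwise squared distances
    K  = exp(-D2 / (2 * sigma^2));                  % kernel matrix
    imagesc(K); axis image; colorbar;               % plot the kernel matrix (shown below)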

This is a visualization of my kernel: