Face Detection (i.e. how to teach a computer to be a creeper)

Alex Hills (ahills)

Project

In this project we implement face detection using strong features and a sliding-window classifier. We train classifiers on many negative crops sampled from face-free images and a large number of positive face crops (36x36 pixels).

Pipeline

Step 1: First, we load a selection of known faces; in this case, ~6700 of them. We then extract feature descriptors from those faces using some feature representation (SIFT, HOG, etc.).

Step 2: Load a random selection of non-face crops from a large set of images known to contain no faces, and extract features from those in the same way (a quick sketch of these two steps follows).
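
A minimal sketch of steps 1 and 2 in Python. The directory layout, file pattern, and the use of skimage's built-in HOG are assumptions on my part (my own HOG is sketched in the HOG section below); the real code only needs to produce one fixed-length descriptor per 36x36 crop.

    import glob
    import numpy as np
    from skimage import io
    from skimage.feature import hog

    CROP = 36  # positive crops are 36x36 pixels

    def crop_features(patch):
        # One fixed-length descriptor per 36x36 grayscale patch.
        return hog(patch, orientations=9, pixels_per_cell=(3, 3),
                   cells_per_block=(1, 1))

    def load_positive_features(face_dir="faces/"):
        # ~6700 pre-cropped 36x36 face images.
        return np.array([crop_features(io.imread(p, as_gray=True))
                         for p in glob.glob(face_dir + "*.jpg")])

    def sample_negative_features(nonface_dir="nonfaces/", n_samples=10000, seed=0):
        rng = np.random.default_rng(seed)
        paths = glob.glob(nonface_dir + "*.jpg")
        per_image = -(-n_samples // len(paths))  # ceiling division
        feats = []
        for p in paths:
            img = io.imread(p, as_gray=True)  # known to contain no faces
            for _ in range(per_image):
                y = rng.integers(0, img.shape[0] - CROP + 1)
                x = rng.integers(0, img.shape[1] - CROP + 1)
                feats.append(crop_features(img[y:y + CROP, x:x + CROP]))
        return np.array(feats[:n_samples])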

Step 3: Use those positive and negative features to train an SVM (linear or nonlinear) that classifies unknown crops as face or non-face.
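
Step 3 with scikit-learn's linear SVM standing in for whatever solver your course code uses. Note that the lambda used in this writeup is a regularization weight, which maps inversely onto sklearn's C; the value below is a placeholder, not a translation of lambda = 100.

    import numpy as np
    from sklearn.svm import LinearSVC

    def train_classifier(pos_feats, neg_feats, C=0.01):
        # Label faces +1 and non-faces -1, then fit a linear SVM.
        X = np.vstack([pos_feats, neg_feats])
        y = np.concatenate([np.ones(len(pos_feats)), -np.ones(len(neg_feats))])
        return LinearSVC(C=C).fit(X, y)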

Step 3.5: If doing hard-negative mining, run the classifier on known negatives and add any false positives to the negative feature set, then retrain (go back to step 3).
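
Step 3.5 as a loop over the sketches above: scan face-free images with the current classifier, treat every accepted window as a false positive, fold it into the negatives, and retrain. For brevity this scans at a single scale; the real detector also scans a scale pyramid.

    import numpy as np

    def mine_hard_negatives(clf, pos_feats, neg_feats, nonface_images,
                            stages=2, step=3):
        for _ in range(stages):
            hard = []
            for img in nonface_images:
                for y in range(0, img.shape[0] - CROP + 1, step):
                    for x in range(0, img.shape[1] - CROP + 1, step):
                        f = crop_features(img[y:y + CROP, x:x + CROP])
                        if clf.decision_function([f])[0] > 0:
                            hard.append(f)  # fired on a non-face: false positive
            if not hard:
                break
            neg_feats = np.vstack([neg_feats, np.array(hard)])
            clf = train_classifier(pos_feats, neg_feats)
        return clf, neg_feats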

Step 4: Run the resulting classifier over unknown images with a sliding window, then perform non-maximum suppression on the detections.
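
A sketch of the detector using the parameters from the results section (step size 3, scale factor 1.5, start scale 2 as I read it), plus a simple greedy non-maximum suppression; the overlap threshold is my choice, not something from this writeup.

    import numpy as np
    from skimage.transform import rescale

    def detect(clf, img, step=3, scale_factor=1.5, start_scale=2.0, crop=36):
        # Scan a scale pyramid: shrinking the image lets the fixed 36x36
        # window cover progressively larger faces.
        boxes, scores = [], []
        scale = 1.0 / start_scale
        while min(img.shape) * scale >= crop:
            small = rescale(img, scale)
            for y in range(0, small.shape[0] - crop + 1, step):
                for x in range(0, small.shape[1] - crop + 1, step):
                    s = clf.decision_function(
                        [crop_features(small[y:y + crop, x:x + crop])])[0]
                    if s > 0:  # classifier says "face"
                        boxes.append(np.array([x, y, x + crop, y + crop]) / scale)
                        scores.append(s)
            scale /= scale_factor
        return np.array(boxes), np.array(scores)

    def nms(boxes, scores, overlap=0.3):
        # Greedy NMS: keep the best-scoring box, drop boxes overlapping it.
        order = np.argsort(scores)[::-1]
        keep = []
        while len(order):
            i, order = order[0], order[1:]
            keep.append(i)
            x1 = np.maximum(boxes[i, 0], boxes[order, 0])
            y1 = np.maximum(boxes[i, 1], boxes[order, 1])
            x2 = np.minimum(boxes[i, 2], boxes[order, 2])
            y2 = np.minimum(boxes[i, 3], boxes[order, 3])
            inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
            area = ((boxes[order, 2] - boxes[order, 0]) *
                    (boxes[order, 3] - boxes[order, 1]))
            order = order[inter / area < overlap]
        return keep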

Step 5: Compute precision-recall.
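
For step 5, assuming detections have already been matched against ground truth (label 1 for a detection that hit a real face, 0 for a false positive), scikit-learn can do the curve bookkeeping. This simplified version doesn't fold in faces the detector never found, which the real evaluation has to count against recall.

    from sklearn.metrics import average_precision_score, precision_recall_curve

    def evaluate(scores, labels):
        # labels[i] is 1 if detection i matched a ground-truth face, else 0.
        precision, recall, _ = precision_recall_curve(labels, scores)
        ap = average_precision_score(labels, scores)
        return ap, precision, recall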

Results (81.2 percent at best), extensions, and observations

All results here use 10k negative samples and all 6.7k positive samples. The detector's step size is 3, its scale factor 1.5, and its start scale 2; lambda is 100.

The first test uses a custom HOG descriptor in which each pixel casts a single unweighted vote for its gradient direction (as opposed to a vote weighted by gradient magnitude). The HOG implementation is detailed later.
Stage 1. TPR: 0.514, FPR: 0.001, TNR: 0.482, FNR: 0.003

This test used SIFT features, with all other parameters the same. It took MUCH longer to run and didn't perform as well as the HOG (even the suboptimal, unweighted HOG above).
Stage 1. TPR: 0.515, FPR: 0.000, TNR: 0.485, FNR: 0.000

A better HOG here: contributions to the histogram are weighted by the magnitude of the gradient. This increased performance a fair bit compared to the unweighted HOG (unsurprisingly). This was the best score I got: 81.2%.
Stage 1. TPR: 0.514, FPR: 0.000, TNR: 0.485, FNR: 0.001

This used a nonlinear SVM, which performed worse. I may have trained it with too much data (though that doesn't quite make sense), so I'm not sure what to make of the performance here. Lambda: 200, sigma: 20 (that was the only way to get a reasonable range of values out of the kernel).
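
For reference, a guess at that run in scikit-learn terms, assuming an RBF kernel with gamma = 1/(2*sigma^2) for sigma = 20; as before, the mapping from this writeup's lambda onto sklearn's C depends on the solver, so C is a placeholder.

    import numpy as np
    from sklearn.svm import SVC

    def train_rbf(pos_feats, neg_feats, sigma=20.0, C=0.005):
        # RBF kernel: k(a, b) = exp(-||a - b||^2 / (2 * sigma^2)).
        X = np.vstack([pos_feats, neg_feats])
        y = np.concatenate([np.ones(len(pos_feats)), -np.ones(len(neg_feats))])
        return SVC(kernel="rbf", gamma=1.0 / (2 * sigma ** 2), C=C).fit(X, y)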

Finally, I used a 3-stage process with two iterations of hard-negative mining. This actually made my performance worse. It seemed to make the classifier less likely to accept positives (which makes sense with more negative examples) and more likely to reject negatives (again, makes sense). However, the cost of the missed positives outweighed the benefit of fewer false positives.
Stage 1. TPR: 0.515, FPR: 0.000, TNR: 0.485, FNR: 0.001
Stage 2. TPR: 0.498, FPR: 0.001, TNR: 0.500, FNR: 0.002
Stage 3. TPR: 0.483, FPR: 0.002, TNR: 0.511, FNR: 0.005

HOG

I found an implementation of a HOG descriptor online, but, honestly, it had a number of issues that needed fixing; I describe the end result here. It differs from the paper's HOG in the following ways: 1) it doesn't use any overlap between cells, and 2) it skips the block summing/normalization, which actually seemed to be totally fine in this case; the illumination and contrast issues that can arise without it didn't seem to matter here. I tried a number of different cell sizes and orientation counts for the gradients, and ended up using signed gradients with 9 bins, computed over 3x3-pixel cells (this follows the paper closely, though not exactly).
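
Here's roughly what that descriptor looks like in code: signed gradients quantized into 9 bins over non-overlapping 3x3 cells, with each pixel's vote weighted by its gradient magnitude (the unweighted variant from the first test just votes 1), and no block normalization. The choice of gradient filter is my assumption.

    import numpy as np

    def custom_hog(img, cell=3, bins=9):
        # Gradients via centered differences.
        gy, gx = np.gradient(img.astype(float))
        mag = np.hypot(gx, gy)
        # Signed orientation in [0, 2*pi), quantized into `bins` directions.
        ang = np.arctan2(gy, gx) % (2 * np.pi)
        idx = np.minimum((ang * bins / (2 * np.pi)).astype(int), bins - 1)

        ch, cw = img.shape[0] // cell, img.shape[1] // cell
        hist = np.zeros((ch, cw, bins))
        for i in range(ch):
            for j in range(cw):
                b = idx[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
                w = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
                # Magnitude-weighted vote; pass weights=None for the
                # unweighted (vote-per-pixel) variant.
                hist[i, j] = np.bincount(b, weights=w, minlength=bins)
        # No cell overlap and no block normalization, as described above.
        return hist.ravel()

On a 36x36 crop with 3x3 cells this gives 12x12 cells of 9 bins each, i.e. 1296 features per window.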

Credits: totally stole some of this stylesheet from Paul