In this project we implement face detection using strong features and a sliding window classifier. We train classifiers on a large number of negative patches sampled from face-free images and a large set of positive face crops (36x36 pixels).
Step 1: First, we load a set of known faces, in this case ~6700 of them, and extract feature descriptors from those crops using some feature type (SIFT, HOG, etc.).
Step 2: Sample random non-face crops from a large set of images known to contain no faces, and extract features from those.
Step 3: Use those positive and negative features to train an SVM (linear or nonlinear) for classifying unknown image patches (a minimal training sketch follows this list).
Step 3.5: If doing hard-negative mining, run the classifier on known negatives, add any false positives to the negative feature set, and retrain the classifier (go back to Step 3).
Step 4: Run the resulting classifier on unknown images, then perform non-maximum suppression on the results.
Step 5: Compute precision-recall.
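For concreteness, here is a minimal Python sketch of Steps 1-3. It uses scikit-image's built-in HOG and scikit-learn's LinearSVC as stand-ins for the custom feature and SVM code discussed below; the data paths and the crops-per-image count are placeholder assumptions.

```python
# Steps 1-3 in miniature. scikit-image's HOG and scikit-learn's LinearSVC
# stand in for the project's custom code; paths and the crops-per-image
# count are placeholder assumptions.
import glob
import numpy as np
from skimage.io import imread
from skimage.feature import hog
from sklearn.svm import LinearSVC

PATCH = 36  # positive crops are 36x36 pixels

def extract_features(img):
    # 9-orientation HOG over 6x6-pixel cells -> one 1-D descriptor per crop.
    return hog(img, orientations=9, pixels_per_cell=(6, 6),
               cells_per_block=(2, 2), feature_vector=True)

# Step 1: features from the ~6700 known face crops.
pos_feats = [extract_features(imread(p, as_gray=True))
             for p in glob.glob('data/faces/*.jpg')]

# Step 2: random 36x36 crops from images known to contain no faces.
rng = np.random.default_rng(0)
neg_feats = []
for p in glob.glob('data/non_faces/*.jpg'):
    img = imread(p, as_gray=True)
    for _ in range(10):  # a handful of random crops per scene
        y0 = rng.integers(0, img.shape[0] - PATCH)
        x0 = rng.integers(0, img.shape[1] - PATCH)
        neg_feats.append(extract_features(img[y0:y0+PATCH, x0:x0+PATCH]))

# Step 3: train the linear SVM. sklearn's C is roughly 1/lambda.
X = np.vstack(pos_feats + neg_feats)
y = np.r_[np.ones(len(pos_feats)), -np.ones(len(neg_feats))]
clf = LinearSVC(C=1.0 / 100).fit(X, y)  # lambda = 100, per the parameters below
```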
All results here use 10k negative samples and all ~6.7k positive samples. The detector step size is 3, the scale factor is 1.5, and the start scale is 2. Lambda: 100.
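Below is a sketch of the Step 4 detector with those parameters, plus the greedy non-maximum suppression. It reuses PATCH, extract_features, and clf from the training sketch above; the score threshold, and reading "start scale 2" as an initial 2x downsample, are my assumptions.

```python
# Multi-scale sliding-window detection (Step 4): step size 3, scale factor
# 1.5, starting from a 2x-downsampled image (one reading of "start scale 2").
# The score threshold is an arbitrary placeholder.
import numpy as np
from skimage.transform import rescale

def detect(img, clf, step=3, scale_factor=1.5, start_scale=2.0, thresh=0.5):
    boxes, scores = [], []
    scale = 1.0 / start_scale
    while True:
        scaled = rescale(img, scale, anti_aliasing=True)
        if min(scaled.shape) < PATCH:
            break
        for y0 in range(0, scaled.shape[0] - PATCH, step):
            for x0 in range(0, scaled.shape[1] - PATCH, step):
                feat = extract_features(scaled[y0:y0+PATCH, x0:x0+PATCH])
                score = clf.decision_function(feat[None, :])[0]
                if score > thresh:
                    # map the window back to original-image coordinates
                    boxes.append(np.array([x0, y0, x0+PATCH, y0+PATCH]) / scale)
                    scores.append(score)
        scale /= scale_factor  # shrink further to catch larger faces
    return np.array(boxes), np.array(scores)

def nms(boxes, scores, overlap=0.3):
    # Greedy NMS: keep the highest-scoring box, discard remaining boxes
    # whose IoU with it exceeds `overlap`; repeat.
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= overlap]
    return keep
```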
The first test uses a custom HOG feature descriptor in which each pixel casts a single unweighted vote for its orientation bin (as opposed to a vote weighted by gradient magnitude). The HOG implementation is sketched further below.
Stage 1. TPR: 0.514, FPR: 0.001, TNR: 0.482, FNR: 0.003
This test used SIFT features, with all other parameters the same. It took much longer to run and didn't perform as well as HOG, even the suboptimal unweighted variant.
Stage 1. TPR: 0.515, FPR: 0.000, TNR: 0.485, FNR: 0.000
This test used a better HOG, where contributions to the histogram are weighted by the magnitude of the gradient. Unsurprisingly, this improved performance a fair bit compared to the unweighted HOG. This was the best score I got: 81.2%.
Stage 1. TPR: 0.514, FPR: 0.000, TNR: 0.485, FNR: 0.001
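The core of the custom HOG is the per-cell orientation histogram; the only difference between the two HOG tests above is the vote weight. Here is a sketch, with the cell size and per-cell L2 normalization chosen for illustration rather than taken from the project code.

```python
# Per-cell orientation histogram at the heart of the custom HOG. With
# weight_by_magnitude=False each pixel casts a unit vote for its orientation
# bin (the first test); with True, votes are weighted by gradient magnitude
# (this test). Cell size and L2 normalization here are illustrative choices.
import numpy as np

def hog_cells(img, cell=6, bins=9, weight_by_magnitude=True):
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ori = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation in [0, pi)
    bin_idx = np.minimum((ori / np.pi * bins).astype(int), bins - 1)

    hist = np.zeros((img.shape[0] // cell, img.shape[1] // cell, bins))
    for cy in range(hist.shape[0]):
        for cx in range(hist.shape[1]):
            sl = np.s_[cy*cell:(cy+1)*cell, cx*cell:(cx+1)*cell]
            b = bin_idx[sl].ravel()
            w = mag[sl].ravel() if weight_by_magnitude else np.ones(b.size)
            hist[cy, cx] = np.bincount(b, weights=w, minlength=bins)
    # L2-normalize each cell and flatten into a single descriptor
    norms = np.linalg.norm(hist, axis=2, keepdims=True) + 1e-6
    return (hist / norms).ravel()
```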
This test used a nonlinear SVM. It performed worse; one guess is that I trained it with too much data, though that explanation doesn't quite hold up, and I'm not sure what happened here. Lambda was 200 and sigma 20 (the only values that produced a useful range of kernel outputs).
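For reference, this is roughly how the nonlinear SVM can be set up with scikit-learn, assuming a Gaussian (RBF) kernel, which the sigma parameter implies; scikit-learn parameterizes it as gamma = 1/(2*sigma^2), with C approximately 1/lambda.

```python
# Nonlinear SVM sketch, assuming a Gaussian (RBF) kernel. Mapping the
# parameters above: gamma = 1/(2*sigma^2) with sigma = 20, C ~ 1/lambda
# with lambda = 200. X and y are the feature matrix and labels from the
# training sketch.
from sklearn.svm import SVC

sigma = 20.0
clf_rbf = SVC(kernel='rbf', gamma=1.0 / (2 * sigma**2), C=1.0 / 200)
clf_rbf.fit(X, y)
window_scores = clf_rbf.decision_function(X[:5])  # signed confidence scores
```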
Finally, I used a 3-stage process with two iterations of hard-negative mining (sketched after the results below). This actually made performance worse. It made the classifier less likely to detect positives (which makes sense with more negative examples) and better at rejecting negatives (again, makes sense), but the drop in true positives outweighed the benefit of fewer false positives.
Stage 1. TPR: 0.515, FPR: 0.000, TNR: 0.485, FNR: 0.001
Stage 2. TPR: 0.498, FPR: 0.001, TNR: 0.500, FNR: 0.002
Stage 3. TPR: 0.483, FPR: 0.002, TNR: 0.511, FNR: 0.005
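For completeness, here is a sketch of the mining loop behind stages 2 and 3: any window the current classifier fires on in a face-free image is by definition a false positive, so its features are appended to the negative set before retraining. It reuses detect, extract_features, X, y, and PATCH from the sketches above; resizing each detection back to 36x36 is a simplification.

```python
# Two rounds of hard-negative mining (stages 2 and 3). Every detection on a
# face-free image is a false positive; add it to the negative set and retrain.
# Reuses detect(), extract_features(), clf, X, y, and PATCH from above.
from skimage.transform import resize

for stage in range(2):
    hard_negs = []
    for p in glob.glob('data/non_faces/*.jpg'):
        img = imread(p, as_gray=True)
        boxes, _ = detect(img, clf)
        for (x1, y1, x2, y2) in boxes.astype(int):
            crop = img[y1:y2, x1:x2]
            if crop.size:
                # resize the detection back to the 36x36 training size
                crop = resize(crop, (PATCH, PATCH), anti_aliasing=True)
                hard_negs.append(extract_features(crop))
    if not hard_negs:
        break
    X = np.vstack([X, np.array(hard_negs)])
    y = np.r_[y, -np.ones(len(hard_negs))]
    clf = LinearSVC(C=1.0 / 100).fit(X, y)  # retrain with mined negatives
```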