Project 4: Face Detection with a Sliding Window

Brian Thomas

Baseline Version

The baseline version uses SIFT features as descriptors for faces. It had two stages: the first mined random negatives and the second mined hard negatives, as in Dalal-Triggs. Also following their lead, new negative training features were merged with the old ones. During face detection, a step size of 2 and a starting scale of 2 were used to increase accuracy at the cost of speed. For the linear classifier, lambda = 100 was used; for the nonlinear classifier, an RBF kernel with lambda = 1 was used.
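As a rough illustration, the two-stage training procedure looks like the sketch below. The helper names are hypothetical and VLFeat's vl_svmtrain is assumed for the linear case; the actual project code differs.

    % Two-stage Dalal-Triggs-style training (sketch; helper names are hypothetical).
    pos_feats = get_positive_features(face_images);            % SIFT features of faces
    neg_feats = get_random_negative_features(nonface_images);  % stage 1: random negatives
    lambda = 100;
    for stage = 1:2
        X = single([pos_feats; neg_feats]');                   % D x N, one column per example
        Y = [ones(size(pos_feats,1),1); -ones(size(neg_feats,1),1)];
        [w, b] = vl_svmtrain(X, Y, lambda);                    % linear SVM
        if stage == 1
            % Stage 2: mine hard negatives (non-face windows the stage-1
            % model calls faces) and merge them with the old negatives.
            hard = get_hard_negative_features(nonface_images, w, b);
            neg_feats = [neg_feats; hard];
        end
    end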

Results (linear):
An AP (average precision) of 0.673 was obtained.

Looking at the training data, the linear classifier had difficulty separating positive and negative examples (first- and second-stage classifiers below; the rates are reported as fractions of the full training set, so each stage's four values sum to one):

Stage 1. TPR: 0.465, FPR: 0.015, TNR: 0.485, FNR: 0.035
Stage 2. TPR: 0.259, FPR: 0.007, TNR: 0.660, FNR: 0.074

Since false negatives get eliminated at each stage of the cascade, to improve accuracy we need to find better features, find a classifier which better separates the data, or create more steps in the cascade and eliminate fewer false negatives at each one. The last two options are explored below.

Results (nonlinear):
An AP of 0.747 was obtained.

Looking at the training data, the nonlinear classifier better separated positive and negative examples (first- and second-stage classifiers below):

Stage 1. TPR: 0.497, FPR: 0.001, TNR: 0.499, FNR: 0.003
Stage 2. TPR: 0.319, FPR: 0.001, TNR: 0.666, FNR: 0.014

Viola-Jones Cascade

Finally, creating more steps in the cascade and eliminating fewer false negatives (in other words, allowing more false positives) in the style of Viola-Jones was investigated. Two trials for reducing false negatives were run. In each, as with Viola-Jones, new negative training features were not merged with the old ones but instead replaced them at each stage of the cascade, and the cascade was limited to a maximum of 10 stages. Both linear and nonlinear classifiers were tested, and their runtimes in both training and testing were compared to those of the baseline. Except where noted, all other parameters matched the baseline trials.
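The overall loop is sketched below, reusing the hypothetical helpers from the baseline sketch; the key difference from the baseline is that each stage's hard negatives replace, rather than extend, the previous negative set.

    % Viola-Jones-style cascade training (sketch).
    max_stages = 10;
    cascade = cell(max_stages, 2);    % column 1: weights, column 2: offset
    neg_feats = get_random_negative_features(nonface_images);
    for cur_stage = 1:max_stages
        X = single([pos_feats; neg_feats]');
        Y = [ones(size(pos_feats,1),1); -ones(size(neg_feats,1),1)];
        [w, b] = vl_svmtrain(X, Y, lambda);
        cascade{cur_stage,1} = w;
        cascade{cur_stage,2} = b;     % adjusted per trial, as described below
        % Replace (do not merge) the negatives with ones surviving the cascade so far.
        neg_feats = get_hard_negative_features(nonface_images, cascade);
    end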

In the first trial, each stage of the cascade barring the last adjusted the offset (cascade{cur_stage,2} in code) so that all false negatives became true positives. This has the effect of eliminating the easiest true negatives early, letting the classifier focus on the most face-like non-faces (the false positives) in the next cascade stage, as sketched below.
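Concretely, this amounts to raising the offset until the lowest-scoring positive sits exactly on the decision boundary. A minimal sketch, assuming w and b are the stage's weights and offset and pos_feats holds the positive training features:

    % Shift the offset so every positive example scores >= 0 (no false negatives).
    scores_pos = w' * pos_feats' + b;             % 1 x num_pos confidences
    cascade{cur_stage,2} = b - min(scores_pos);   % lowest positive now scores exactly 0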

In the second trial, each stage of the cascade barring the last adjusted the offset (cascade{cur_stage,2} in code) so that a controlled rate of false positives was allowed. In the first stage, all false negatives became true positives, as in the first trial; for each subsequent stage, the allowed false positive rate was halved relative to the previous stage, as sketched below.
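A minimal sketch of this thresholding, assuming neg_feats holds the stage's negative training features and target_fpr carries the allowed rate over from the previous stage:

    % Pick the offset so roughly target_fpr of the negatives still pass this stage.
    scores_neg = sort(w' * neg_feats' + b, 'descend');  % negatives' confidences, high to low
    k = max(1, round(target_fpr * numel(scores_neg)));  % how many negatives may pass
    cascade{cur_stage,2} = b - scores_neg(k);           % k-th highest negative scores exactly 0
    target_fpr = target_fpr / 2;                        % halve the allowance for the next stage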


Results (linear, no false negatives):
An AP of 0.562 was obtained.

Stage 1. TPR: 0.499, FPR: 0.369, TNR: 0.131, FNR: 0.001
Stage 2. TPR: 0.500, FPR: 0.446, TNR: 0.054, FNR: 0.000
Stage 3. TPR: 0.500, FPR: 0.412, TNR: 0.087, FNR: 0.000
Stage 4. TPR: 0.499, FPR: 0.469, TNR: 0.032, FNR: 0.001
Stage 5. TPR: 0.500, FPR: 0.483, TNR: 0.017, FNR: 0.000
Stage 6. TPR: 0.499, FPR: 0.485, TNR: 0.015, FNR: 0.001
Stage 7. TPR: 0.500, FPR: 0.495, TNR: 0.004, FNR: 0.000
Stage 8. TPR: 0.500, FPR: 0.498, TNR: 0.002, FNR: 0.000
Stage 9. TPR: 0.500, FPR: 0.493, TNR: 0.007, FNR: 0.000
Stage 10. TPR: 0.455, FPR: 0.003, TNR: 0.497, FNR: 0.045

Results (nonlinear, no false negatives):
An AP of 0.793 was obtained.

Stage 1. TPR: 0.500, FPR: 0.022, TNR: 0.478, FNR: 0.000
Stage 2. TPR: 0.500, FPR: 0.049, TNR: 0.451, FNR: 0.000
Stage 3. TPR: 0.500, FPR: 0.256, TNR: 0.244, FNR: 0.000
Stage 4. TPR: 0.500, FPR: 0.223, TNR: 0.278, FNR: 0.000
Stage 5. TPR: 0.500, FPR: 0.304, TNR: 0.196, FNR: 0.000
Stage 6. TPR: 0.500, FPR: 0.196, TNR: 0.304, FNR: 0.000
Stage 7. TPR: 0.500, FPR: 0.172, TNR: 0.329, FNR: 0.000
Stage 8. TPR: 0.500, FPR: 0.472, TNR: 0.028, FNR: 0.000
Stage 9. TPR: 0.500, FPR: 0.427, TNR: 0.072, FNR: 0.000
Stage 10. TPR: 0.475, FPR: 0.002, TNR: 0.498, FNR: 0.025

Results (linear, controlled false positives):
An AP of 0.340 was obtained.

Stage 1. TPR: 0.500, FPR: 0.378, TNR: 0.122, FNR: 0.000
Stage 2. TPR: 0.495, FPR: 0.189, TNR: 0.311, FNR: 0.004
Stage 3. TPR: 0.482, FPR: 0.095, TNR: 0.405, FNR: 0.018
Stage 4. TPR: 0.446, FPR: 0.048, TNR: 0.452, FNR: 0.054
Stage 5. TPR: 0.281, FPR: 0.025, TNR: 0.475, FNR: 0.218
Stage 6. TPR: 0.367, FPR: 0.012, TNR: 0.488, FNR: 0.133
Stage 7. TPR: 0.930, FPR: 0.006, TNR: 0.036, FNR: 0.028
Stage 8. TPR: 0.995, FPR: 0.004, TNR: 0.001, FNR: 0.000
Stage 9. TPR: 0.995, FPR: 0.002, TNR: 0.003, FNR: 0.000
Stage 10. TPR: 0.995, FPR: 0.005, TNR: 0.000, FNR: 0.000

Results (nonlinear, controlled false positives):
An AP of 0.598 was obtained.

Stage 1. TPR: 0.500, FPR: 0.041, TNR: 0.459, FNR: 0.000
Stage 2. TPR: 0.500, FPR: 0.020, TNR: 0.480, FNR: 0.000
Stage 3. TPR: 0.495, FPR: 0.011, TNR: 0.489, FNR: 0.005
Stage 4. TPR: 0.489, FPR: 0.005, TNR: 0.494, FNR: 0.011
Stage 5. TPR: 0.771, FPR: 0.004, TNR: 0.223, FNR: 0.002
Stage 6. TPR: 0.982, FPR: 0.003, TNR: 0.015, FNR: 0.000
Stage 7. TPR: 0.987, FPR: 0.001, TNR: 0.012, FNR: 0.000
Stage 8. TPR: 0.990, FPR: 0.001, TNR: 0.009, FNR: 0.000
Stage 9. TPR: 0.990, FPR: 0.001, TNR: 0.009, FNR: 0.000
Stage 10. TPR: 0.991, FPR: 0.009, TNR: 0.000, FNR: 0.000

Summary

The best performance in each category (column) is marked with * for linear classifiers and ** for nonlinear classifiers.

Type                                           AP       Training time (s)   Testing time (s)
Linear (Dalal-Triggs baseline)                 0.673*   1164*               598*
Nonlinear (Dalal-Triggs baseline)              0.747    3250**              2272
Linear no-FN (Viola-Jones cascade)             0.562    10273               907
Nonlinear no-FN (Viola-Jones cascade)          0.793**  29824               2338
Linear controlled-FP (Viola-Jones cascade)     0.340    9126                668
Nonlinear controlled-FP (Viola-Jones cascade)  0.598    27541               2014**

Discussion

In accuracy, the nonlinear no-false-negative Viola-Jones cascade performed the best of all the classifiers. The author hypothesizes that this result is due to the nonlinear classifier being able to accurately shave off large regions of non-face space at each stage.

A similar result was not seen with the linear no-false-negative Viola-Jones cascade. It appears that the linear version, although also able to shave off some of the space, could not make cuts as deep as the nonlinear version.

Examining the FPR of both over the course of the cascade, the linear version approaches a large (>0.4) FPR much more rapidly than the nonlinear version. A large FPR suggests that, while false negatives could be eliminated, no boundary existed which could separate all of the faces (making them TP) from most of the non-faces (making them TN).

Both controlled false positive cascades performed poorly. First, it is hypothesized that the false positive reduction rate was too aggressive (unfortunately, time constraints did not allow testing a gentler schedule). Second, both appear to have exhausted their training data, as evidenced by a TPR approaching 1, with the linear classifier doing so after 6 stages and the nonlinear classifier doing so after 5.

It is also hypothesized that the controlled false positive cascades performed poorly because of a lack of data. Note, for instance, how well the linear Dalal-Triggs baseline classifier performed: in its second iteration it was given twice as many negative examples, since the new hard negatives were merged with the old ones, and this extra training data may have enabled clearer boundaries to be formed. (2000 examples in 128-dimensional SIFT space is already pushing against the rule-of-thumb minimum of roughly 10:1 examples to dimensions for forming a boundary.) In other words, the poorly-performing cascades, whose negatives were replaced rather than merged, may have had poor boundaries due to a lack of examples, thus turning actual faces into false negatives.

Finally, the nonlinear controlled false positive Viola-Jones cascade did slightly better on testing speed at the sacrifice of accuracy. It is hypothesized that this is because very few examples made it past stage 2, making its effective depth similar to the Dalal-Triggs baseline, but the data needed to examine this was not available.