The baseline version uses SIFT features as face descriptors. It had two stages in total: the first mined random negatives and the second mined hard negatives, as in Dalal-Triggs. Also following their lead, new negative training features were merged with the old ones. During face detection, a step size of 2 and a starting scale of 2 were used to increase accuracy at the cost of speed. For the linear classifier, lambda = 100 was used; for the nonlinear classifier, an RBF kernel with lambda = 1 was used.
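As a point of reference, those settings can be summarized in a short sketch. This is a minimal sketch rather than the project code: `vl_svmtrain` is VLFeat's linear SVM trainer, while the training matrices `X`, `Y`, the window size `win`, and the reshape used as a window descriptor are crude stand-ins for the project's SIFT extraction.

```matlab
% Minimal sketch of the baseline settings (not the project code).
step_size   = 2;      % pixels between evaluated windows
start_scale = 2;      % first level of the scale pyramid (image upsampled 2x)
lambda_lin  = 100;    % regularizer for the linear SVM
lambda_rbf  = 1;      % regularizer for the RBF-kernel SVM (trained separately)

% Stand-in training data: 128-D descriptors, labels in {+1, -1}.
X = single(randn(128, 2000));
Y = [ones(1, 1000), -ones(1, 1000)];
[w, b] = vl_svmtrain(X, Y, lambda_lin);   % VLFeat linear SVM

% Sliding-window scoring at the starting scale; the reshape below is a
% crude stand-in for the project's 128-D SIFT descriptor of each window.
img = imresize(rand(256, 256), start_scale);
win = 36;                                 % assumed window size
for r = 1:step_size:(size(img,1) - win + 1)
    for c = 1:step_size:(size(img,2) - win + 1)
        patch = img(r:r+win-1, c:c+win-1);
        feat  = single(reshape(imresize(patch, [16 8]), [], 1));  % 128-D stand-in
        score = w' * feat + b;            % raw SVM score for this window
    end
end
```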
Since windows rejected at any stage of the cascade can never be recovered, improving accuracy requires better features, a classifier that better separates the data, or more cascade stages that each eliminate fewer false negatives. The last two of these are explored below.
Finally, creating more stages in the cascade and eliminating fewer false negatives at each stage (in other words, allowing more false positives through), in the style of Viola-Jones, was investigated. Two trials for reducing false negatives were run. In each, as with Viola-Jones, new negative training features were not merged with the old ones but instead replaced them at each stage of the cascade, and the cascade was limited to a maximum of 10 stages. Both linear and nonlinear classifiers were tested, and their training and testing runtimes were compared to those of the baseline. Except where mentioned above, all other parameters were held constant with respect to the first cascade trials.
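The cascade training procedure described above might look roughly like the following. This is a hedged sketch: `mine_random_negatives` and `mine_hard_negatives` are hypothetical placeholders for the project's mining code, and `pos_feats` (a 128 x N matrix of face descriptors) and `lambda` are assumed to be defined already.

```matlab
% Sketch of the Viola-Jones-style cascade training loop (max 10 stages).
% mine_random_negatives / mine_hard_negatives are hypothetical placeholders;
% pos_feats (128 x N face descriptors) and lambda are assumed to exist.
max_stages = 10;
cascade    = cell(max_stages, 2);        % {stage,1} = [w; b], {stage,2} = offset
neg_feats  = mine_random_negatives();    % placeholder: random negatives for stage 1

for cur_stage = 1:max_stages
    feats  = [pos_feats, neg_feats];
    labels = [ones(1, size(pos_feats, 2)), -ones(1, size(neg_feats, 2))];
    [w, b] = vl_svmtrain(feats, labels, lambda);

    cascade{cur_stage, 1} = [w; b];
    cascade{cur_stage, 2} = 0;           % offset, adjusted per trial (see below)

    % Unlike the baseline, new hard negatives REPLACE the old negative set
    % rather than being merged with it.
    neg_feats = mine_hard_negatives(cascade(1:cur_stage, :));  % placeholder
end
```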
In the first trial, each stage of the cascade except the last adjusted the offset (cascade{cur_stage,2} in the code) so that all false negatives became true positives. This has the effect of eliminating the easiest true negatives early and letting the classifier focus on the most face-like non-faces (false positives) in the next stage.
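One way to implement that offset adjustment, assuming `w`, `b`, and `pos_feats` from the sketch above, is to shift scores so the lowest-scoring training face lands exactly at the decision boundary:

```matlab
% Trial 1 sketch: choose the offset so no training face is rejected
% (zero false negatives at this stage).
pos_scores = w' * pos_feats + b;            % raw SVM scores of the faces
cascade{cur_stage, 2} = -min(pos_scores);   % lowest face score shifts to 0

% A window survives the stage only if its shifted score is non-negative:
%   (w' * x + b) + cascade{cur_stage, 2} >= 0
```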
In the second trial, each stage of the cascade except the last adjusted the offset (cascade{cur_stage,2} in the code) so that a controlled rate of false positives was allowed through. In the first stage, all false negatives became true positives, as in the first trial; for each subsequent stage, the allowed false positive rate was halved from the previous stage.
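A hedged sketch of this second schedule, again reusing `w`, `b`, `pos_feats`, and `neg_feats` from the sketches above; the exact bookkeeping in the project code may differ.

```matlab
% Trial 2 sketch: stage 1 allows no false negatives; each later stage lets
% through only half as many training negatives as the previous one.
pos_scores = w' * pos_feats + b;
neg_scores = w' * neg_feats + b;

if cur_stage == 1
    cascade{cur_stage, 2} = -min(pos_scores);                      % as in trial 1
    prev_fpr = mean((neg_scores + cascade{cur_stage, 2}) >= 0);    % resulting FP rate
else
    target_fpr = prev_fpr / 2;                                     % halve the FP rate
    neg_sorted = sort(neg_scores, 'descend');
    k = max(1, floor(target_fpr * numel(neg_scores)));             % negatives allowed through
    cascade{cur_stage, 2} = -neg_sorted(k);                        % k-th highest negative maps to 0
    prev_fpr = target_fpr;
end
```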
| Type | AP | Training time (s) | Testing time (s) |
|------|----|-------------------|------------------|
| Linear (Dalal-Triggs baseline) | 0.673 | 1164 | 598 |
| Nonlinear (Dalal-Triggs baseline) | 0.747 | 3250 | 2272 |
| Linear no-FN (Viola-Jones cascade) | 0.562 | 10273 | 907 |
| Nonlinear no-FN (Viola-Jones cascade) | 0.793 | 29824 | 2338 |
| Linear controlled-FP (Viola-Jones cascade) | 0.340 | 9126 | 668 |
| Nonlinear controlled-FP (Viola-Jones cascade) | 0.598 | 27541 | 2014 |
In performance, the nonlinear no-false-negative Viola-Jones cascade performed the best of all the classifiers. The author hypothesizes that this result is due to the nonlinear classifier being able to accurately shave off large regions of non-face feature space.
A similar performance was not seen with the linear no-false-negative Viola-Jones cascade. It appears that the linear version, although also able to shave off some of the space, could not make cuts as deep as the nonlinear version.
Examining the false positive rate (FPR) of both over the cascade, the linear version approaches a large (>0.4) FPR much more rapidly than the nonlinear version. A large FPR suggests that, while an offset could always be found that avoided false negatives, there was no boundary that separated all of the faces (true positives) from most of the non-faces (true negatives).
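The per-stage rates referred to here can be computed directly from the training scores; a minimal sketch, reusing the variables from the earlier sketches:

```matlab
% Per-stage diagnostics: fraction of training faces kept (TPR) and of
% training negatives that survive (FPR) after applying the stage offset.
shifted_pos = (w' * pos_feats + b) + cascade{cur_stage, 2};
shifted_neg = (w' * neg_feats + b) + cascade{cur_stage, 2};
tpr = mean(shifted_pos >= 0);   % approaches 1 when the training data is exhausted
fpr = mean(shifted_neg >= 0);   % values above ~0.4 indicate a weak stage boundary
```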
Both controlled-false-positive cascades performed poorly. Two causes are hypothesized. First, the false positive reduction schedule may have been too aggressive (unfortunately, time constraints did not allow testing a gentler one). Second, both appear to have exhausted their training data, as evidenced by a TPR approaching 1, with the linear classifier doing so after 6 stages and the nonlinear classifier after 5.
It is also hypothesized that the controlled-false-positive cascades performed poorly because of a lack of data. Note, for instance, how well the linear Dalal-Triggs baseline classifier performed: in its second iteration it was given twice as many negative training examples, since new hard negatives were merged with the old ones. This extra training data may have enabled clearer boundaries to be formed. (2000 examples in 128-dimensional space is roughly a 15:1 ratio, pushing against the minimum rule-of-thumb ratio of about 10:1 examples-to-dimensions for forming a boundary.) In other words, the poorly-performing cascades may have had poor boundaries due to a lack of examples, thus turning actual faces into false negatives.
Finally, the nonlinear controlled-false-positive Viola-Jones cascade was slightly faster at test time, at the cost of accuracy. It is hypothesized that this is because very few examples made it past stage 2, so its behavior was similar to the Dalal-Triggs baseline, but the data needed to examine this was not available.