CS143 Introduction to Computer Vision: Project 4 Face Detection with a Sliding Window

Jixiong Wang (jameswang@cs.brown.edu)

 

For this project we will be implementing a sliding window face detector.

Baseline Method

The three key elements of the baseline algorithm are described below; a sketch of the resulting sliding-window pipeline follows the list:

  1. Representation: SIFT features.
  2. Strategy for utilizing training data: a two-stage classifier with mined hard negatives, as described in Dalal-Triggs 2005.
  3. Classification methods: linear SVM and non-linear SVM with the RBF kernel.
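
The report does not include code, so the following Python sketch is only an illustration of how these three choices fit together in a sliding-window detector; `extract_sift` and the sklearn-style `classifier` are assumed placeholders rather than the project's actual functions.

```python
from skimage.transform import rescale

def detect_faces(image, classifier, extract_sift,
                 window=36, step=4, scales=(1.0, 0.7, 0.5), thresh=0.0):
    """Slide a fixed-size window over an image pyramid and score each crop.

    `image` is a 2-D grayscale array; `extract_sift` stands in for the feature
    extractor, and `classifier` is assumed to expose an sklearn-style
    decision_function.
    """
    detections = []  # (x, y, w, h, score) in original-image coordinates
    for s in scales:
        img = rescale(image, s)
        h, w = img.shape
        for y in range(0, h - window + 1, step):
            for x in range(0, w - window + 1, step):
                crop = img[y:y + window, x:x + window]
                feat = extract_sift(crop).reshape(1, -1)
                score = classifier.decision_function(feat)[0]
                if score > thresh:
                    detections.append((x / s, y / s, window / s, window / s, score))
    # non-maximum suppression over overlapping detections would follow here
    return detections
```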

Raw Results

Linear vs non-linear:

  linear:      AP = 0.355
  non-linear:  AP = 0.389

where lambda for the linear classifier is tuned to 100, while for the non-linear classifier lambda is 1 and sigma is tuned to 1000. We can see a large improvement in average precision from the non-linear kernel. In the following experiments we will always use the RBF kernel with these parameters.
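
As a rough illustration only (the SVM package used for the experiments is not named here, and its lambda/sigma parameterization need not map exactly onto scikit-learn's C and gamma), the two classifiers could be set up as follows:

```python
from sklearn.svm import LinearSVC, SVC

lam_linear, lam_rbf, sigma = 100.0, 1.0, 1000.0

# Usual conventions only: C ~ 1/lambda and gamma = 1/(2*sigma^2); the exact
# mapping depends on the SVM package actually used for the experiments.
linear_svm = LinearSVC(C=1.0 / lam_linear)
rbf_svm = SVC(kernel='rbf', C=1.0 / lam_rbf, gamma=1.0 / (2.0 * sigma ** 2))

# X_train: (n_samples, n_features) SIFT descriptors; y_train: +1 face / -1 non-face
# linear_svm.fit(X_train, y_train)
# rbf_svm.fit(X_train, y_train)
```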

Random negatives vs mined hard negatives:

  random negatives:      AP = 0.389
  mined hard negatives:  AP = 0.391

We use a mining method similar to the one described in Dalal-Triggs 2005. The above result is obtained with the simplest two-stage model, which already beats the previous performance. As expected, more mining stages would further improve the average precision.
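
A minimal sketch of this two-stage mining loop, assuming placeholder helpers (`train_svm`, `run_detector`, `get_features`, `sample_random_negatives`) that stand in for the project's real routines:

```python
import numpy as np

def train_with_hard_negatives(pos_feats, non_face_images, train_svm, run_detector,
                              get_features, sample_random_negatives,
                              n_random_neg=10000):
    # Stage 1: train on randomly sampled negative crops.
    neg_feats = sample_random_negatives(non_face_images, n_random_neg)
    clf = train_svm(pos_feats, neg_feats)

    # Stage 2: run the stage-1 detector on face-free images; every detection is
    # a false positive, i.e. a hard negative.
    hard_negs = [get_features(img, box)
                 for img in non_face_images
                 for box in run_detector(img, clf)]

    # Retrain on random + hard negatives; further stages would repeat this step.
    if hard_negs:
        neg_feats = np.vstack([neg_feats, np.vstack(hard_negs)])
    return train_svm(pos_feats, neg_feats)
```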

Cascade Architecture

In this part, a cascade architecture as described in Viola-Jones 2001 is implemented.

The classifier at each node is learned using only the positives and the negatives misclassified by the previous classifiers in the cascade. We keep the false positive rate around 0.3 at each node except the last one. A 10-node cascade architecture is used to obtain the following result:

  non-cascade:  AP = 0.391
  cascade:      AP = 0.398

We can see that the cascade model outperforms the non-cascade one because it makes it possible to train on many more negatives in less time.
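
A minimal sketch of this node-by-node training procedure, again with placeholder helpers (`train_node`, `threshold_for_fpr`) rather than the project's actual code:

```python
def train_cascade(pos_feats, neg_feats, train_node, threshold_for_fpr,
                  n_nodes=10, target_fpr=0.3):
    cascade = []
    negs = list(neg_feats)          # negatives still "alive" in the cascade
    for i in range(n_nodes):
        clf = train_node(pos_feats, negs)
        # Earlier nodes are thresholded so that roughly 30% of the remaining
        # negatives pass through; the last node keeps its natural threshold.
        thr = 0.0 if i == n_nodes - 1 else threshold_for_fpr(clf, negs, target_fpr)
        cascade.append((clf, thr))
        # Only negatives misclassified by this node are used to train the next one.
        negs = [x for x in negs if clf.decision_function([x])[0] >= thr]
        if not negs:
            break
    return cascade

# At test time a window is accepted only if it passes every (clf, thr) in order,
# so most windows are rejected cheaply by the first few nodes.
```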

Asymmetric Classifier

To make the classifier aware of the asymmetry between false negatives and false positives, Wu et al. 2008 propose an asymmetric classifier that can be used at each node of the cascade architecture.
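
In rough terms (my paraphrase of the general goal, not the exact equation from the paper), each node's linear classifier is chosen to maximize the detection rate subject to a fixed per-node false-positive rate:

\[
\max_{\mathbf{a}\neq\mathbf{0},\; b}\;\Pr_{\mathbf{x}\sim(\bar{\mathbf{x}}_1,\,\Sigma_1)}\{\mathbf{a}^{\top}\mathbf{x}\ge b\}
\qquad\text{s.t.}\qquad
\Pr_{\mathbf{y}\sim(\bar{\mathbf{x}}_2,\,\Sigma_2)}\{\mathbf{a}^{\top}\mathbf{y}\ge b\}=\beta,
\]

where \((\bar{\mathbf{x}}_1,\Sigma_1)\) and \((\bar{\mathbf{x}}_2,\Sigma_2)\) are the mean and covariance of the positive and negative classes, and \(\beta\) is the target per-node false-positive rate.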

Because the closed-form solution only controls the false positive rate at 0.5, more nodes are needed in our architecture. We use a 32-node cascade architecture in this case to obtain the following result:

  symmetric:   AP = 0.398
  asymmetric:  AP = 0.405

An improvement is seen with the asymmetric model, and this is the result we report for this project.

Other Classification Method

In this part, a nearest neighbor method is also tested on our face detection task. At first, 1000 random positives and 1000 random negatives form the training set; later, more examples are added. Although in our experiment the average precision is much lower than that obtained by the previous models, a larger training set does, as expected, yield higher precision.
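
A minimal scikit-learn sketch of this baseline, assuming SIFT feature matrices `pos_feats` and `neg_feats` (the project's actual implementation may differ):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def train_nn_classifier(pos_feats, neg_feats, k=1):
    """Label a window by the class of its nearest training example(s)."""
    X = np.vstack([pos_feats, neg_feats])
    y = np.hstack([np.ones(len(pos_feats)), -np.ones(len(neg_feats))])
    return KNeighborsClassifier(n_neighbors=k).fit(X, y)

# e.g. the first experiment: 1000 random positives and 1000 random negatives
# nn = train_nn_classifier(pos_feats[:1000], neg_feats[:1000])
# labels = nn.predict(window_feats)   # +1 -> face window, -1 -> non-face window
```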

References

  1. H. Rowley, S. Baluja, and T. Kanade. Neural Network-Based Face Detection. PAMI, 1998.
  2. N. Dalal and B. Triggs. Histograms of Oriented Gradients for Human Detection. CVPR, 2005.
  3. P. Viola and M. Jones. Rapid Object Detection using a Boosted Cascade of Simple Features. CVPR, 2001.
  4. J. Wu, S. C. Brubaker, M. D. Mullin, and J. M. Rehg. Fast Asymmetric Learning for Cascade Face Detection. PAMI, 2008.