For this project we will be implementing a sliding window face detector.
![]()
Baseline Method
The three key elements of the baseline algorithm are described below:
- Representation: SIFT features.
- Strategies for utilizing training data: Two-stage classifier with mined hard negatives described in Dalal-Triggs 2005.
- Classification methods: Linear SVM and non-linear SVM with an RBF kernel.
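As a rough illustration of how a sliding-window detector of this kind operates, here is a minimal Python sketch. The real pipeline describes each window with SIFT features; a flattened pixel patch and synthetic training data stand in here so the sketch is self-contained, and the window size, stride, and use of scikit-learn are assumptions, not details taken from the project.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical sliding-window scoring loop. In the real pipeline each
# window is described by SIFT features; a flattened pixel patch stands
# in here so the sketch stays self-contained.
rng = np.random.default_rng(0)
win = 36    # window side length (assumed)
step = 6    # sliding-window stride (assumed)

# Toy training set: "faces" are slightly brighter patches than "non-faces".
pos = rng.normal(loc=0.5, size=(100, win * win))
neg = rng.normal(loc=0.0, size=(100, win * win))
X_train = np.vstack([pos, neg])
y_train = np.array([1] * 100 + [0] * 100)
clf = LinearSVC(C=0.01).fit(X_train, y_train)

def detect(image, clf, win, step, thresh=0.0):
    """Score every window of `image`; return (row, col, score) above thresh."""
    boxes = []
    H, W = image.shape
    for r in range(0, H - win + 1, step):
        for c in range(0, W - win + 1, step):
            feat = image[r:r + win, c:c + win].reshape(1, -1)
            score = clf.decision_function(feat)[0]
            if score > thresh:
                boxes.append((r, c, score))
    return boxes

image = rng.normal(size=(120, 160))
detections = detect(image, clf, win, step)
```

In practice the scan is repeated over an image pyramid to handle faces at multiple scales, and overlapping detections are merged with non-maximum suppression.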
Raw Result
Linear vs non-linear:
| linear | non-linear |
| --- | --- |
| ![]() | ![]() |
| AP=0.355 | AP=0.389 |
Here the lambda for the linear classifier is tuned to 100, while the non-linear classifier uses lambda = 1 with sigma tuned to 1000. The RBF kernel gives a clear improvement in average precision, so all following experiments use the RBF kernel with these parameters.
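For reference, the two classifiers can be set up with, for example, scikit-learn (an assumption; the report does not name its SVM package). Note that scikit-learn parameterizes regularization as C, roughly 1/lambda, and the RBF width as gamma = 1/(2·sigma²), so the report's lambda and sigma values do not carry over literally; the toy data below therefore uses the default gamma.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC, LinearSVC

# Toy non-linearly-separable data standing in for the SIFT feature vectors.
X, y = make_moons(noise=0.2, random_state=0)

# lambda = 100 maps to C = 1/100 under the usual C ~ 1/lambda reading.
linear = LinearSVC(C=0.01).fit(X, y)

# RBF kernel. gamma = 1/(2 * sigma^2) would be tiny for sigma = 1000 on
# real features, so the default "scale" setting is used for this toy data.
rbf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

acc_linear = linear.score(X, y)
acc_rbf = rbf.score(X, y)
```

On such data the RBF classifier can bend its decision boundary around the two interleaved classes, which is the same flexibility that lifts AP on the face task.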
Random negatives vs mined hard negatives:
| random negatives | mined hard negatives |
| --- | --- |
| ![]() | ![]() |
| AP=0.389 | AP=0.391 |
We use a mining method similar to the one described in Dalal-Triggs 2005. The result above comes from the simplest two-stage model, which already beats the random-negative baseline. As expected, additional mining stages further improve the average precision.
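The two-stage procedure can be sketched as follows, with synthetic feature vectors standing in for the real ones: train an initial classifier, scan the negative pool for false positives ("hard" negatives), then retrain with those examples added. The feature dimensions and C value are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Sketch of two-stage hard-negative mining in the spirit of Dalal-Triggs
# 2005, on synthetic overlapping Gaussian data.
rng = np.random.default_rng(0)
pos = rng.normal(loc=0.2, size=(200, 64))
neg_pool = rng.normal(loc=0.0, size=(5000, 64))

# Stage 1: train on the positives and a random negative subset.
neg0 = neg_pool[rng.choice(len(neg_pool), 200, replace=False)]
X = np.vstack([pos, neg0])
y = np.array([1] * len(pos) + [0] * len(neg0))
clf = LinearSVC(C=0.01).fit(X, y)

# Mine: negatives the stage-1 classifier wrongly scores as positive.
scores = clf.decision_function(neg_pool)
hard = neg_pool[scores > 0]

# Stage 2: retrain with the mined hard negatives appended.
X2 = np.vstack([X, hard])
y2 = np.concatenate([y, np.zeros(len(hard), dtype=int)])
clf2 = LinearSVC(C=0.01).fit(X2, y2)
```

More mining stages simply repeat the scan-and-retrain step, each time against the current classifier.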
Cascade Architecture
In this part, a cascade architecture described in Viola-Jones 2001 is implemented.
![]()
The classifier at each node is trained using only the positives and the negatives misclassified by the earlier classifiers in the cascade. We keep the false-positive rate around 0.3 at every node except the last. A 10-node cascade architecture is used to obtain the following result:
| non-cascade | cascade |
| --- | --- |
| ![]() | ![]() |
| AP=0.391 | AP=0.398 |
The cascade model outperforms the non-cascade one because it can be trained on many more negatives in less time: each node only ever sees the negatives that survived the nodes before it.
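A hedged sketch of this training loop, again on synthetic stand-in features: each node is trained on all positives plus the negatives that slipped past earlier nodes, and its threshold is then set so that roughly 30% of those negatives still pass, matching the per-node false-positive rate above. Node count, dimensions, and the quantile-based thresholding are illustrative assumptions, not the project's code.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
pos = rng.normal(loc=0.5, size=(300, 32))
neg = rng.normal(loc=0.0, size=(3000, 32))

nodes = []
surviving = neg                      # negatives still alive in the cascade
for _ in range(10):
    if len(surviving) < 20:          # negative pool exhausted
        break
    X = np.vstack([pos, surviving])
    y = np.array([1] * len(pos) + [0] * len(surviving))
    clf = LinearSVC(C=0.1).fit(X, y)
    s = clf.decision_function(surviving)
    thr = np.quantile(s, 0.7)        # ~30% of these negatives score above thr
    nodes.append((clf, thr))
    surviving = surviving[s > thr]   # only false positives reach the next node

def cascade_predict(x, nodes):
    """A window counts as a face only if it passes every node."""
    return all(clf.decision_function(x.reshape(1, -1))[0] > thr
               for clf, thr in nodes)
```

At test time the same early-rejection structure is what makes the cascade fast: most windows are discarded by the first one or two nodes.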
Asymmetric Classifier
To make the classifier aware of the asymmetry between false negatives and false positives, Wu et al. 2008 propose an asymmetric classifier that can be used at each node of the cascade architecture:
![]()
Because the closed-form solution only fixes the false-positive rate at 0.5 per node, more nodes are needed in our architecture. We use a 32-node cascade in this case to obtain the following result:
| symmetric | asymmetric |
| --- | --- |
| ![]() | ![]() |
| AP=0.398 | AP=0.405 |
An improvement is seen with the asymmetric model, and this is the result we report for this project.
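One common reading of the linear asymmetric classifier's closed form is w = Σ_pos⁻¹(μ_pos − μ_neg), with the threshold placed at the negative-class mean so that about half of the negatives pass each node, consistent with the 0.5 false-positive rate above. The sketch below illustrates that reading on Gaussian toy data; it is not the project's actual code, and the details should be checked against Wu et al. 2008.

```python
import numpy as np

# Hedged sketch of a linear asymmetric classifier: maximize the
# detection rate while the false-positive rate is pinned near 0.5.
# Under a Gaussian view this gives w = inv(Sigma_pos) @ (mu_pos - mu_neg)
# with the threshold at the negative-class mean (assumed reading).
rng = np.random.default_rng(0)
pos = rng.normal(loc=0.5, size=(500, 16))
neg = rng.normal(loc=0.0, size=(500, 16))

mu_p, mu_n = pos.mean(axis=0), neg.mean(axis=0)
sigma_p = np.cov(pos, rowvar=False)
w = np.linalg.solve(sigma_p, mu_p - mu_n)
b = w @ mu_n                       # threshold at the negative mean

fp_rate = np.mean(neg @ w >= b)    # ~0.5 by construction
det_rate = np.mean(pos @ w >= b)   # pushed as high as the data allow
```

The asymmetry is visible in the two rates: the false-positive rate sits near one half while the detection rate is much higher, which is exactly the trade-off a cascade node wants.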
Another Classification Method
In this part, a nearest-neighbor method is also tested on our face detection task. At first, 1000 random positives and 1000 random negatives form the training set; later, more examples are added. Although its average precision is much lower than that of the previous models, a larger training set does, as expected, yield higher precision:
![]()
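The nearest-neighbor baseline can be sketched as follows: classify a window by the labels of its k closest training examples, and compare accuracy as the training set grows. The synthetic features, k = 5, and the specific set sizes are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Sketch of the nearest-neighbor baseline on synthetic stand-in features.
rng = np.random.default_rng(0)

def make_set(n_per_class):
    """Draw n positives and n negatives from two overlapping Gaussians."""
    pos = rng.normal(loc=0.4, size=(n_per_class, 32))
    neg = rng.normal(loc=0.0, size=(n_per_class, 32))
    X = np.vstack([pos, neg])
    y = np.array([1] * n_per_class + [0] * n_per_class)
    return X, y

X_test, y_test = make_set(500)

# Grow the training set, as in the report's 1000+1000-then-more setup.
accs = []
for n in (1000, 4000):
    X, y = make_set(n)
    knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
    accs.append(knn.score(X_test, y_test))
```

Since nearest-neighbor accuracy depends directly on how densely the training set covers the feature space, enlarging the set is the natural lever, at the cost of slower queries.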