Project 4: Face detection with a sliding window

CS 143: Introduction to Computer Vision

Li Sun (li)

Overview

A basic flow of our face detection algorithm with a sliding window is as follows:

Training

  1. Load positive crops (crops containing a face).
  2. Sample random negative crops (in the first stage) or hard negative crops (false positives produced by the SVM trained in the previous stage).
  3. Extract dense SIFT features from both the positive and negative crops.
  4. Train a non-linear SVM on the extracted features and their labels.
  5. Return to step 2 unless the stopping criterion is met.
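The training steps above can be sketched as a loop. This is only a skeleton: `extract_dense_sift`, `train_rbf_svm`, and the `mine_hard_negatives` callback are assumed placeholder names, not the project's actual functions.

```python
def extract_dense_sift(crop):
    # Placeholder: a real implementation would return a dense SIFT descriptor.
    return [float(v) for v in crop]

def train_rbf_svm(features, labels):
    # Placeholder for training the non-linear (RBF kernel) SVM.
    return {"n_train": len(features)}

def training_loop(positive_crops, random_negatives, mine_hard_negatives, n_stages=2):
    negatives = list(random_negatives)  # stage 1 uses random negatives
    model = None
    for stage in range(n_stages):
        feats = [extract_dense_sift(c) for c in positive_crops + negatives]
        labels = [1] * len(positive_crops) + [-1] * len(negatives)
        model = train_rbf_svm(feats, labels)
        if stage < n_stages - 1:
            # later stages add the current model's false positives as negatives
            negatives += mine_hard_negatives(model)
    return model
```

Each stage retrains on the positives plus the growing negative set, so the negative pool only accumulates harder examples.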

Detecting

  1. Build a multi-scale pyramid of each provided image, and break each scaled image into patches.
  2. Extract dense SIFT features from those patches.
  3. Classify the extracted features with our final non-linear SVM.
  4. Perform non-maximum suppression to remove redundant detections.
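Step 4 can be sketched as greedy non-maximum suppression: keep the highest-scoring boxes and drop any box that overlaps an already-kept one too much. The 0.3 overlap threshold is an illustrative choice, not a value from the project.

```python
def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def non_max_suppression(boxes, scores, overlap_thresh=0.3):
    # Visit boxes in descending score order; keep a box only if it does not
    # overlap any previously kept box by more than the threshold.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= overlap_thresh for j in keep):
            keep.append(i)
    return keep
```

For example, two nearly identical detections of the same face collapse to the single higher-scoring one, while a distant detection survives.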

Results

Here is the final detection result of our face detector:

[Figure: final detection results of the face detector]

Discussion

Parameters

Because we use a non-linear SVM with an RBF kernel, there are two parameters to choose: sigma and lambda. We trained the SVM with different values of lambda and sigma on the same training and testing sets to compare their performance. The results are shown in figure 1; lambda = 0.001 and sigma = 200 produce the best performance.


Figure 1: Results with different lambdas and sigmas
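The parameter sweep behind figure 1 amounts to a small grid search: train and evaluate once per (lambda, sigma) pair and keep the best. Here `train_and_evaluate` is a stand-in for training the RBF-kernel SVM and scoring it on the test set; it is an assumed name, not the project's code.

```python
def grid_search(lambdas, sigmas, train_and_evaluate):
    # Try every (lambda, sigma) pair and remember the best-scoring one.
    best = None
    for lam in lambdas:
        for sigma in sigmas:
            score = train_and_evaluate(lam, sigma)
            if best is None or score > best[0]:
                best = (score, lam, sigma)
    return best  # (best score, best lambda, best sigma)
```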

Hard Negative Mining

Our hard negative mining strategy is as follows:

  1. In the first stage, randomly sample 1000 negative crops for training.
  2. In the nth stage (n > 1),
    1. mine 1000 hard negatives (false positives) with the SVM trained in the (n-1)th stage, then
    2. combine the hard negatives with the old negatives into a new negative set for training.
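The mining step itself can be sketched as follows: run the current classifier over crops taken from images that contain no faces, and collect every crop it wrongly accepts. `classify_crop` (positive score = "face") is an assumed placeholder for the trained SVM's decision function.

```python
def mine_hard_negatives(crops_without_faces, classify_crop, n_wanted=1000):
    # Every "face" decision on a face-free crop is a false positive,
    # i.e. a hard negative for the next training stage.
    hard = []
    for crop in crops_without_faces:
        if classify_crop(crop) > 0:
            hard.append(crop)
        if len(hard) == n_wanted:
            break
    return hard
```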

We set the total number of stages to 2 (because it becomes hard to find 1000 hard negatives after two stages) and ran the program. The result is shown in figure 2. The detector performs slightly better after mining hard negatives.


Figure 2: Testing result for each stage

Linear SVM and Non-Linear SVM

Using only random negatives as negative data, we compared a linear SVM and a non-linear SVM, both trained on the same data (1000 positives and 1000 negatives). Their performances are similar (both around 0.4).

This result is not surprising: a non-linear SVM only outperforms a linear SVM when the positive and negative data cannot be easily (linearly) separated.
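For reference, the RBF kernel is what gives the non-linear SVM this extra flexibility: it maps similarity to a value near 1 for nearby descriptors and near 0 for distant ones, letting the decision boundary bend. The sketch below uses one common parameterization, k(x, y) = exp(-||x - y||^2 / (2 sigma^2)); whether the project's code divides by 2 sigma^2 or sigma^2 is an assumption.

```python
import math

def rbf_kernel(x, y, sigma):
    # Squared Euclidean distance between the two feature vectors.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    # Gaussian of the distance: 1 when x == y, decaying toward 0 with distance.
    return math.exp(-sq_dist / (2.0 * sigma ** 2))
```

With a large sigma (e.g. the 200 chosen above), even fairly distant descriptors keep a nonzero kernel value, so the boundary stays smooth rather than overfitting individual training crops.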