Project 4: Face detection with a sliding window

mashby's writeup

Intro

This project uses a sliding window detector to find faces in photographs. In this fairly simple model, every patch of an image is considered in turn, and a trained classifier decides whether that patch is a face.
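As an illustration of the idea, here is a minimal single-scale sketch in Python. It is not the project's MATLAB code: the names sliding_windows, detect, and classify are hypothetical, and the classifier is assumed to be any function that takes a patch and returns True or False.

```python
def sliding_windows(height, width, window, step):
    """Yield the (row, col) top-left corner of every window-by-window patch."""
    for row in range(0, height - window + 1, step):
        for col in range(0, width - window + 1, step):
            yield row, col


def detect(image, window, step, classify):
    """Return the corners of all patches that `classify` labels as a face.

    `image` is a list of rows of pixel values. `classify` is a stand-in
    (an assumption here) for the trained classifier.
    """
    height, width = len(image), len(image[0])
    hits = []
    for row, col in sliding_windows(height, width, window, step):
        # Cut out the square patch whose top-left corner is (row, col).
        patch = [r[col:col + window] for r in image[row:row + window]]
        if classify(patch):
            hits.append((row, col))
    return hits
```

A real detector would repeat this over several rescaled copies of the image so that faces of different sizes fit the fixed window.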

General algorithm

The first step in the algorithm is to collect features from faces and non-faces. In my implementation, I use a single SIFT descriptor as the feature for a given patch, computed by vl_dsift with a bin size of 10. Using this SIFT feature immediately gave a significant improvement over the raw image data. SIFT features are gathered in the functions crops2features.m and get_img_feats.m. Next, a linear or non-linear SVM is trained on the collected positives (faces) and negatives (non-faces). Finally, the sliding window is run with the trained SVM over a test set to find faces, and the average precision relative to ground truth is recorded.
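For readers unfamiliar with the metric, the sketch below shows one common (non-interpolated) way to compute average precision in Python. It is an assumption that the project's evaluation code works this way, and the function name and input format are hypothetical: each detection is a (confidence, is_true_positive) pair, where the flag comes from matching against ground truth.

```python
def average_precision(detections, num_ground_truth):
    """Non-interpolated average precision over a ranked list of detections.

    `detections` is a list of (confidence, is_true_positive) pairs.
    `num_ground_truth` is the number of annotated faces in the test set.
    """
    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
            # Precision measured at the rank of each correct detection.
            precision_sum += true_positives / rank
    if num_ground_truth == 0:
        return 0.0
    # Faces that are never detected contribute zero to the average.
    return precision_sum / num_ground_truth
```

Dividing by the number of ground-truth faces rather than the number of detections means that missed faces lower the score.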

Results

I use three approaches to compare the effect of different key parameters and algorithm choices. In each approach, I use 1000 positive and 1000 negative examples for training. I also run both the linear SVM and the non-linear SVM each time to compare the two models across approaches. For the linear SVM, I use lambda = 10; for the non-linear SVM, I use lambda = 1, sigma = 500, and the 'rbf' kernel.
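To make the role of sigma concrete, here is a small Python sketch of the two kernels. The RBF form shown, exp(-||x - y||^2 / (2 sigma^2)), is one common convention; it is an assumption that the SVM code used in the project normalizes the kernel the same way.

```python
import math


def linear_kernel(x, y):
    """Dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, sigma):
    """Gaussian (RBF) kernel; sigma sets the distance scale of similarity."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))
```

With this form, two features whose distance is small relative to sigma have a kernel value near 1, and the value decays toward 0 as they move apart.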

Approach 1: Random

In the first approach, I simply use random features from the negative data to train the SVM.

Linear SVM
Non-linear SVM

Approach 2: Hard negative mining

In the second approach, I mine for hard negatives. This means first building an SVM using a set of random negative features, and then building a second SVM using only features from non-face images that the first SVM detected as faces. In my implementation, I mine for hard negatives only once. I implement get_hard_negatives.m in much the same way as run_detector.m.
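The single mining round described above can be sketched in Python as follows. This is not get_hard_negatives.m: the function name is hypothetical, `train` is assumed to be any routine that takes positive and negative features and returns a scoring function, and a score above `threshold` is assumed to mean "face".

```python
def train_with_hard_negatives(train, positives, random_negatives,
                              candidate_negatives, threshold=0.0):
    """Run one round of hard negative mining.

    `train(positives, negatives)` must return a function mapping a feature
    to a score. `candidate_negatives` are features taken from non-face
    images, so any of them scored as a face is a false positive.
    """
    # Stage 1: train on randomly sampled negatives.
    first_score = train(positives, random_negatives)
    # Keep only the non-face features that the first classifier gets wrong.
    hard_negatives = [f for f in candidate_negatives
                      if first_score(f) > threshold]
    # Stage 2: retrain using the hard negatives as the negative set.
    second_score = train(positives, hard_negatives)
    return second_score, hard_negatives
```

If the first classifier produces no false positives, the hard negative set is empty, and a practical implementation would need to fall back to the random negatives.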

Linear SVM
Non-linear SVM

Approach 3: Hard negatives and robust windows

In the third approach, I again mine for hard negatives a single time, but I lower the parameters of run_detector so that the sliding window examines more of each image. The first two approaches use step_size = 4, scale_factor = 1.5, and start_scale = 3. In this test, I instead use step_size = 3, scale_factor = 1.25, and start_scale = 2. Although this takes much longer to run, the improvement is significant.
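To give a rough sense of why the second setting is slower, the Python sketch below counts the windows visited under assumed parameter semantics: the image is shrunk by the current scale, scanned with a fixed square window every step_size pixels, and the scale is then multiplied by scale_factor until the image no longer fits the window. These semantics, the function name, and the window size of 36 are assumptions for illustration; run_detector may define the parameters differently.

```python
def count_windows(height, width, window, step_size, scale_factor, start_scale):
    """Count sliding-window positions over all scales (assumed semantics)."""
    total = 0
    scale = start_scale
    while True:
        # Size of the image after shrinking by the current scale.
        h = int(height / scale)
        w = int(width / scale)
        if h < window or w < window:
            break  # The window no longer fits, so stop.
        rows = (h - window) // step_size + 1
        cols = (w - window) // step_size + 1
        total += rows * cols
        scale *= scale_factor
    return total


# Hypothetical 360 x 480 image and 36-pixel window.
coarse = count_windows(360, 480, 36, step_size=4, scale_factor=1.5, start_scale=3)
fine = count_windows(360, 480, 36, step_size=3, scale_factor=1.25, start_scale=2)
print(coarse, fine)
```

Under these assumptions, all three changes increase the count: a smaller step gives more positions per scale, a smaller scale factor gives more scales, and a smaller starting scale adds a larger first image to scan.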

Linear SVM
Non-linear SVM

The results from all three approaches indicate that non-linear SVMs significantly outperform linear SVMs, because the decision boundary can be arbitrarily complex. Comparing Approach 1 to Approach 2, it's clear that mining for hard negatives has relatively little impact on accuracy. In my tests, mining for hard negatives kept accuracy the same for the linear SVM and contributed a modest improvement for the non-linear SVM. A much greater improvement in accuracy, as shown in Approach 3, comes when the sliding window simply looks at more sections of the picture.