The goal of this assignment is to implement a face detector using a sliding window model. The basic pipeline of a sliding window face detector is to extract patches from an image, represent each patch with one or more feature descriptors, and train a classifier on a labeled set of face and non-face patches. Once the classifier is trained, it can be applied to a test data set.
My implementation of a sliding window face detector uses SIFT descriptors to represent each image patch and a support vector machine (SVM) for classification. I tried using both linear and non-linear SVMs.
To improve the accuracy of the detector, I attempted to use a "difficult" set of non-faces as negatives. I implemented this step of mining hard negatives by checking the confidence value (on a scale of 0 to 1) of each candidate negative and keeping only the first 1000 negatives with a confidence higher than 0.5. I attempted a higher confidence threshold, but this increased the running time without providing any significant improvement.
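As a rough illustration, here is a minimal MATLAB sketch of this step. It assumes a hypothetical helper detect_patches that runs the current classifier (w, b) over one non-face scene and returns a feature row and a confidence in [0, 1] for each patch; only the thresholding and the first-1000 cutoff follow the description above.

    % Collect the first 1000 false positives with confidence > 0.5.
    hard_negs = [];
    scene_files = dir(fullfile(non_face_scenes_path, '*.jpg'));
    for i = 1:numel(scene_files)
        % detect_patches is a placeholder returning per-patch features
        % and confidences for one scene.
        [feats, confs] = detect_patches( ...
            fullfile(non_face_scenes_path, scene_files(i).name), w, b);
        % Every patch comes from a face-free scene, so every confident
        % detection is a hard negative.
        hard_negs = [hard_negs; feats(confs > 0.5, :)];
        if size(hard_negs, 1) >= 1000   % stop after the first 1000
            hard_negs = hard_negs(1:1000, :);
            break;
        end
    end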
To speed up the running time of the algorithm, I cached my SIFT descriptors for each image using MATLAB memoization.
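For example, wrapping the per-image descriptor computation with MATLAB's built-in memoize gives this caching almost for free (compute_sift_features is a hypothetical helper name, not the exact function in my code):

    % Wrap the per-image SIFT computation in a memoized handle so repeated
    % calls with the same arguments (e.g. across mining iterations) are
    % served from the cache instead of being recomputed.
    cachedSift = memoize(@compute_sift_features);   % hypothetical helper
    cachedSift.CacheSize = 1000;   % cache descriptors for up to 1000 images

    feats = cachedSift(img_path, step);   % computed once
    feats = cachedSift(img_path, step);   % returned from cache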
Below is the complete pipeline for the algorithm as implemented.
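In MATLAB-style pseudocode the pipeline looks roughly like this; the stage names (get_positive_features, get_random_negative_features, mine_hard_negatives, run_detector) are placeholders for the corresponding steps rather than my exact code, and the linear training call assumes VLFeat's vl_svmtrain.

    % 1. Represent training patches with SIFT descriptors (one row per patch).
    pos_feats = get_positive_features(train_face_path);
    neg_feats = get_random_negative_features(non_face_scenes_path, num_neg);

    % 2. Train an initial SVM on positives vs. random negatives.
    labels = [ones(size(pos_feats, 1), 1); -ones(size(neg_feats, 1), 1)];
    [w, b] = vl_svmtrain(single([pos_feats; neg_feats])', labels, lambda);

    % 3. Optionally mine hard negatives and retrain on the augmented set.
    for iter = 1:num_mining_iters
        neg_feats = [neg_feats; mine_hard_negatives(w, b, non_face_scenes_path)];
        labels = [ones(size(pos_feats, 1), 1); -ones(size(neg_feats, 1), 1)];
        [w, b] = vl_svmtrain(single([pos_feats; neg_feats])', labels, lambda);
    end

    % 4. Slide a window over each test image at several scales and
    %    classify every patch with the trained SVM.
    bboxes = run_detector(test_scenes_path, w, b);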
For all reported results, I use a sliding window step size of 2 when calculating image patches.
For my linear SVM, I set lambda = 5.
For my nonlinear SVM, I set lambda = 1 and RBF sigma = 1.
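For concreteness, the two classifier settings above might be expressed as follows. The linear case assumes VLFeat's vl_svmtrain; the nonlinear case is shown with MATLAB's fitcsvm, where KernelScale plays the role of the RBF sigma and the box constraint only roughly corresponds to 1/lambda, so treat that mapping as approximate.

    % X: D x N matrix of SIFT features, Y: N x 1 vector of +1/-1 labels,
    % X_test: D x M matrix of test-patch features.

    % Linear SVM with lambda = 5 (VLFeat).
    [w, b] = vl_svmtrain(X, Y, 5);
    linear_conf = w' * X_test + b;   % confidence of each test patch

    % Nonlinear SVM with an RBF kernel, sigma = 1. fitcsvm expects one
    % observation per row, hence the transposes; BoxConstraint = 1 stands
    % in (approximately) for lambda = 1.
    model = fitcsvm(X', Y, 'KernelFunction', 'rbf', ...
                    'KernelScale', 1, 'BoxConstraint', 1);
    [~, rbf_scores] = predict(model, X_test');  % column 2 scores the +1 class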
My best precision score was 0.850, achieved by a nonlinear SVM with 3000 positives, 3000 random negatives, and 3000 hard negatives gathered from one iteration of hard-negative mining. More detailed results are below.
For all reported results, I give a starting scale, which determines how many times the image is scaled down by a factor of two before image patches are tested. I also report the number of initial random negatives and the number of hard-negative mining iterations. Note that if there are initially n random negatives, each mining iteration adds n more negatives; the number of positives equals the number of initial negatives.
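As a concrete illustration of the starting scale, the image is simply halved that many times before any windows are extracted (a sketch, assuming the Image Processing Toolbox):

    % With starting scale s, the test image is downsampled s times by a
    % factor of two, so s = 2 means detection runs at quarter resolution.
    img = im2single(imread(test_image_path));
    for k = 1:starting_scale
        img = imresize(img, 0.5);   % halve width and height
    end
    % ...then slide the window with step size 2 over the reduced image.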
Scale = 2, 5000 negatives, no mining
Scale = 1, 5000 negatives, no mining
Scale = 2, 5000 negatives, 1 mining iteration
Scale = 2, 5000 negatives, 3 mining iterations
Scale = 2, 3000 negatives, no mining
Scale = 2, 2500 negatives, 1 mining iteration
Scale = 2, 3000 negatives, 1 mining iteration
When I set the starting scale to 1 or used more than 3000 random negatives, the nonlinear SVM ran out of memory. On a computer with more memory, I expect that the nonlinear SVM could perform somewhat better with more iterations of hard-negative mining.
The nonlinear SVM consistently performed about 10% better than the linear SVM. This increase in performance was expected, since the nonlinear SVM can capture a more complex decision boundary at the expense of extra memory and running time.
Mining for hard negatives improved performance by 1-3%; however, most of this gain came from the fact that each iteration added more training data, rather than from the hard negatives themselves. When I tried replacing the random negatives with the hard negatives instead of augmenting the training set, mining did not improve performance.
Here are the five faces with the highest confidence values for the nonlinear SVM.
Here are the five false positives with the highest confidence values for the nonlinear SVM. There is no clear pattern in the most confident false positives; if there were, it would give better insight into how the detector could be improved.