The goal of this assignment is to implement a face detector using a sliding window model. The basic pipeline of a sliding window face detector is to extract patches from an image, represent each patch with one or more feature descriptors, and train a classifier on a labeled set of face and non-face patches. Once the classifier is trained, it can be applied to a test data set.
My implementation of a sliding window face detector uses SIFT descriptors to represent each image patch and a support vector machine (SVM) for classification. I tried using both linear and non-linear SVMs.
To improve the accuracy of the detector, I attempted to use a "difficult" set of non-faces as negatives. I implemented this step of mining hard negatives by checking the confidence value (on a scale of 0 to 1) of each candidate negative and keeping only the first 1000 negatives with a confidence higher than 0.5. I attempted a higher confidence threshold, but this increased the running time without providing any significant improvement.
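As a rough illustration, here is a minimal MATLAB sketch of this step. It assumes a hypothetical helper detect_patches that runs the current classifier (w, b) over one non-face scene and returns a feature row and a confidence in [0, 1] for each patch; only the thresholding and the first-1000 cutoff follow the description above.

    % Collect the first 1000 false positives with confidence > 0.5.
    hard_negs = [];
    scene_files = dir(fullfile(non_face_scenes_path, '*.jpg'));
    for i = 1:numel(scene_files)
        % detect_patches is a placeholder returning per-patch features
        % and confidences for one scene.
        [feats, confs] = detect_patches( ...
            fullfile(non_face_scenes_path, scene_files(i).name), w, b);
        % Every patch comes from a face-free scene, so every confident
        % detection is a hard negative.
        hard_negs = [hard_negs; feats(confs > 0.5, :)];
        if size(hard_negs, 1) >= 1000   % stop after the first 1000
            hard_negs = hard_negs(1:1000, :);
            break;
        end
    end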
To speed up the running time of the algorithm, I cached my SIFT descriptors for each image using MATLAB memoization.
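For example, wrapping the per-image descriptor computation with MATLAB's built-in memoize gives this caching almost for free (compute_sift_features is a hypothetical helper name, not the exact function in my code):

    % Wrap the per-image SIFT computation in a memoized handle so repeated
    % calls with the same arguments (e.g. across mining iterations) are
    % served from the cache instead of being recomputed.
    cachedSift = memoize(@compute_sift_features);   % hypothetical helper
    cachedSift.CacheSize = 1000;   % cache descriptors for up to 1000 images

    feats = cachedSift(img_path, step);   % computed once
    feats = cachedSift(img_path, step);   % returned from cache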
Below is the complete pipeline for the algorithm as implemented.
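In MATLAB-style pseudocode the pipeline looks roughly like this; the stage names (get_positive_features, get_random_negative_features, mine_hard_negatives, run_detector) are placeholders for the corresponding steps rather than my exact code, and the linear training call assumes VLFeat's vl_svmtrain.

    % 1. Represent training patches with SIFT descriptors (one row per patch).
    pos_feats = get_positive_features(train_face_path);
    neg_feats = get_random_negative_features(non_face_scenes_path, num_neg);

    % 2. Train an initial SVM on positives vs. random negatives.
    labels = [ones(size(pos_feats, 1), 1); -ones(size(neg_feats, 1), 1)];
    [w, b] = vl_svmtrain(single([pos_feats; neg_feats])', labels, lambda);

    % 3. Optionally mine hard negatives and retrain on the augmented set.
    for iter = 1:num_mining_iters
        neg_feats = [neg_feats; mine_hard_negatives(w, b, non_face_scenes_path)];
        labels = [ones(size(pos_feats, 1), 1); -ones(size(neg_feats, 1), 1)];
        [w, b] = vl_svmtrain(single([pos_feats; neg_feats])', labels, lambda);
    end

    % 4. Slide a window over each test image at several scales and
    %    classify every patch with the trained SVM.
    bboxes = run_detector(test_scenes_path, w, b);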
For all reported results, I use a sliding window step size of 2 when calculating image patches.
For my linear SVM, I set lambda = 5.
For my nonlinear SVM, I set lambda = 1 and RBF sigma = 1.
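For concreteness, the two classifier settings above might be expressed as follows. The linear case assumes VLFeat's vl_svmtrain; the nonlinear case is shown with MATLAB's fitcsvm, where KernelScale plays the role of the RBF sigma and the box constraint only roughly corresponds to 1/lambda, so treat that mapping as approximate.

    % X: D x N matrix of SIFT features, Y: N x 1 vector of +1/-1 labels,
    % X_test: D x M matrix of test-patch features.

    % Linear SVM with lambda = 5 (VLFeat).
    [w, b] = vl_svmtrain(X, Y, 5);
    linear_conf = w' * X_test + b;   % confidence of each test patch

    % Nonlinear SVM with an RBF kernel, sigma = 1. fitcsvm expects one
    % observation per row, hence the transposes; BoxConstraint = 1 stands
    % in (approximately) for lambda = 1.
    model = fitcsvm(X', Y, 'KernelFunction', 'rbf', ...
                    'KernelScale', 1, 'BoxConstraint', 1);
    [~, rbf_scores] = predict(model, X_test');  % column 2 scores the +1 class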
My best precision score was 0.850, achieved by a nonlinear SVM with 3000 positives, 3000 random negatives, and 3000 hard negatives gathered from one iteration of hard-negative mining. More detailed results are below.
For all reported results, I give a starting scale, which determines how many times the image is scaled down by a factor of two before image patches are tested. I also report the number of initial random negatives and the number of hard-negative mining iterations. Note that if there are initially n random negatives, each mining iteration adds n more negatives; the number of positives equals the number of initial negatives.
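As a concrete illustration of the starting scale, the image is simply halved that many times before any windows are extracted (a sketch, assuming the Image Processing Toolbox):

    % With starting scale s, the test image is downsampled s times by a
    % factor of two, so s = 2 means detection runs at quarter resolution.
    img = im2single(imread(test_image_path));
    for k = 1:starting_scale
        img = imresize(img, 0.5);   % halve width and height
    end
    % ...then slide the window with step size 2 over the reduced image.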
Scale = 2, 5000 negatives, no mining
Scale = 1, 5000 negatives, no mining
Scale = 2, 5000 negatives, 1 mining iteration
Scale = 2, 5000 negatives, 3 mining iterations
Scale = 2, 3000 negatives, no mining
Scale = 2, 2500 negatives, 1 mining iteration
Scale = 2, 3000 negatives, 1 mining iteration
When I set the starting scale to 1 or used more than 3000 random negatives, the nonlinear SVM ran out of memory. On a computer with more memory, I expect that the nonlinear SVM could perform somewhat better with more iterations of hard-negative mining.
The nonlinear SVM consistently performed about 10% better than the linear SVM. This increase in performance was expected, since the nonlinear SVM can capture a more complex decision boundary at the expense of extra memory and running time.
Mining for hard negatives improved performance by 1-3%; however, most of this gain came from the fact that each iteration added more training data, rather than from the hard negatives themselves. When I tried replacing the random negatives with the hard negatives instead of augmenting the training set, mining did not improve performance.
Here are the five faces with the highest confidence values for the nonlinear SVM.
Here are the five false positives with the highest confidence values for the nonlinear SVM. There is no clear pattern in the most confident false positives; if there were, it would give better insight into how the detector could be improved.