Computer Vision, Project 4: Face Detection with a Sliding Window

Bryce Richards


Project Description
The sliding window method is a natural way to detect an object in images: the detector examines the image patch by patch and classifies each patch as containing the object or not. In this project we implement a sliding window face detector in that spirit.
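
To make the sliding-window idea concrete, here is a minimal single-scale MATLAB sketch. The file name, the score_crop scoring function, and the stride and threshold values are all hypothetical placeholders, not part of our actual code.

    % Scan a 36x36 window across the image and keep high-scoring patches.
    im = im2single(imread('test_image.jpg'));    % placeholder image path
    if size(im, 3) == 3, im = rgb2gray(im); end  % detector works on grayscale
    win = 36; stride = 6; thresh = 0;            % illustrative values
    [h, w] = size(im);
    detections = [];                             % rows of [x1 y1 x2 y2 score]
    for r = 1:stride:(h - win + 1)
        for c = 1:stride:(w - win + 1)
            crop = im(r:r+win-1, c:c+win-1);     % candidate patch
            score = score_crop(crop);            % hypothetical classifier score
            if score > thresh
                detections(end+1, :) = [c, r, c+win-1, r+win-1, score]; % grow detection list
            end
        end
    end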


Algorithm Design

The algorithm consists of five main steps. First, we convert positive training data (36x36 crops of faces) into the image features we will use for face detection. Second, we do the same for negative training data by extracting 36x36 crops from images with no faces and converting those crops to features. Third, we train an SVM (either linear or non-linear) on those features. Fourth, we use this initial SVM to search for "hard negatives" -- crops from images with no faces that our detector currently identifies as faces. Retraining the SVM on these hard negatives should, in theory, improve performance. Finally, we run our face detector on test data and visualize and quantify its accuracy. Below we describe the implementations of these steps in more detail.
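
At a high level the pipeline looks like the outline below. The function names, directory variables, and feature_params structure are placeholders standing in for our helpers and the starter code; only the order of the steps comes from the description above.

    pos_feats = get_positive_features(pos_dir, feature_params);        % step 1: face crops -> features
    neg_feats = get_random_negative_features(neg_dir, feature_params); % step 2: random faceless crops -> features
    lambda    = 100;                                                   % linear-SVM value from our experiments
    [w, b]    = train_svm(pos_feats, neg_feats, lambda);               % step 3: initial SVM
    hard_negs = mine_hard_negatives(neg_dir, w, b, feature_params);    % step 4: false positives on faceless images
    [w, b]    = train_svm(pos_feats, [neg_feats, hard_negs], lambda);  %         retrain with hard negatives
    [bboxes, confidences, image_ids] = run_detector(test_dir, w, b, feature_params); % step 5: evaluate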

Steps 1 and 2: Extract Features We converted each 36x36 crop into a single SIFT feature using the vl_dsift function, with a bin size of 10.
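
A minimal sketch of this feature extraction, assuming the VLFeat toolbox is on the MATLAB path. Only the bin size of 10 comes from our experiments; the file name and the 'step' value are illustrative choices (with this step, vl_dsift returns a single 128-dimensional descriptor for a 36x36 crop).

    crop = im2single(imread('face_crop.jpg'));        % placeholder 36x36 training crop
    if size(crop, 3) == 3, crop = rgb2gray(crop); end
    [~, descriptors] = vl_dsift(crop, 'size', 10, 'step', 6, 'fast');
    feature = single(descriptors(:));                 % 128x1 feature vector for this crop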

Step 3: Train Initial SVM The parameter values for the linear SVM had only a slight effect on performance, while those for the non-linear SVM had a huge effect. The linear SVM performed best with a lambda value of 100; the non-linear SVM performed best with a lambda value of 0.1 and a sigma value of 1000. We ran the linear SVM with 3000 positive and negative training features, and the non-linear SVM with 2000 positive and negative training features.
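
The sketch below uses VLFeat's vl_svmtrain as a stand-in for the SVM routine we actually called; only the lambda value of 100 and the +1/-1 face labels come from our setup. pos_feats and neg_feats are D x N matrices whose columns are the features from steps 1 and 2.

    X = single([pos_feats, neg_feats]);                              % D x N training matrix
    Y = [ones(1, size(pos_feats, 2)), -ones(1, size(neg_feats, 2))]; % +1 = face, -1 = non-face
    lambda = 100;                                                    % best linear-SVM value we found
    [w, b] = vl_svmtrain(X, Y, lambda);
    % A crop with feature f is then classified as a face when w' * f + b > 0.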

Step 4: Mine for Hard Negatives Using our SVM trained on random negative crops, we collected an equal number of new negative crops -- faceless crops that our SVM identified as faces. These hard negatives should, in theory, push our SVM to distinguish faces from non-faces that are "facelike" (symmetrical, oval-shaped, etc.) in appearance. To speed up this phase, instead of extracting a SIFT feature from many individual crops, we extracted SIFT features from entire images and then ran the face detector directly on those features.
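
Continuing the notation from the sketches above, here is a rough sketch of this per-image mining step for the linear SVM, again leaning on VLFeat's vl_dsift. The directory handling and the decision threshold are illustrative, and in practice the collected negatives would be subsampled to match the number of original negatives before retraining.

    hard_negs = [];
    neg_files = dir(fullfile(neg_dir, '*.jpg'));                  % images known to contain no faces
    for i = 1:numel(neg_files)
        im = im2single(imread(fullfile(neg_dir, neg_files(i).name)));
        if size(im, 3) == 3, im = rgb2gray(im); end
        [~, descrs] = vl_dsift(im, 'size', 10, 'step', 6, 'fast');% dense SIFT over the whole image
        scores = w' * single(descrs) + b;                         % score every descriptor in one shot
        hard_negs = [hard_negs, single(descrs(:, scores > 0))];   % false positives become hard negatives
    end
    % Retrain on the original negatives plus the mined hard negatives.
    [w, b] = vl_svmtrain(single([pos_feats, neg_feats, hard_negs]), ...
                         [ones(1, size(pos_feats, 2)), ...
                          -ones(1, size(neg_feats, 2) + size(hard_negs, 2))], lambda);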

Step 5: Test Face Detector The last step of the algorithm is simple: apply the face detector to a test set of images, graph the precision-recall curve, and report the average precision. At this stage, the starter code also provides a way to visualize the face detections.
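
For reference, the precision-recall bookkeeping can be sketched as follows. confidences (the detector scores), is_correct (a 0/1 vector marking which detections matched a ground-truth face), and num_gt (the number of ground-truth faces) are placeholder variable names.

    [~, order] = sort(confidences, 'descend');     % rank detections by confidence
    is_correct = is_correct(order);
    tp = cumsum(is_correct);                       % true positives so far
    fp = cumsum(~is_correct);                      % false positives so far
    recall    = tp / num_gt;
    precision = tp ./ (tp + fp);
    ap = sum(precision .* is_correct) / num_gt;    % one common form of average precision
    plot(recall, precision); xlabel('recall'); ylabel('precision');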





Results and Extra Credit
Our implementation's performance was lackluster. Mining for hard negatives did not seem to help much, and the non-linear SVM failed to outperform the linear SVM. Worst of all, no combination of mining rounds, SVM parameters, and linear versus non-linear SVM exceeded roughly 0.4 average precision. Below are the precision-recall plots for various combinations of parameters.


Linear SVM, lambda = 100, zero rounds of mining hard negatives






Linear, lambda = 100, one round of mining. Note the decrease in performance.






Non-linear, lambda = 0.1, sigma = 1000, zero rounds of mining






Non-linear, lambda = 0.1, sigma = 1000, one round of mining






Non-linear, lambda = 0.1, sigma = 1000, three rounds of hard negative mining. Note the significant drop in performance.



HoG Descriptor: In addition to using SIFT features from the vl_dsift function, we also implemented a HoG descriptor in MATLAB. We ran our algorithm with both a linear and a non-linear SVM using HoG features instead of SIFT features, for a few combinations of HoG parameters (number of angle bins, number of boxes). We found that the optimal number of boxes was 16 and the optimal number of bins was 8. However, we did not have much time to tune the SVM parameters, which probably cost a significant amount of precision. A sketch of the descriptor appears below, followed by some precision-recall curves.
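
The sketch below is an illustrative re-implementation of the kind of descriptor described above (the function name simple_hog is ours, not from our actual code): the crop is divided into a grid of cells, each cell accumulates a gradient-magnitude-weighted histogram over unsigned orientation bins, and the result is L2-normalized. Calling simple_hog(crop, 4, 8) gives the 16-box, 8-bin, 128-dimensional configuration that worked best for us.

    function feat = simple_hog(crop, n_cells, n_bins)
        % crop: grayscale patch; n_cells: cells per side; n_bins: orientation bins
        crop = im2single(crop);
        [gx, gy] = gradient(crop);                     % image gradients
        mag = sqrt(gx.^2 + gy.^2);                     % gradient magnitude
        ang = mod(atan2(gy, gx), pi);                  % unsigned orientation in [0, pi)
        bin = min(floor(ang / (pi / n_bins)) + 1, n_bins);
        [h, w] = size(crop);
        cell_h = floor(h / n_cells); cell_w = floor(w / n_cells);
        feat = zeros(n_cells, n_cells, n_bins);
        for r = 1:n_cells
            for c = 1:n_cells
                rows = (r-1)*cell_h + (1:cell_h);
                cols = (c-1)*cell_w + (1:cell_w);
                for b = 1:n_bins                       % magnitude-weighted orientation histogram
                    mask = (bin(rows, cols) == b);
                    feat(r, c, b) = sum(sum(mag(rows, cols) .* mask));
                end
            end
        end
        feat = feat(:) / (norm(feat(:)) + eps);        % L2-normalized (n_cells^2 * n_bins) x 1 vector
    end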


Linear, number of boxes = 16 (4x4 grid), number of bins = 8, lambda = 100






Non-linear, number of boxes = 16, number of bins = 8, lambda = 0.1







Conclusion There were some major problems with our project. Mining for hard negatives did not improve results at all, and actually seemed to make the detector less accurate. We also needed to spend more time tuning parameters, but each run of the algorithm took a long time and our time was limited. Finally, at one point we had the detector running with precision around 0.75, but after changing some part of the code we have not been able to recover that performance. There is a bug hiding in there somewhere...