Facial Recognition

Overview

This project was concerned with facial detection in images. We utilized a sliding-window approach to detect these faces. The classification method that we used was an SVM (both linear and non-linear).

Broadly, a sliding-window approach works by taking many many crops from each test image by densely sampling the image at many different scales. Each of these crops is then transformed into a representation and classified with a confidence as to whether it is a face or not. For our implementation, we focused on three areas:

1. We came up with a better representation than using raw image patches. As in other areas of computer vision, raw image patches are a very poor representation. For my project, I decided to utilize the SIFT descriptor instead of raw image patches.

2. We came up with a better way to get our negative training data than randomly sampling non-face images. This consisted of mining hard negatives from the non-face training images. What this means is iteratively running the classifier (linear or non-linear svm) on the non-face training images and using what the classifier thinks are faces as negative data in the next iteration of classifier training.

3. We experimented with both linear and non linear support vector machines. This included trying both of these SVMs and experimentally tuning the parameters.

Baseline

The baseline stencil code utilized the following parameters:

Linear SVM with lambda of 100
Representation: Raw image patches
Negative Training Data: 1000 randomly sampled negative training examples
Positive Training Data: 1000 randomly sampled positive training examples

This baseline resulted in an average precision of 5.7%, with the recall precision curve pictured below:

Representation Improvement

As previously mentioned, I improved on the baseline representation of raw image patches by utilizing the SIFT descriptor. Additionally, I changed the implementation in order to speed up the process by performing a dense SIFT descriptor operation over entire images rather than performing it once for each crop. The move to using the SIFT descriptor resulted in a significant performance increase (from 5.7% to 34.2%).

Linear SVM with lambda of 100
Representation: SIFT descriptor
Negative Training Data: 4000 randomly sampled negative training examples
Positive Training Data: 6700 positive training examples

This change in representation resulted in an increase of average precision to 34.2%. The precision-recall curve is pictured below:

Linear SVM Parameter Tuning

In tuning the linear svm parameters (with lambda being the only free parameter), I saw very little difference in performance. More quantitatively, when keeping all other variables constant, varying the lambda input to the linear svm resulted in small changes in average precision.

Lambda	Average precision
100	34.2%
10	34.5%
1	33.2%
0.1	34.15%

Linear SVM plus Mining of Hard Negativs

Similarly, when using a linear svm, adding more negative training examples via mining hard negatives did not result in an increase in average precision. It actually resulted in a decrease from 33.2% to 31.8% in one example. Parameters:

Linear SVM with lambda of 10
Representation: SIFT descriptor
Negative Training Data: 1000 randomly sampled negative training examples plus 2000 hard negatives mined over two stages
Positive Training Data: 6700 positive training examples

Nonlinear SVM

Switching to a nonlinear svm did not immediately show a performance improvement. Indeed, until I was able to set the sigma correctly for the rbf kernel that I utilized, the nonlinear svm resulted in horrible performance (on the order of 3.5% average precision). In order to tune the nonlinear svm, I tried a variety of lambda values and sigma values for the rbf kernel, shown in the tables below.

When varying lambda, I left the sigma up to the kernel.m function provided (with it usually hovering between 300 and 350). When varying the sigma explicitly, I kept the lambda at 10. For both of these tests, I used an rbf kernel with 1000 randomly sampled positive and negative training examples.

Lambda	Average precision
100	30.5%
10	30.1%
1	34.4%
0.1	29.2%

Sigma	Average precision
300	32.8%
325	32.6%
350	34.2%
400	33.2%

As can be clearly seen here, the initial switch to a nonlinear svm, even after tuning free parameters, did not result in a significant difference in performance from the linear svm. This is probably due to the small amount of training data that I used in these specific tests. Indeed, we will see the best overall performance that I achieve with a non-linear svm coupled with hard negative mining.

Non-Linear SVM plus Mining of Hard Negativs

In this regime, I combined the mining of hard negatives with the use of a tuned non-linear svm. This combination seemed to be the best. This is due to the benefit that a non-linear svm derives from a lot of training data and its ability to fit the data better than a linear svm. The best parameters that I came up with are as follows:

NonLinear SVM with rbf kernel, lambda of 10, and sigma of 350
Representation: SIFT descriptor
Negative Training Data: 1000 randomly sampled negative training examples plus 2000 hard negatives mined over two stages
Positive Training Data: 6700 positive training examples

Best Results

Once I determined the best parameters for the linear and non-linear svm situations, I used these parameters while decrementing the start scale in order to get my best results. I decremented the start scale from 3 to 1, which slowed the speed of my detector significantly, but also hugely boosted my average precision.

Linear SVM with lambda of 10
Representation: SIFT descriptor
Negative Training Data: 4000 randomly sampled negative training examples
Positive Training Data: 6700 positive training examples
Start scale: 1

NonLinear SVM with rbf kernel, lambda of 10, and sigma of 350
Representation: SIFT descriptor
Negative Training Data: 1000 randomly sampled negative training examples plus 2000 hard negatives mined over two stages
Positive Training Data: 6700 positive training examples

Conclusion

Overall, the highest average precision that I achieved was 79.2%. By far the biggest increases that I saw were the change from raw image patches to the SIFT feature for the representation as well as the decrementing of the start scale. The rest of the performance increase from the baseline were due to the switch to a nonlinear svm along with mining of hard negatives, though introducing a non-linear svm without mining hard negatives or mining hard negatives while still using a linear svm did not result in a significant change in performance.