Nov 16, 2011

Face Detection with a Sliding Window

Sungmin Lee

1. Abstract

In this assignment, I implemented a face detection application using a sliding window model inspired by the Viola-Jones (2001) paper. The results show that a sliding window performs reasonably well with a non-linear SVM (83.7% average precision in the best case) without using any other complicated techniques.

2. Algorithm

The main idea of this project is to slide windows of various sizes across an image and compare face image features against non-face image features. For the image descriptor, I used SIFT features with a bin size of 10 and a step size of 10. Here is the basic flow of the algorithm (a minimal code sketch follows Algorithm 1 below).

 1. Collect all the face images (6,713 in total, each 36x36 pixels)
 2. Convert all the cropped images to features using a descriptor such as SIFT or HoG (SIFT is used in this project)
 3. Train a face vs. non-face classifier on the crops using an SVM (either linear or non-linear)
 4. Go back to step 2 with the trained classifier to suppress false-positives (hard negative mining)
 5. Apply sliding windows to the test set (130 different images containing faces) with the classifier from step 3

< Algorithm 1. Basic flow of the sliding window model >
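To make the flow concrete, here is a minimal runnable sketch of steps 1-3 using scikit-learn (my choice for illustration; this write-up does not name the SVM package, and the random vectors below merely stand in for real SIFT descriptors). Steps 4 and 5 are sketched further below.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    # Stand-ins for SIFT descriptors: in the real pipeline these come from
    # 6,713 36x36 face crops (positives) and random non-face crops (negatives).
    pos = rng.normal(0.5, 1.0, (200, 128))
    neg = rng.normal(-0.5, 1.0, (200, 128))

    X = np.vstack([pos, neg])
    y = np.hstack([np.ones(len(pos)), -np.ones(len(neg))])

    clf = SVC(kernel="rbf").fit(X, y)    # step 3: non-linear SVM
    print(clf.decision_function(X[:5]))  # positive scores mean "face"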

I first collected the 6,713 cropped 36x36 faces provided by the Caltech Web Faces project and applied SIFT to the crops to get image descriptors of the faces. SIFT is somewhat slower than other image descriptors such as HoG, but generally performs better.
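This write-up does not pin down the exact SIFT implementation; below is a rough dense-SIFT equivalent using OpenCV, with descriptors computed on a fixed grid (the grid spacing and keypoint size mirror the step/bin sizes above, and the file name is hypothetical).

    import cv2
    import numpy as np

    def dense_sift(gray, step=10, size=10):
        # Compute SIFT descriptors on a regular grid, approximating dense SIFT.
        sift = cv2.SIFT_create()
        kps = [cv2.KeyPoint(float(x), float(y), float(size))
               for y in range(step // 2, gray.shape[0], step)
               for x in range(step // 2, gray.shape[1], step)]
        _, desc = sift.compute(gray, kps)
        return desc.reshape(-1)  # one flat feature vector per crop

    crop = cv2.imread("face_0001.png", cv2.IMREAD_GRAYSCALE)  # hypothetical crop path
    feat = dense_sift(crop)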


< Fig 1. Caltech 10,000 web faces >

After that, I also sampled 275 different non-face images as a negative data set. I cropped these images in random order so that the negative set would not be biased toward any specific images, and I applied the same SIFT module to the resulting crops.
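A simple way to draw such random negative crops (my own sketch; the function and variable names are not from the original code):

    import cv2
    import numpy as np

    rng = np.random.default_rng(0)

    def sample_negative_crops(img_paths, n_crops=1000, patch=36):
        # Randomly cut patch x patch windows out of non-face images.
        crops = []
        while len(crops) < n_crops:
            img = cv2.imread(str(rng.choice(img_paths)), cv2.IMREAD_GRAYSCALE)
            if img is None or min(img.shape) < patch:
                continue
            y = int(rng.integers(0, img.shape[0] - patch + 1))
            x = int(rng.integers(0, img.shape[1] - patch + 1))
            crops.append(img[y:y + patch, x:x + patch])
        return crops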

Input:
non_face_img[] (images containing no faces)
svm_classifier (SVM classifier from the previous step)
patch_size (size of a patch; 36 in this project)
linear (boolean: is this a linear or non-linear SVM classifier?)
num_crops (how many crops to collect; 1,000~2,000 in this project)
Output:
crops[] (num_crops x feature-size matrix of hard negative features)
Functions:
get_detections_all_scales(img, svm_classifier, linear) - find faces in the image (either true or false positives)
detections2bboxes(detections, patch_size) - convert detections into patch crops and their confidences
cat(mat1, mat2) - concatenate mat1 with mat2
Algorithm:
crops ← []
confidences ← []
for i from 1 to size(non_face_img[]) by 1
   cur_detections ← get_detections_all_scales(non_face_img[i], svm_classifier, linear)
   [cur_crops, cur_confidences] ← detections2bboxes(cur_detections, patch_size)
   crops ← cat(crops, cur_crops)
   confidences ← cat(confidences, cur_confidences)
Index ← sort(confidences, desc)
crops ← sort(crops, Index)
crops ← crops[1..num_crops]

< Algorithm 2. Hard negative mining function >
(Note that I sorted the crops by the confidence of each bbox so that the strongest false-positives are eliminated first.)
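In Python, Algorithm 2 might look roughly like this, assuming a fitted scikit-learn classifier, a feature function such as dense_sift above, and the detect_all_scales routine sketched at the end of this section (all of these names are mine, not the original code's):

    import numpy as np

    def mine_hard_negatives(nonface_imgs, clf, featurize, num_crops=1000):
        crops, confs = [], []
        for img in nonface_imgs:
            # Every detection in a non-face image is a false positive.
            cur_crops, cur_confs = detect_all_scales(img, clf, featurize)
            crops += cur_crops
            confs += cur_confs
        order = np.argsort(confs)[::-1]  # strongest false-positives first
        return [crops[i] for i in order[:num_crops]]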

Since we have positive and negative sets, we can build a classifier using an SVM. I experimented with both linear and non-linear SVMs, and the non-linear SVM generally performed better. Even with a classifier trained on these positive and negative sets, there is a high chance of producing many false-positives. To suppress them, I iteratively eliminated hard negatives by resampling the non-face data; hard negatives are the crops that are still false-positives under the SVM classifier from the previous iteration. The best parameters from my experiments were lambda=1.0 for the linear SVM, and lambda=0.5 and sigma=256 for the non-linear SVM.
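As a sketch of the training step with scikit-learn: the lambda/sigma values above follow a different convention than scikit-learn's C/gamma. One common mapping is C ≈ 1/(lambda·n) and gamma = 1/(2·sigma²), but that mapping is my assumption, so treat the values below as illustrative rather than a faithful port of the original setup.

    import numpy as np
    from sklearn.svm import LinearSVC, SVC

    def train_svm(pos_feats, neg_feats, linear=False):
        X = np.vstack([pos_feats, neg_feats])
        y = np.hstack([np.ones(len(pos_feats)), -np.ones(len(neg_feats))])
        if linear:
            clf = LinearSVC(C=1.0 / (1.0 * len(X)))      # lambda = 1.0
        else:
            clf = SVC(kernel="rbf",
                      C=1.0 / (0.5 * len(X)),            # lambda = 0.5
                      gamma=1.0 / (2.0 * 256.0 ** 2))    # sigma = 256
        return clf.fit(X, y)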


< Fig 2. SVM classifiers (Left: 1st iteration (without hard negatives), Center: 2nd iteration, Right: 3rd iteration) >

I applied the resulting classifier to the CMU+MIT test set, which contains 130 images with 511 faces. One of the interesting points of this project is that even with the same classifier, the results vary significantly with the sliding window parameters (step size, scale factor, start scale).

< Fig 3, 4. Linear SVM results with the same SVM parameters >
(Left: Step=4, Scale=1.5, Start Scale=3, AP=45.7% / Right: Step=1, Scale=1.1, Start Scale=1, AP=69.3%)

< Fig 5, 6. Non-linear SVM results with the same SVM parameters >
(Left: Step=4, Scale=1.5, Start Scale=3, AP=46.6% / Right: Step=1, Scale=1.1, Start Scale=1, AP=83.7%)

As you can see from the results above, a smaller step size, scale factor, and start scale yield more precise results, even though they are computationally much more expensive. Surprisingly, the SVM parameters do not affect the results heavily, although careful calibration is still needed to get the "best" result.
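Here is a sketch of the multi-scale sliding window itself, showing where the three parameters come in. featurize() is a crop-to-feature function (e.g. the dense SIFT sketch above) and clf a fitted scikit-learn classifier; the code scores one window at a time for clarity, while a real implementation would batch the feature extraction.

    import cv2
    import numpy as np

    def detect_all_scales(img, clf, featurize, patch=36,
                          step=1, scale_factor=1.1, start_scale=1.0):
        crops, scores = [], []
        scale = start_scale
        while min(img.shape) / scale >= patch:
            small = cv2.resize(img, None, fx=1.0 / scale, fy=1.0 / scale)
            for y in range(0, small.shape[0] - patch + 1, step):
                for x in range(0, small.shape[1] - patch + 1, step):
                    crop = small[y:y + patch, x:x + patch]
                    conf = clf.decision_function(featurize(crop)[None, :])[0]
                    if conf > 0:
                        # Classifier says "face"; the bbox in the original
                        # image is (x, y, patch) scaled back up by `scale`.
                        crops.append(crop)
                        scores.append(conf)
            scale *= scale_factor
        return crops, scores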

3. Galleries

Here are some face detection examples from this project. All the images can be clicked to enlarge.

Test image with classmates 1

< Fig 7. Used non-linear SVM with confidence threshold 0.5. You can also see some false-positives >

Test image with classmates 2

< Fig 8. Used non-linear SVM with confidence threshold 1.0. No false-positives, but some faces are missed >

Harder test image with classmates 1

< Fig 9. Used non-linear SVM with confidence threshold 0.4. Some false-positives are visible >

Here are some more tested pictures, from the test set and the internet.

< Audrey Hepburn photo and a hand drawing of Audrey Hepburn >