Oh La La~~~ Face detection

Charles Zhang








1. Intro

In a nutshell, we make windows of different scales, "slide" them over an image, and check whether the little crop under each window is "seen" as a face by the computer.

Each image crop is represented by a dense SIFT descriptor.
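
The sliding-window idea can be sketched roughly as follows. This is only an illustration, not the project's actual code: the function name `sliding_windows`, the step of 6 pixels, and the list of scales are assumptions, and only the 36 by 36 base window comes from this write-up. Instead of resizing the image, the sketch grows the window, which covers the same regions.

```python
def sliding_windows(img_w, img_h, win=36, step=6, scales=(1.0, 0.75, 0.5)):
    """Yield (x, y, size) square boxes in original-image coordinates.

    win, step and scales are illustrative defaults (assumptions).
    """
    for s in scales:
        # Shrinking the image by a factor s is equivalent to
        # enlarging the window (and the stride) by 1 / s.
        size = int(round(win / s))
        stride = max(1, int(round(step / s)))
        if size > img_w or size > img_h:
            continue  # window no longer fits at this scale
        for y in range(0, img_h - size + 1, stride):
            for x in range(0, img_w - size + 1, stride):
                yield x, y, size


if __name__ == "__main__":
    boxes = list(sliding_windows(72, 72, win=36, step=36, scales=(1.0, 0.5)))
    for box in boxes:
        print(box)
```

Each box would then be cropped, resized to the base window size, turned into a descriptor, and scored by the classifier.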

2. Tuning and tweaking

At first I tried SIFT features with bin size 10 and step size 1, but I forgot to tune lambda and got a rather poor average precision (AP) of only 0.234 using both random negatives and mined negatives. The result curve is shown below.





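For readers unfamiliar with mined negatives, here is a minimal sketch of one mining round. It is a generic illustration rather than the procedure used here, which this page does not describe: the names `mine_hard_negatives` and `linear_score`, the threshold of 0, and the `max_keep` cap are all assumptions. The idea is to score face-free crops with the current classifier and keep the ones it wrongly rates as faces, so they can be added to the negative set before retraining.

```python
def linear_score(w, b):
    """Return a scoring function f -> w . f + b for a linear classifier."""
    return lambda f: sum(wi * fi for wi, fi in zip(w, f)) + b


def mine_hard_negatives(score_fn, negative_features, threshold=0.0, max_keep=None):
    """Return the negatives that score above threshold, hardest first.

    negative_features: feature vectors taken from images known to contain
    no faces, so every detection among them is a false positive.
    """
    scored = [(score_fn(f), f) for f in negative_features]
    hard = [(s, f) for s, f in scored if s > threshold]
    hard.sort(key=lambda sf: sf[0], reverse=True)
    if max_keep is not None:
        hard = hard[:max_keep]
    return [f for _, f in hard]


if __name__ == "__main__":
    score = linear_score([1.0, -1.0], 0.0)
    negatives = [[2.0, 0.0], [0.0, 2.0], [1.0, 0.5]]
    print(mine_hard_negatives(score, negatives))
```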

Oddly enough, using only random negatives gave a slightly better result of 0.296 AP, as shown below:





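Since every result on this page is reported as AP, a small sketch of how such a number can be computed may help. The exact AP variant used by the evaluation code is not stated here, so this assumes the plain non-interpolated form: rank detections by score and average the precision at each true positive. The function name and arguments are illustrative.

```python
def average_precision(scores, labels, n_positives=None):
    """Non-interpolated AP over detections ranked by descending score.

    labels[i] is 1 if detection i matches a ground-truth face, else 0.
    n_positives is the total number of ground-truth faces; if omitted,
    it is assumed that every face was detected at least once.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if n_positives is None:
        n_positives = sum(labels)
    if n_positives == 0:
        return 0.0
    true_positives = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            true_positives += 1
            total += true_positives / rank  # precision at this recall point
    return total / n_positives


if __name__ == "__main__":
    print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

Passing `n_positives` larger than the number of matched detections penalizes missed faces, which is what a detection benchmark normally wants.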

After that, I changed the SIFT parameters to bin size 8 with step size 8, which yields only one SIFT feature from a 36 by 36 crop of a face image. Somewhat surprisingly, this slightly boosted the AP to 0.322 without mined hard negatives, as shown below.








Now we turn to the nonlinear SVM. I only had time to try a few settings of the free parameters, and the best result so far uses sigma set to 250 and lambda set to 10, which gives an AP of 0.376. The result curve is shown below.
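
To make the role of sigma concrete, here is a minimal sketch that assumes the nonlinear SVM uses a Gaussian (RBF) kernel with sigma as its bandwidth. This page does not confirm the kernel or its exact normalization, so the `2 * sigma ** 2` denominator and both function names are assumptions.

```python
import math


def rbf_kernel(x, y, sigma=250.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 * sigma^2)) (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def decision_value(x, support_vectors, coeffs, bias=0.0, sigma=250.0):
    """Kernel SVM score: sum_i coeffs[i] * K(sv_i, x) + bias.

    coeffs[i] stands for alpha_i * y_i from training (hypothetical inputs).
    """
    return sum(
        c * rbf_kernel(sv, x, sigma) for sv, c in zip(support_vectors, coeffs)
    ) + bias


if __name__ == "__main__":
    print(rbf_kernel([0.0], [250.0]))
    print(decision_value([0.0], [[0.0], [250.0]], [1.0, -1.0]))
```

A larger sigma makes the kernel flatter, so distant descriptors still look similar and the decision boundary gets smoother; a smaller sigma makes the classifier more local.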

3. More Training Data

With more positive training data, we can get better results. Below are the result curves that I got using 3000 positive training examples and 3000+ negative training examples, with lambda set to 10 for both the linear and nonlinear SVM and sigma set to 250 for the nonlinear SVM.





Linear SVM





With the nonlinear SVM, we can achieve a better result than with the linear one: an AP of 0.425.

Nonlinear SVM result curve










Tan "Charles" Zhang