Project 4: Face detection with a sliding window

CS 143: Introduction to Computer Vision

Hang Su

Nov. 14th, 2011
Figure 1. Face detector tested on the class photo
Figure 2. Video demo: vehicle detection using the HOG descriptor with other cues

1 Algorithm

1.1 Overview

Object detection with a sliding window is straightforward. The two key decisions are (1) choosing an efficient descriptor and (2) selecting and training a suitable classifier.

The descriptors used in this project are the dense SIFT descriptor and the HOG descriptor [1]. The latter does not appear often in the face detection literature, but it turns out to be quite reliable compared with the SIFT descriptor.

The classifier used in this project is a cascade of linear or non-linear SVMs. A linear SVM works remarkably well with the HOG descriptor, while a non-linear SVM gives a considerable boost when using the SIFT descriptor.

1.2 HOG descriptor

I've implemented the HOG descriptor in both MATLAB and C; the code can be found here. Both implementations follow [1] strictly, except for the first "Normalize gamma & colour" step (see Figure 3). The C implementation can generate a 1536-d descriptor for a 36*36 patch in ~0.4 ms using the following tuned parameters: blockSize=2, cellSize=4, stride=1, nBins=6. This is several times faster than other implementations like this.
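The 1536-d figure can be checked with a short calculation. The sketch below is not the author's code; the function name is hypothetical, and it assumes that blockSize and stride are measured in cells, which is the reading that reproduces the quoted dimension.

```python
def hog_descriptor_length(patch_size, cell_size, block_size, stride, n_bins):
    """Length of a HOG descriptor for a square patch.

    Assumption: block_size and stride are given in cells, not pixels.
    """
    cells_per_side = patch_size // cell_size
    # Number of block positions along one side of the patch.
    blocks_per_side = (cells_per_side - block_size) // stride + 1
    # Each block concatenates one histogram per cell it covers.
    values_per_block = block_size * block_size * n_bins
    return blocks_per_side * blocks_per_side * values_per_block


# 36*36 patch, cellSize=4, blockSize=2, stride=1, nBins=6
print(hog_descriptor_length(36, 4, 2, 1, 6))
```

With these parameters the patch holds 9*9 cells and 8*8 overlapping blocks, each contributing 2*2*6 = 24 values, which gives 64*24 = 1536.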

Figure 3. HOG descriptor [1]

Because neighboring windows overlap, computing HOG once over the whole image (using the get_img_feat() function) makes the whole process roughly another 10 times faster. However, there are some tricky issues in implementing this. First, trilinear interpolation should not be used, because interpolating between neighboring blocks can give results that differ from those computed for each window individually. Second, the step size of the sliding window must be compatible with cellSize (I use 4 for both, for simplicity). Even after handling these carefully, the cropped descriptors still differ slightly from descriptors computed directly, since the image gradients on the boundary are rounded differently at different scales; nevertheless, experiments show little influence on the final result.
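As a rough illustration of the cropping step, the sketch below assembles a window descriptor from a precomputed grid of block histograms. It is not the author's get_img_feat(); the function name, the nested-list layout of block_grid, and the row-major block order are all assumptions made for the example.

```python
def window_descriptor(block_grid, top, left, cell_size=4, blocks_per_side=8):
    """Concatenate the block histograms covered by one sliding window.

    block_grid[r][c] is assumed to hold the normalized histogram of the
    block whose top-left cell is (r, c), computed once for the whole image.
    """
    # The window origin must sit on the cell grid, which is why the
    # sliding-window step has to be compatible with the cell size.
    if top % cell_size or left % cell_size:
        raise ValueError("window origin must be aligned to the cell grid")
    row0, col0 = top // cell_size, left // cell_size
    descriptor = []
    for r in range(row0, row0 + blocks_per_side):
        for c in range(col0, col0 + blocks_per_side):
            descriptor.extend(block_grid[r][c])
    return descriptor
```

With a 36*36 window, 4-pixel cells, and 24 values per block, each call returns 8*8*24 = 1536 values without recomputing any gradients.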

1.3 Cascade detector

Inspired by [2], a cascade detector is employed to achieve higher precision. Unlike [2], the purpose here is not higher speed, because the same feature is used in every stage. In each stage, (1) the current detector is run on negative samples to collect hard negatives, (2) a new classifier is trained, and (3) the classifier threshold is adjusted to reach a target detection rate (true positive rate) such as 99%. More stages do not necessarily improve performance; e.g. 5 stages are already sufficient for the non-linear SVM when using the HOG descriptor.
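Steps (1) and (3) of a stage can be sketched as follows. This is a minimal illustration, not the author's implementation: the function names are hypothetical, and it assumes that a higher classifier score means "more face-like" and that a sample passes a stage when its score is at least the threshold.

```python
import math


def threshold_for_detection_rate(positive_scores, target_rate=0.99):
    """Largest threshold that still accepts target_rate of the positives."""
    if not positive_scores:
        raise ValueError("need at least one positive score")
    ranked = sorted(positive_scores, reverse=True)
    # Number of positives that must pass (score >= threshold).
    keep = max(1, math.ceil(target_rate * len(ranked)))
    return ranked[keep - 1]


def mine_hard_negatives(negative_scores, threshold):
    """Indices of negatives the current stage still accepts (false positives)."""
    return [i for i, score in enumerate(negative_scores) if score >= threshold]
```

The indices returned by mine_hard_negatives would select the negative samples added to the training set for the next stage's classifier.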

2 Result

The best result in my project was obtained using the HOG descriptor with a non-linear SVM. Figure 4 gives its ROC curve. Figure 1 shows the detection result on the class photo using this detector. The comparison with SIFT is not fair, because its parameters are not as well tuned as those for HOG.

Figure 4. HOG detector using 5-stage cascade linear SVM
Figure 5. vl_dsift using 10-stage cascade non-linear SVM with RBF kernel

Below are more test results:

Figure 6. More test results. (Thresholds are set quite high to suppress false positives.)

3 Another Application: Vehicle detection

I use the MIT car dataset as positive training data. It provides 516 cropped car patches, all scaled to 128*64 with some border (Figure 7). Negative samples are randomly selected from another dataset containing 1000 pictures. For simplicity, no hard negative mining is used. To show the real-time capability of this framework, I test the detector on video (Figure 2) captured by the PKU POSS lab.
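Random negative sampling of this kind could look like the sketch below. It is an illustration under stated assumptions rather than the author's code: the function name is hypothetical, 128*64 is read as width*height, and the source images are assumed to contain no vehicles.

```python
import random


def random_negative_boxes(image_width, image_height, count,
                          box_width=128, box_height=64, seed=None):
    """Pick `count` random crop rectangles (left, top, width, height)."""
    if image_width < box_width or image_height < box_height:
        return []  # image too small to hold a single crop
    rng = random.Random(seed)
    boxes = []
    for _ in range(count):
        # randint is inclusive at both ends, so the crop stays in bounds.
        left = rng.randint(0, image_width - box_width)
        top = rng.randint(0, image_height - box_height)
        boxes.append((left, top, box_width, box_height))
    return boxes
```

Each returned rectangle would be cropped from a negative image and passed through the same descriptor as the positive patches.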

Figure 7. MIT car dataset
Figure 8. Framework for car detection

To reduce the number of image crops on which the HOG descriptor must be evaluated, some intuitive cues are used for hypothesis generation, e.g. dark shadows below vehicles, geometric constraints, etc. Passing vehicles are detected using pyramidal LK optical flow. Figure 8 shows a general framework for this.

This program is written in C++ with the help of OpenCV for some useful functionality: flood fill, pyramidal LK optical flow, video streaming, etc.


[1] N. Dalal and B. Triggs. Histograms of Oriented Gradients for Human Detection. In Proc. CVPR, 2005.

[2] P. Viola and M. Jones. Robust Real-time Object Detection. In IJCV, 2001.

Last update: Nov. 14th, 2011