Project 4: Face recognition with a sliding window

Name: Chen Xu
login: chenx

Algorithm Design

The basic flow is as follows:

  1. Extract SIFT features from the 36-by-36 positive training faces, with a bin size of 10.
  2. Mine random negative training samples for the first training stage and train a linear or non-linear SVM.
  3. Recursively mine hard negatives and add them as negative training examples for the following training stages.
  4. Test the learned SVM on the test set and evaluate the results.

For all of the basic part I use SIFT features, and I wrote a wrapper function for mining hard negatives and for detection to speed things up. A sketch of this training loop is given below.
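To make the flow above concrete, here is a minimal MATLAB sketch of the multi-stage loop. The helper names (get_positive_feats, get_random_negative_feats, train_classifier, mine_hard_negatives) and the directory variables are hypothetical placeholders for my wrapper functions, not the actual code.

    % Sketch of the multi-stage training loop described above.
    pos_feats = get_positive_feats(train_face_dir);             % SIFT on 36x36 faces, bin size 10
    neg_feats = get_random_negative_feats(non_face_dir, 1000);  % stage-1 random negatives

    lambda = 100;  step_size = 2;  scale_factor = 1.1;          % settings used in the linear-SVM runs
    num_stages = 2;                                             % 2 stages = mine hard negatives once
    for stage = 1:num_stages
        X = [pos_feats; neg_feats];
        Y = [ones(size(pos_feats, 1), 1); -ones(size(neg_feats, 1), 1)];
        model = train_classifier(X, Y, lambda);                 % linear or RBF SVM

        if stage < num_stages
            % run the current detector on non-face images; every false positive
            % becomes an extra negative training example for the next stage
            hard_negs = mine_hard_negatives(model, non_face_dir, step_size, scale_factor);
            neg_feats = [neg_feats; hard_negs];
        end
    end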

Results of Basic Part

Linear SVM

When training with a linear SVM, I compare the result of a single-stage process (no hard negatives) against the result of recursively mining hard negatives (mining hard negatives once). The precision results are given in Table 1. Both processes use the same training parameters: lambda = 100, detecting_step_size = 2, detecting_scale_factor = 1.1. I also give the Miss Rate vs. FPPW (false positives per window) log-log plots; a short sketch of how these two per-window quantities are computed follows Table 1. There is a big improvement in average precision after adding the hard negatives, raising the AP from 0.3626 to 0.7446, and both recalls are nearly 1. The miss rate - FPPW plot also shows that the trade-off curve is lower after adding the hard negatives.

Without hard negatives (1 stage): AP = 0.3626 [Precision-Recall plot | Miss Rate - FPPW plot]
With hard negatives (2 stages):   AP = 0.7446 [Precision-Recall plot | Miss Rate - FPPW plot]

Table 1: The effect of adding hard negatives into training with linear SVM.
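For reference, a minimal sketch of how the per-window miss rate and FPPW can be computed from the detector scores. The variables confidences, is_face, and the threshold t are assumptions, not my exact evaluation code.

    % confidences: one SVM score per scanned window; is_face: ground-truth label per
    % window; t: confidence threshold swept to trace the miss rate - FPPW curve
    is_detected = confidences >= t;
    tp = sum( is_detected &  is_face);
    fp = sum( is_detected & ~is_face);
    fn = sum(~is_detected &  is_face);
    miss_rate = fn / (tp + fn);              % fraction of true faces missed
    fppw      = fp / numel(confidences);     % false positives per window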

Non-linear SVM

Tuning Lambda and Sigma

To train the non-linear SVM, I use an RBF kernel. First I tune the training parameters lambda and sigma. Figure 1 (left) shows 3D bars of the effect of different lambdas and sigmas. The best average precision, AP = 0.4373, is achieved at lambda = 0.05 and sigma = 500, so I choose these values for the later training runs.
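As a reference for the non-linear case, here is a minimal sketch of evaluating an RBF-kernel SVM on one window descriptor. The parameterization exp(-||x_i - x||^2 / (2*sigma^2)) is an assumption (the exact form depends on the SVM package), and X_train, alpha, and b are assumed outputs of the kernel SVM trainer.

    % RBF kernel between the training rows of X_train and one test descriptor x_test
    d2 = sum(bsxfun(@minus, X_train, x_test).^2, 2);   % squared distances to x_test
    k  = exp(-d2 / (2 * sigma^2));                     % sigma = 500 in the best setting
    score = alpha' * k + b;                            % kernel SVM decision value for this window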

Figure 1: Left: Tuning lambda and sigma, training with 1000 random negatives and 1000 hard negatives; the detection step size is 4 and the scale factor is 1.5.
Right: Changing the detection step size and scale factor.

Changing Step-size and Scale-factor

Using lambda = 0.05 and sigma = 500, I change the detection step size and scale factor (for detection only, not for mining hard negatives) to improve the AP. The result is also shown in Figure 1 (right). Lowering the detection scale factor is more effective at improving performance than lowering the detection step size: the AP improvement is very significant when lowering the scale factor from 1.5 to 1.3, and less so from 1.3 to 1.1. The best average precision, AP = 0.8578, is achieved at step_size = 2 and scale_factor = 1.1. The sketch below shows where these two parameters enter the detector.
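A minimal sketch of the multi-scale sliding-window detector, assuming img is a grayscale test image, model is the trained SVM, and score_window (extract the SIFT feature of a 36x36 patch and return its SVM score) is a hypothetical placeholder; the zero score threshold is also an assumption.

    step_size    = 2;            % pixels between window origins at each scale
    scale_factor = 1.1;          % the image shrinks by this factor between scales
    win   = 36;                  % detection window = training patch size
    cur   = img;                 % grayscale test image (single)
    scale = 1;
    dets  = [];
    while min(size(cur)) >= win
        for r = 1:step_size:(size(cur, 1) - win + 1)
            for c = 1:step_size:(size(cur, 2) - win + 1)
                s = score_window(cur(r:r+win-1, c:c+win-1), model);
                if s > 0
                    % map the hit back to original-image coordinates: [x, y, width, score]
                    dets(end+1, :) = [round((c-1)*scale)+1, round((r-1)*scale)+1, win*scale, s]; %#ok<AGROW>
                end
            end
        end
        cur   = imresize(cur, 1/scale_factor);   % next (coarser) scale
        scale = scale * scale_factor;
    end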

Tuning Minimum Scale Size

The minimum scale size indicates the smallest patch size used to extract features (the training patch size is 36-by-36), so it should also affect the AP. Figure 2 (left) shows the effect of different minimum scale sizes. From Fig. 2, the minimum scale size (from 36 to 60) does not appear to affect the AP much. The best performance is achieved at min_scale_size = 36, but the margin is small. Intuitively, 36 should be best because it matches the training patch size.
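My reading of how min_scale_size plugs into the detector sketch above is an assumption: it simply sets the starting scale of the pyramid, so the smallest face that can be reported in the original image is min_scale_size pixels wide.

    min_scale_size = 36;               % smallest face size considered, in original-image pixels
    scale = min_scale_size / win;      % win = 36, so min_scale_size = 36 starts at scale 1
    cur   = imresize(img, 1/scale);    % then run the same detection loop as above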

Stage Number    Average Precision
2               0.8578
3               0.8441
4               0.8477

Figure 2: Left: The effect of different minimum scale sizes; the best performance is at size = 36 (step_size = 4, scale_factor = 1.3).
Right: The effect of the number of training stages.

More Recursive Stages

Viola-Jones (2001) used multiple recursive stages, while Dalal-Triggs (2005) found no significant performance improvement when using more than one stage. Here I examine how many recursive stages give the best performance. According to Figure 2 (right), there is no big improvement after using more than one hard-negative mining stage. The results show that mining hard negatives once gives the best performance.

Final Results

My final results after parameter tuning are as follows (after cross-validation over 9 tests):

Average Precision (AP): mean = 0.8477, std = 0.0125
SVM: non-linear (RBF kernel)
lambda: 0.05
sigma: 500
step_size: 2
scale_factor: 1.1
min_scale_size: 50
total_stages: 2

Figure 3 shows one of the best results, as both a precision-recall plot and a miss_rate-FPPW plot. It reaches a high AP of 0.865 with high recall, and when the miss rate is about 0.1, the false positives per window are about 1.

Figure 3: (Left) precision-recall plot, and (right) miss_rate-FPPW plot of one of the best results.

Discussion and Extra Points

I wrote my own HOG feature extractor for extra points, in both a MATLAB version and a C version, hoping to speed up feature extraction, because the HOG dimensionality is quite high (576 for 6x6 cells and 2x2 blocks) and extraction is slow. For HOG, I only use a linear SVM, with step_size = 4 and scale_factor = 1.5 for detection.

Here's how I extract HOG features (a MATLAB sketch follows the list):

  1. Compute the gradient orientation and magnitude.
  2. Cast weighted votes into spatial cells and orientation bins, using the gradient magnitude as the weight and 9 bins over 0 to 180 degrees (ignoring the sign).
  3. Contrast-normalize using the L1 norm over R-HOG blocks.
  4. Collect the HOG feature over the detection window.
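A minimal MATLAB sketch of these four steps for one 36x36 grayscale patch. It mirrors the list above rather than my exact MATLAB/C code, and the cell size, block layout, and gradient filters are illustrative assumptions (the final dimensionality depends on that layout).

    patch = single(patch);                              % 36x36 grayscale window
    gx  = imfilter(patch, [-1 0 1],  'replicate');      % step 1: horizontal gradient
    gy  = imfilter(patch, [-1 0 1]', 'replicate');      %         vertical gradient
    mag = sqrt(gx.^2 + gy.^2);
    ang = mod(atan2(gy, gx), pi);                       % unsigned orientation in [0, pi)

    cell_sz = 6;  nbins = 9;  ncell = 36 / cell_sz;     % 6x6 grid of 6-pixel cells
    bin = min(floor(ang / (pi / nbins)) + 1, nbins);    % orientation bin per pixel
    cell_hist = zeros(ncell, ncell, nbins);
    for r = 1:36                                        % step 2: magnitude-weighted votes
        for c = 1:36
            cr = ceil(r / cell_sz);  cc = ceil(c / cell_sz);
            cell_hist(cr, cc, bin(r, c)) = cell_hist(cr, cc, bin(r, c)) + mag(r, c);
        end
    end

    feat = [];                                          % steps 3-4: R-HOG blocks, L1 norm
    for br = 1:ncell-1
        for bc = 1:ncell-1
            block = cell_hist(br:br+1, bc:bc+1, :);     % 2x2 neighbouring cells
            block = block(:) / (sum(abs(block(:))) + eps);
            feat  = [feat; block]; %#ok<AGROW>
        end
    end
    % feat is the HOG descriptor collected over the 36x36 detection window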

First, I tuned the linear SVM parameter lambda; Figure 4 (left) shows the lambda tuning. According to Fig. 4, I choose lambda = 10 for HOG.

Second, I investigate the effects of different cell sizes and block sizes; Fig. 4 (right) shows the results. We can see that a cell size of 4x4 and a block size of 2x2 give the best performance, at AP = 0.29. A cell size of 8x8 with a block size of 3x3 performs very poorly because these sizes are too large relative to the 36x36 patch. The 0.29 AP is acceptable given that the detection step size (4) and scale factor (1.5) are large and the SVM is linear; the precision is comparable to SIFT when SIFT is run with the same parameters. Also, when the block size is 1x1 the precision is high, but the recall for HOG is a little low, only about 0.5 to 0.6; perhaps increasing the SVM offset b0 would increase the recall (see the note after Figure 4).

Figure 4: Left: Tuning lambda. Right: Tuning cell size and block size.
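On the b0 remark above, a one-line reminder of where the offset enters the linear detector (w, b0, and x are the linear SVM weights, offset, and window feature; this is just the standard linear decision rule, not extra project code):

    score = w' * x + b0;     % linear SVM score for one window feature x
    keep  = score > 0;       % shifting b0 moves this cutoff, trading precision for recall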