In this project, face detection proceeds as follows (the dataset contains separate train and test images):

1. Get image features: extract SIFT features (128 dimensions) from the positive face crops and from the negative images at several scales.
2. Train SVM: train an SVM classifier on the positive and negative feature sets to separate faces from non-faces.
3. Retrain SVM: no cascade architecture is used. Instead, after the first round of training, the new classifier is run over the set of negative images; any detections it returns are false positives, called hard negatives. These hard negatives are randomly subsampled, appended to the negative feature set, and the SVM is retrained.
4. Classify: run the trained SVM model over every image in the test set.
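The four steps above can be sketched end to end. Everything here is a stand-in, not the actual project code: random 128-D vectors play the role of SIFT descriptors, and a minimal subgradient-descent hinge-loss trainer replaces the real SVM solver.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1 (stand-in): random 128-D vectors instead of real SIFT features.
# Positives and negatives get a small offset so they are separable.
dim = 128
pos = rng.standard_normal((200, dim)) + 0.5
neg = rng.standard_normal((500, dim)) - 0.5

def train_linear_svm(X, y, lam=10.0, lr=1e-3, epochs=300):
    """Minimal hinge-loss linear SVM trained by subgradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    n = len(y)
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1                     # margin violators
        w -= lr * (lam * w - (y[viol, None] * X[viol]).sum(0)) / n
        b -= lr * (-y[viol].sum()) / n
    return w, b

# Step 2: first training round on positives + random negatives.
X = np.vstack([pos, neg])
y = np.concatenate([np.ones(len(pos)), -np.ones(len(neg))])
w, b = train_linear_svm(X, y)

# Step 3: mine hard negatives (negatives the classifier calls positive),
# randomly subsample them, append, and retrain.
scores = neg @ w + b
hard = neg[scores > 0]
if len(hard) > 100:
    hard = hard[rng.choice(len(hard), 100, replace=False)]
w, b = train_linear_svm(np.vstack([X, hard]),
                        np.concatenate([y, -np.ones(len(hard))]))

# Step 4: classify held-out features with the retrained model.
test = rng.standard_normal((50, dim)) + 0.5            # synthetic "faces"
pred = np.sign(test @ w + b)
print((pred == 1).mean())
```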

After various trials, I found that the following parameter set gives the best trade-off between speed, average precision, and recall:

start_scale = num_scales - 4 (hard-coded in get_detections_all_scales)
step_size = 4
lambda = 10
sigma = 250 (for the RBF kernel)
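My reading of the first two parameters: step_size = 4 is the sliding-window stride, and start_scale = num_scales - 4 means detection starts at that pyramid index, i.e. only the four most-downscaled levels are searched (which also explains the speed gain). The pyramid factor and window size below are assumptions, not values from the project.

```python
def pyramid_scales(num_scales, factor=0.9):
    # Image pyramid: level i rescales the image by factor**i (assumed).
    return [factor ** i for i in range(num_scales)]

def windows_at_scale(h, w, scale, win=36, step=4):
    # Top-left corners of sliding windows (step_size = 4) in the
    # coordinates of the rescaled image; win=36 is an assumed crop size.
    sh, sw = int(h * scale), int(w * scale)
    return [(y, x)
            for y in range(0, sh - win + 1, step)
            for x in range(0, sw - win + 1, step)]

num_scales = 10
start_scale = num_scales - 4            # only the last 4 pyramid levels
scales = pyramid_scales(num_scales)[start_scale:]
total = sum(len(windows_at_scale(240, 320, s)) for s in scales)
print(len(scales), total)
```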

The best overall result, from the RBF-kernel SVM with the parameter set above, is AP = 0.641.

1. Linear SVM: finding the best lambda:

So far, lambda = 10 yields the best result, with AP = 0.547.

Lambda = 10:   AP = 0.547
Lambda = 100:  AP = 0.484
Lambda = 500:  AP = 0.524
Lambda = 1000: AP = 0.542
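The lambdas above are compared by average precision over the ranked detections. A minimal sketch of how AP can be computed from detection scores and ground-truth labels (un-interpolated variant; the evaluation code may use the interpolated VOC-style AP instead):

```python
import numpy as np

def average_precision(scores, labels):
    """Mean of the precision taken at the rank of each true positive."""
    order = np.argsort(-np.asarray(scores))
    labels = np.asarray(labels, dtype=float)[order]
    tp = np.cumsum(labels)                       # true positives so far
    precision = tp / np.arange(1, len(labels) + 1)
    return float((precision * labels).sum() / labels.sum())

# A perfect ranking scores every true face above every non-face.
print(average_precision([0.9, 0.8, 0.1], [1, 1, 0]))  # -> 1.0
```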

2. Linear SVM: finding the best threshold:

This is done with lambda = 10. The best result appears when Cascade{:,2} is increased by 5, giving AP = 0.574.

Cascade{:,2} += 2:  AP = 0.413
Cascade{:,2} += 5:  AP = 0.574
Cascade{:,2} += 8:  AP = 0.498
Cascade{:,2} += 10: AP = 0.506
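Shifting the threshold trades precision against recall: keeping only higher-scoring detections drops false positives first, at the cost of some true faces. A sketch on synthetic scores (the distributions and the +5 shift are illustrative, not the project's actual score scale):

```python
import numpy as np

def precision_recall(scores, labels, threshold):
    """Precision and recall of detections with score >= threshold."""
    keep = scores >= threshold
    tp = int((labels & keep).sum())
    fp = int((~labels & keep).sum())
    fn = int((labels & ~keep).sum())
    return tp / max(tp + fp, 1), tp / max(tp + fn, 1)

rng = np.random.default_rng(1)
labels = np.arange(400) < 100                    # 100 true faces, 300 not
scores = np.where(labels, rng.normal(10, 3, 400), rng.normal(0, 3, 400))

p0, r0 = precision_recall(scores, labels, 0.0)
p5, r5 = precision_recall(scores, labels, 5.0)   # threshold shifted by +5
print(p0, r0, p5, r5)
```

With the higher threshold, precision rises while recall falls slightly, which is the trade-off being tuned above.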

3. Linear SVM: Random negatives only versus appended mined hard negatives:

Training the SVM only once clearly does not give as good a result as retraining with mined hard negatives appended. The results below use the best parameter set found above.

Training with random negatives only:          AP = 0.413
Retraining with hard negatives appended once: AP = 0.574
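The gap above comes from step 3 of the pipeline. A minimal sketch of the mining step itself, assuming a trained linear model (w, b) and a pool of negative feature vectors (all names here are hypothetical):

```python
import numpy as np

def mine_hard_negatives(w, b, neg_pool, n_keep, rng):
    """Return up to n_keep negatives the current SVM misclassifies as
    positive (false positives = hard negatives), randomly subsampled."""
    scores = neg_pool @ w + b
    hard = neg_pool[scores > 0]
    if len(hard) > n_keep:
        hard = hard[rng.choice(len(hard), size=n_keep, replace=False)]
    return hard

rng = np.random.default_rng(2)
w = np.ones(128) / np.sqrt(128)          # toy weight vector, unit norm
neg_pool = rng.standard_normal((1000, 128))
hard = mine_hard_negatives(w, 0.0, neg_pool, n_keep=50, rng=rng)
print(hard.shape)
```

The mined rows are exactly those the classifier currently gets wrong, so appending them to the negative set concentrates the next training round on the decision boundary.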


4. Non-linear SVM: RBF kernel: Finding the best sigma:

Sigma = 250 gives the best result, with AP = 0.627.

Sigma = 100: AP = 0.352
Sigma = 250: AP = 0.627
Sigma = 350: AP = 0.575
Sigma = 500: AP = 0.560
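For reference, a sketch of the RBF kernel being tuned, assuming the common k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)) parameterisation; some libraries use exp(-gamma * ||x - y||^2) instead, so sigma values are not directly comparable across implementations. Larger sigma flattens the kernel, pulling all entries toward 1:

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    # Gram matrix of k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

rng = np.random.default_rng(3)
X = rng.standard_normal((5, 128))            # five 128-D descriptors
K_small = rbf_kernel(X, X, sigma=10.0)
K_big = rbf_kernel(X, X, sigma=250.0)        # much smoother kernel
print(np.diag(K_big))
```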

5. Non-linear SVM: RBF kernel: Random negatives only versus appended mined hard negatives:

Surprisingly, using random negatives only gives better results than mining hard negatives. This might be because the RBF kernel is more expressive than the linear one, so mining hard negatives is no longer necessary. The best result is AP = 0.644.

Using random negatives, 1st run:              AP = 0.644
Using random negatives, 2nd run:              AP = 0.632
Mining hard negatives once (2 stages total):  AP = 0.627
Mining hard negatives twice (3 stages total): AP = 0.615

6. Linear versus non-linear SVM:

The non-linear SVM yields the better result; among the kernels tried, RBF gives the best average precision. The non-linear SVM does not mine hard negatives, but it does lower the threshold by increasing Cascade{:,2} by 5.

Best result of linear SVM (lambda = 10): AP = 0.574
Best result of non-linear SVM (RBF):     AP = 0.641