CSCI1430: Project 4: Face detection with a sliding window
The goal of this assignment is to build a face detector with a cascade architecture and to test it with a sliding window model on the MIT+CMU database. The implementation follows the support code closely, so I will skip the algorithm description and focus only on the choices we made.
Linear vs. Non-linear SVM
A linear SVM lends itself to data-driven approaches, since it is cheap enough to train on large amounts of mined data, while a non-linear SVM models the decision boundary more carefully.
Linear SVM: ap=70.6%, positive_sample=1000, stage=3
Non-linear SVM: ap=83.8%, positive_sample=1000, stage=3
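For reference, here is a minimal sketch of the two classifiers using scikit-learn; the project itself uses the course's SVM code, and the feature dimension and data below are placeholders:

    import numpy as np
    from sklearn.svm import LinearSVC, SVC

    # Placeholder stand-ins for features of 1000 positive (face)
    # crops and 1000 mined negative crops.
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(2000, 1116))  # hypothetical descriptor length
    y_train = np.repeat([1, -1], 1000)

    # Linear SVM: fast to train, so it scales to large mined datasets.
    linear_clf = LinearSVC(C=1.0).fit(X_train, y_train)

    # Non-linear SVM with an RBF kernel: a more flexible decision
    # boundary at a higher training and evaluation cost.
    rbf_clf = SVC(kernel='rbf', gamma='scale').fit(X_train, y_train)

    # Confidences used to score sliding-window crops at test time.
    scores = rbf_clf.decision_function(X_train[:5])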
Choice of Lambda
As described in the project documentation, lambda is the regularization parameter that prevents our model from overfitting. I tested different lambda values ranging from 0.001 to 1; the results show that no single lambda value benefits performance in every case. When the total stage number is set to 1, a small lambda gives a performance boost. After we set the total stage number to 3, the abundance of training data combined with a small penalty leads the model to overfit. We decided to leave lambda at 1 for the non-linear SVM.
lambda=1.0 vs. lambda=0.1
From the plot we can see that a small lambda gives a better fit to the training data, which leads to a clearer division of the data. However, it leads to overfitting as we use more and more training data.
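A minimal sketch of the lambda sweep described above, assuming a held-out validation split (the names are hypothetical; note that scikit-learn's strength C plays the inverse role of lambda):

    import numpy as np
    from sklearn.svm import LinearSVC

    def sweep_lambda(X_train, y_train, X_val, y_val,
                     lambdas=(0.001, 0.01, 0.1, 1.0)):
        """Return the lambda with the best validation accuracy.

        sklearn's C is roughly 1 / (lambda * n_samples), so a small
        lambda corresponds to a large C, i.e. weak regularization.
        """
        best_lam, best_acc = None, -1.0
        for lam in lambdas:
            clf = LinearSVC(C=1.0 / (lam * len(X_train)))
            acc = clf.fit(X_train, y_train).score(X_val, y_val)
            if acc > best_acc:
                best_lam, best_acc = lam, acc
        return best_lam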
Choice of sigma for Non-linear SVM
The aim of tuning sigma for the RBF kernel is to make sure the kernel matrix has a large variance so the data can be separated more easily. My implementation decision is as follows: at each training iteration, we build a vector of candidate sigma values ranging from 1 to 2000 with a step size of 10-50. For each candidate, a kernel matrix is constructed and its standard deviation is evaluated. We then use the sigma value that gives the largest standard deviation in the real kernel construction step. Our experiments show that the function std = f(sigma) has a single maximum in most cases. Moreover, sigma values close to 300 usually give the largest kernel variance.
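A sketch of this selection rule in plain NumPy (the candidate grid and data shapes are illustrative):

    import numpy as np

    def rbf_kernel(X, sigma):
        """Kernel matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
        sq = np.sum(X ** 2, axis=1)
        d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    def pick_sigma(X, sigmas=np.arange(1, 2001, 50)):
        """Pick the candidate whose kernel matrix has the largest
        standard deviation, i.e. the most spread-out kernel values."""
        stds = [rbf_kernel(X, s).std() for s in sigmas]
        return sigmas[int(np.argmax(stds))]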
Plot of the standard deviation of the kernel matrix as a function of sigma
sigma=1 | sigma=100 | sigma=300
We can see that sigma=300 gives a much better-structured kernel matrix: at this value, positive training samples look similar to each other while negative samples remain diverse.
Choice of threshold for confidence
Several algorithms were tried for tuning the threshold: 1) optimizing TPR, 2) optimizing FPR, and 3) optimizing TPR and FPR jointly. Algorithm 1) gives us a better recall rate, while 2) and 3) give decent average precision.
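A minimal sketch of the three strategies over a held-out set of SVM confidences (the exact scoring rules, e.g. tpr - fpr for the joint objective, are my assumptions):

    import numpy as np

    def tune_threshold(scores, labels, strategy='tpr'):
        """Scan candidate thresholds over SVM confidences.

        labels are +1 (face) / -1 (non-face); the three strategies
        mirror the ones listed above.
        """
        best_t, best_val = None, -np.inf
        for t in np.unique(scores):
            pred = np.where(scores >= t, 1, -1)
            tpr = np.mean(pred[labels == 1] == 1)   # recall on faces
            fpr = np.mean(pred[labels == -1] == 1)  # false alarms
            val = {'tpr': tpr,            # 1) maximize recall
                   'fpr': -fpr,           # 2) minimize false positives
                   'both': tpr - fpr}[strategy]  # 3) balance the two
            if val > best_val:
                best_t, best_val = t, val
        return best_t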
Choice of sliding window parameters for the detection phase
Experiments show that a huge performance boost can be gained by tuning the sliding window parameters: average precision jumps from ~45% to ~80% after we set step size = 2 and scale_factor = 1.2. While there is a trade-off between efficiency and accuracy, it is worth exploring algorithms that locate potential faces more quickly.
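A sketch of the window generation this implies (coordinates are in the original image; the helper and its defaults are illustrative):

    def sliding_windows(img_h, img_w, win=36, step=2, scale_factor=1.2):
        """Yield (y, x, size) candidate boxes over an image pyramid.

        step=2 and scale_factor=1.2 are the settings that produced
        the jump from ~45% to ~80% AP in our experiments.
        """
        scale = 1.0
        while win * scale <= min(img_h, img_w):
            size = int(win * scale)
            stride = max(1, int(step * scale))  # step scales with the level
            for y in range(0, img_h - size + 1, stride):
                for x in range(0, img_w - size + 1, stride):
                    yield y, x, size
            scale *= scale_factor

    boxes = list(sliding_windows(480, 640))  # e.g. one MIT+CMU test image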
Results
Detector run on normal class faces with confidence threshold = 0.5
Detector run on non-frontal faces with confidence threshold = 0.5