For this assignment, I followed the recommended pipeline, but tried out some different experiments to see if I could improve my results. I primarily tweaked parameters to improve the accuracy of my face detector. As expected, my detector trained with a non-linear SVM outperformed my linear implementation, but only after finding a good combination of lambda/sigma values. In addition, both the linear and non-linear implementations returned a similar number of false positives at low confidence levels.
I started out by using vl_dsift to compute features on image crops, but this process was slow. I ended up using external mex-compiled code, written by our very own psastras, to compute HoG features. Turning crops into features was a simple process: for each row in our crops matrix, I computed one HoG feature, ultimately returning an N x D2 matrix where D2 = (#bins) * windowSize * windowSize. The window size, i.e. the number of subdivisions in the x and y directions, was 6 for both; with 9 orientation bins, this yielded a 1x324 feature vector for each crop (9 * 6 * 6 = 324).
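A rough sketch of that loop is below. The hog function is psastras' mex code, so its exact signature is an assumption here, as is the 36x36 crop size:

```matlab
% Sketch: turn an N x (36*36) crops matrix into an N x 324 feature matrix.
% The hog() call is the mex-compiled function; its signature is assumed.
function features = crops_to_features(crops)
    num_bins = 9;                                % orientation bins
    window_size = 6;                             % subdivisions in x and y
    D = num_bins * window_size * window_size;    % 9 * 6 * 6 = 324
    N = size(crops, 1);
    features = zeros(N, D);
    for i = 1:N
        crop = reshape(crops(i, :), 36, 36);     % assumes 36x36 crops
        features(i, :) = hog(single(crop), window_size);  % assumed signature
    end
end
```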
I implemented this very similarly to the run_detector function. I defined the maximum number of crops per scene internally as ceil(num_crops / number of non-face scenes). Then, for each scene, I first retrieved the detections at each image scale, followed by the bounding boxes for these detections, and lastly the corresponding crops. I then used randperm to pick at most max_crops_per_scene crops from the current scene. After looping over all scenes, I concluded by keeping only num_crops crops, as defined in the proj4.m script. I also used parfor to speed up the mining stage.
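A minimal sketch of that loop, with the helper names (detect_at_all_scales, crops_from_bboxes, scene_paths) standing in for my actual functions and variables:

```matlab
% Sketch of the hard-negative mining loop (helper names are illustrative).
max_crops_per_scene = ceil(num_crops / num_scenes);
mined = cell(num_scenes, 1);
parfor s = 1:num_scenes
    img = im2single(imread(scene_paths{s}));
    [bboxes, confidences] = detect_at_all_scales(img, w, b);  % assumed helper
    crops = crops_from_bboxes(img, bboxes);                   % assumed helper
    n_keep = min(max_crops_per_scene, size(crops, 1));
    keep = randperm(size(crops, 1));
    mined{s} = crops(keep(1:n_keep), :);                      % random subset
end
mined = cell2mat(mined);
mined = mined(1:min(num_crops, size(mined, 1)), :);  % keep only num_crops
```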
For both computational and performance reasons, I decided to mine for hard negatives only once. Mining once led to significant improvements in accuracy compared to simply randomly sampling negatives (for the non-linear SVM in one case, average precision improved by approximately 60%!). I did try mining multiple times with a non-linear SVM, however, and precision seemed to drop below the single-stage result. Perhaps my parameter choices did not play well with multiple mining stages.
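Concretely, the single-round pipeline looks roughly like this (train_svm, get_hard_negatives, and the variable names are illustrative stand-ins, not the actual support-code names):

```matlab
% Stage 1: train on positives + random negatives.
feats  = [pos_feats; rand_neg_feats];
labels = [ones(size(pos_feats, 1), 1); -ones(size(rand_neg_feats, 1), 1)];
[w, b] = train_svm(feats, labels, lambda);          % assumed trainer

% Stage 2: mine hard negatives (false positives) with the stage-1 model,
% then retrain once on the combined negative set.
hard_feats = crops_to_features(get_hard_negatives(scenes, w, b, num_crops));
feats  = [feats; hard_feats];
labels = [labels; -ones(size(hard_feats, 1), 1)];
[w, b] = train_svm(feats, labels, lambda);
```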
*Figure: Linear SVM, 4000 negative crops, lambda=50*
Changing the lambda value here gave a sizeable boost: AP was approximately 10% lower with the stock lambda value of 100. Mining 4000 negative crops also increased performance. Below are detection results for the class photos (a sketch of the training call follows them):
*Figure: Class Easy, detections with at least 0 confidence*

*Figure: Class Easy, detections with at least 0.5 confidence*

*Figure: Class Hard, detections with at least 0 confidence*

*Figure: Class Hard, detections with at least 0.5 confidence*
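The trainer itself comes from the support code, so the exact call isn't reproduced here. As an illustration only, an equivalent linear training step with VLFeat's vl_svmtrain (whose lambda is a regularization weight and may be scaled differently from the starter code's) would look like:

```matlab
% Illustrative linear SVM training with VLFeat (not necessarily the
% trainer used in this project). X is D x N single, Y holds +1/-1 labels.
X = single(feats');                 % vl_svmtrain expects features as columns
Y = double(labels');
lambda = 50;                        % regularization (value from above)
[w, b] = vl_svmtrain(X, Y, lambda);
scores = w' * X + b;                % detection confidences
```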
*Figure: Non-Linear SVM, 4000 negative crops, lambda=6, sigma=3*
Changing the lambda and sigma values here improved performance: with a sigma value of 3 and lambda left at its default value of 1, average precision was approximately 23% worse. Below are detection results for the class photos (a sketch of the kernel computation, where sigma enters, follows them):
*Figure: Class Easy, detections with at least 0 confidence*

*Figure: Class Easy, detections with at least 0.5 confidence*

*Figure: Class Hard, detections with at least 0 confidence*

*Figure: Class Hard, detections with at least 0.5 confidence*
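The sigma parameter enters through the kernel. Assuming the non-linear SVM uses a Gaussian (RBF) kernel, which is what a sigma parameter typically denotes, the kernel matrix is computed as:

```matlab
% Gaussian (RBF) kernel matrix: K(i,j) = exp(-||a_i - b_j||^2 / (2*sigma^2)).
% A is N1 x D, B is N2 x D; vectorized via the squared-distance expansion
% ||a - b||^2 = ||a||^2 + ||b||^2 - 2*a'*b.
function K = rbf_kernel(A, B, sigma)
    sqA = sum(A .^ 2, 2);                                    % N1 x 1
    sqB = sum(B .^ 2, 2);                                    % N2 x 1
    d2 = bsxfun(@plus, sqA, bsxfun(@minus, sqB', 2 * (A * B')));
    d2 = max(d2, 0);                  % guard against negative rounding error
    K = exp(-d2 ./ (2 * sigma ^ 2));
end
```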
We can see from the results on the class photos that at the lowest confidence threshold, both models returned many false positives; however, as the confidence threshold increased, these false positives were quickly pruned. Moreover, the non-linear SVM outperformed the linear implementation by approximately 8%, a gap that could likely be widened with better choices for lambda/sigma, as well as possibly by implementing the cascade architecture.
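That pruning is just a threshold on the returned confidences, along the lines of (variable names illustrative):

```matlab
% Keep only detections at or above a confidence threshold.
min_conf = 0.5;
keep = confidences >= min_conf;
bboxes = bboxes(keep, :);
confidences = confidences(keep);
```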
Overall, increasing the number of negative crops from 1000 to 4000 produced a significant performance boost. For the non-linear SVM, increasing sigma to about 3 and lambda to 6 also produced a performance boost. Both SVMs tended to return a high number of low-confidence false positives, which did not penalize the AP values for either. I suspect that further parameter tuning in the future, as well as a more sophisticated mining strategy (implementing the cascade plus more mining iterations), would improve results.
Shoutout to vmoussav and psastras for helping me reduce the runtimes of my algorithms, as well as to edwallac for helping me better navigate the support code.