We first extracted dense SIFT features from all of the image crops, and then from each image during classifier evaluation, using the VLFeat function vl_dsift. In all cases we used a bin size of 10; for each crop, we kept only the first descriptor returned by vl_dsift. We also enabled vl_dsift's 'fast' option, which dramatically improved execution speed with no apparent effect on detection performance.
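To make the bin size concrete: by default a vl_dsift descriptor spans 4×4 spatial bins, so a bin size of 10 gives a 40×40 pixel footprint per descriptor. The pure-Python sketch below only illustrates that geometry by listing the top-left corner of every such window that fits inside an image. The function name `dense_grid` and its `step` parameter are hypothetical (the step used above is not stated), and vl_dsift's exact frame placement and ordering may differ; here "the first descriptor" is simply the top-left window because corners are generated row by row.

```python
def dense_grid(width, height, bin_size, step, num_bins=4):
    """Top-left (x, y) corners of every descriptor window that fits inside
    a width x height image.

    Hypothetical helper: assumes a SIFT-style descriptor of
    num_bins x num_bins spatial bins, each bin_size pixels wide, sampled
    every `step` pixels. Not vl_dsift's actual implementation.
    """
    footprint = num_bins * bin_size  # e.g. 4 * 10 = 40 pixels per side
    if width < footprint or height < footprint:
        return []  # image too small to hold even one full descriptor
    xs = range(0, width - footprint + 1, step)
    ys = range(0, height - footprint + 1, step)
    # Row-major order, so the first entry is the top-left window.
    return [(x, y) for y in ys for x in xs]


corners = dense_grid(50, 50, bin_size=10, step=5)
print(len(corners), corners[0])
```

With a 50×50 image and a step of 5, three window positions fit along each axis, giving nine descriptors whose first corner is (0, 0).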
The baseline algorithm (SIFT + linear SVM + random negatives) gave rather poor results, with an average precision (AP) of 0.406 in the runs shown below. However, the AP varied significantly depending on which random subset of negatives was chosen, which made it difficult to compare the linear and nonlinear SVM variants of the algorithm. Mining hard negatives did seem to improve both variants, dramatically raising the AP of the linear SVM from 0.406 to 0.620. The nonlinear SVM was generally better than the linear SVM, although this too varied with the randomly chosen negatives: in the runs below, for example, the nonlinear SVM reached an AP of 0.655 even without hard negatives, compared with 0.406 for the linear SVM. With hard negative mining, the linear SVM's AP was still always lower than the nonlinear SVM's.
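Since every comparison above is made in terms of AP, here is a minimal sketch of non-interpolated average precision, assuming each detection has already been labeled a true or false positive (for example by overlap with ground truth). The function name is hypothetical, and the evaluation code that produced the numbers above may use an interpolated variant, so this sketch will not necessarily reproduce them.

```python
def average_precision(scores, labels, num_positives=None):
    """Non-interpolated average precision (hypothetical helper).

    scores: confidence of each detection (higher = more confident).
    labels: 1 if the detection is a true positive, 0 otherwise.
    num_positives: total number of ground-truth positives; defaults to
    sum(labels), i.e. assumes every positive was detected at some threshold.
    """
    if num_positives is None:
        num_positives = sum(labels)
    if num_positives == 0:
        return 0.0
    # Rank detections from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    true_pos = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            true_pos += 1
            precision_sum += true_pos / rank  # precision at this recall level
    # Positives that were never detected contribute zero precision.
    return precision_sum / num_positives


print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

Passing a larger `num_positives` than the number of detected positives lowers the AP, which is how missed detections are penalized.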
Each pair of runs (random negatives vs. hard negatives) was performed in succession. In all cases, we trained the SVM on 1000 positive and 1000 negative examples.
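The random-negative vs. hard-negative procedure can be sketched as follows. This is a minimal sketch under stated assumptions, not the method used in this work: the linear SVM is trained with a Pegasos-style stochastic sub-gradient solver (a stand-in, since the solver used here is not named), features are plain Python lists, and `negative_pool` is assumed to hold descriptors of windows known to contain no positives. All function names are hypothetical. Because the text keeps the negative count fixed, the sketch replaces the negative set with the mined one rather than growing it; whether the original runs replaced or augmented the negatives is not stated. The nonlinear SVM variant is not covered.

```python
import random


def score(w, b, x):
    """Linear SVM decision value w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(xs, ys, lam=0.01, epochs=20, seed=0):
    """Pegasos-style linear SVM (stand-in solver). ys are in {+1, -1}.

    The bias is folded in as a constant feature, so it is regularized too
    (a common simplification). Returns (weights, bias).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Regularization shrinks w; the hinge-loss sub-gradient is added
            # only when the example lies inside the margin.
            w = [(1.0 - eta * lam) * wi for wi in w]
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w[:-1], w[-1]


def select_hard_negatives(w, b, negative_pool, k):
    """The k negatives the current classifier scores highest, i.e. its
    most confident false positives."""
    return sorted(negative_pool, key=lambda x: score(w, b, x), reverse=True)[:k]


def mine_hard_negatives(positives, negative_pool, num_neg, rounds=1, seed=0):
    """Train on random negatives, then retrain on mined hard negatives,
    keeping num_neg negatives in the training set each time."""
    def fit(negatives):
        xs = positives + negatives
        ys = [1] * len(positives) + [-1] * len(negatives)
        return train_linear_svm(xs, ys, seed=seed)

    rng = random.Random(seed)
    w, b = fit(rng.sample(negative_pool, num_neg))  # random-negative baseline
    for _ in range(rounds):
        w, b = fit(select_hard_negatives(w, b, negative_pool, num_neg))
    return w, b
```

With the setup described here, one would pass 1000 positive descriptors and `num_neg=1000`; the first `fit` corresponds to the "random negative" run and the retrained model to the "hard negative" run.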
Figure panels: SIFT + Linear SVM + Random Negative; SIFT + Linear SVM + Hard Negative; SIFT + Nonlinear SVM + Random Negative; SIFT + Nonlinear SVM + Hard Negative.