Mining Hard Negatives
A useful extension of the basic model is an iterative training process with multiple rounds.
The first round uses random negatives, because you have to start somewhere.
Subsequent rounds use the current model to detect "hard negatives": negative examples
that the detector misclassifies. Each round, the new hard negatives are added to the cache of negative examples.
This process significantly stabilizes the final performance of the detector.
I found that it is better to mine a few negatives in each of many rounds than to mine many negatives in each of a few rounds.
This suggests that each individual hard negative is highly informative to the detector.
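The procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the detector used here: a tiny logistic-regression classifier stands in for the real detector, a negative counts as "misclassified" when its score is above 0, and each round keeps only the `per_round` highest-scoring false positives. All names (`train_linear`, `mine_hard_negatives`, `train_with_hard_negatives`, `rounds`, `per_round`) are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]
Model = Tuple[List[float], float]  # (weights, bias)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(model: Model, x: Vector) -> float:
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear(pos: Sequence[Vector], neg: Sequence[Vector],
                 epochs: int = 200, lr: float = 0.5) -> Model:
    """Stand-in detector (assumption): logistic regression fit by
    full-batch gradient descent on positives (label 1) and negatives (label 0)."""
    data = [(x, 1.0) for x in pos] + [(x, 0.0) for x in neg]
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            err = sigmoid(score((w, b), x)) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        n = len(data)
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def mine_hard_negatives(pool: Sequence[Vector], cache: Sequence[int],
                        model: Model, per_round: int) -> List[int]:
    """Indices of up to per_round pool negatives that the model misclassifies
    (score > 0, i.e. false positives), hardest first, skipping cached ones."""
    cached = set(cache)
    hard = [i for i, x in enumerate(pool)
            if i not in cached and score(model, x) > 0.0]
    hard.sort(key=lambda i: score(model, pool[i]), reverse=True)
    return hard[:per_round]


def train_with_hard_negatives(pos: Sequence[Vector], pool: Sequence[Vector],
                              rounds: int, per_round: int,
                              seed: int = 0) -> Tuple[Model, List[int]]:
    rng = random.Random(seed)
    # Round 1: random negatives, because we have to start somewhere.
    cache = rng.sample(range(len(pool)), min(per_round, len(pool)))
    model = train_linear(pos, [pool[i] for i in cache])
    for _ in range(rounds - 1):
        hard = mine_hard_negatives(pool, cache, model, per_round)
        if not hard:  # no false positives left in the pool: stop early
            break
        cache.extend(hard)  # the negative cache only grows
        model = train_linear(pos, [pool[i] for i in cache])
    return model, cache


# Toy usage: positives cluster near (2, 2), negatives are spread uniformly.
data_rng = random.Random(1)
positives = [(data_rng.gauss(2, 0.5), data_rng.gauss(2, 0.5)) for _ in range(20)]
negatives = [(data_rng.uniform(-4, 4), data_rng.uniform(-4, 4)) for _ in range(500)]
# "Mine fewer negatives more times" vs. "more negatives fewer times":
many_small = train_with_hard_negatives(positives, negatives, rounds=10, per_round=10)
few_large = train_with_hard_negatives(positives, negatives, rounds=2, per_round=50)
```

The two calls at the end only show how the schedule is expressed through `rounds` and `per_round`; the sketch makes no claim about which schedule wins on this toy data. Retraining after each small batch means the next round's false positives come from an updated model, which is one plausible reading of why many small rounds worked better.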