Image Quality Assessment and Data Scale
Sam Birch (sbirch)
May 18, 2011
Abstract
This project applied established algorithms for image quality assessment at a novel scale to empirically estimate the effect of training database size on image quality assessment. The image features are derived from Ke et al. [1] and reflect common compositional attributes of "good" photos (Ke et al. attempt to distinguish professional from amateur photographs). This work uses a new dataset derived from approximately a year's worth of Flickr data, taking the most and least interesting photos from each day, amounting to about 1.4 million photographs.
Dataset
The dataset was collected in two stages. In the first pass, the image metadata was queried through Flickr's image search API. By searching for all the images in a given day and sorting in ascending or descending order of "interestingness" [1], I could skim the 3000 most and 3000 least interesting images from each day (Flickr's API starts to return erroneous results after a couple thousand photos). The number of photos per day has a mean of approximately 1.3 million, so these extrema represent, on average, the top and bottom 0.23% of any given day. (Fig 1: full distribution of photos per day.) These margins are notably narrower than the 10% that Ke et al. used. In the second pass, the photos were downloaded from the URLs in their metadata at Flickr's medium size (at most 640px to a side).
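As a rough illustration of the first pass, the sketch below builds the request parameters for one day and one sort direction. The method name `flickr.photos.search` and the `sort`, `min_upload_date`, `max_upload_date`, `per_page` and `page` parameters are part of Flickr's public API, but filtering by upload date, the page size, the placeholder API key and the example date are all assumptions; the report does not say exactly how the queries were issued.

```python
from datetime import date, datetime, timedelta, timezone

PER_PAGE = 500          # largest page size flickr.photos.search accepts
IMAGES_PER_TAIL = 3000  # depth skimmed from each end of the ranking


def search_params(day, most_interesting, api_key="YOUR_API_KEY"):
    """Build one parameter dict per result page for a single day and direction.

    Filtering on upload date (rather than date taken) is an assumption.
    """
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    sort = "interestingness-desc" if most_interesting else "interestingness-asc"
    pages = -(-IMAGES_PER_TAIL // PER_PAGE)  # ceiling division
    return [
        {
            "method": "flickr.photos.search",
            "api_key": api_key,
            "min_upload_date": int(start.timestamp()),
            "max_upload_date": int(end.timestamp()),
            "sort": sort,
            "per_page": PER_PAGE,
            "page": page,
            "format": "json",
            "nojsoncallback": 1,
        }
        for page in range(1, pages + 1)
    ]


# Hypothetical example day; the report does not list the dates it covered.
example = search_params(date(2010, 5, 18), most_interesting=True)
```

Each dict would be sent as the query string of one request, and the same call with `most_interesting=False` would cover the bottom of the ranking.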
Implementation
After collecting the photographs, I computed six features for each photograph:
- Average brightness (as in Ke, 4.5)
- Hue count (as in Ke, 4.3)
- Blur (as in Ke, 4.4; excepting the work of Tong et al.)
- Contrast (as in Ke, 4.5)
- Edge bounding box area (as in Ke, 4.1)
- Edge difference from mean Laplacians (as in Ke, 4.1)
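To make two of the simpler features concrete, here is a minimal sketch of average brightness and contrast. It assumes brightness is the mean of the three colour channels scaled to [0, 1], and that contrast is the width of the middle 98% of the mass of the combined R, G and B histogram, which is how I read Ke's description; the function names and the flat list of `(r, g, b)` tuples are illustrative, not the code used for this project.

```python
def average_brightness(pixels):
    """Mean of per-pixel (R + G + B) / 3, scaled to [0, 1]."""
    total = sum(r + g + b for r, g, b in pixels)
    return total / (3 * 255 * len(pixels))


def contrast_width(pixels, mass=0.98):
    """Width, in gray levels, of the middle `mass` of the combined RGB histogram."""
    hist = [0] * 256
    for r, g, b in pixels:
        hist[r] += 1
        hist[g] += 1
        hist[b] += 1
    total = sum(hist)
    tail = (1.0 - mass) / 2.0
    lo_cut, hi_cut = tail * total, (1.0 - tail) * total
    cumulative, lo, hi = 0, None, None
    for level, count in enumerate(hist):
        cumulative += count
        if lo is None and cumulative >= lo_cut:
            lo = level  # first level where the lower tail is used up
        if hi is None and cumulative >= hi_cut:
            hi = level  # first level where only the upper tail remains
            break
    return hi - lo


# A toy "image": half black, half white pixels.
toy = [(0, 0, 0)] * 50 + [(255, 255, 255)] * 50
toy_brightness = average_brightness(toy)
toy_contrast = contrast_width(toy)
```

The pixel values are assumed to be 8-bit integers, so the contrast feature can never exceed 255.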

For example, the following photograph (figure: original image) has the feature vector:
(0.284374668954535, 5, 0.958191720005291, 241, 8624, -0.0465446681658742)

I split the dataset into 10,000 test images and left the remainder as potential training data. To classify a test image, I collected its 1000 nearest neighbors in the training data under an unweighted Euclidean norm (I also tried normalizing each feature by its variance, but that lowered 11-NN classification rates, so I went back to unweighted). These neighbors were then used to train a local support vector machine [2], which then classified the original point (lazy learning).
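The lazy-learning loop can be sketched as follows. This is a skeleton under stated assumptions, not the project's code: the neighbor search is a brute-force scan, and the local model defaults to a plain majority vote as a stand-in for the SVM, so that the example runs with the standard library alone. All function names are illustrative.

```python
import heapq
import math
from collections import Counter


def nearest_neighbors(query, points, k):
    """Indices of the k training points closest to `query` (unweighted Euclidean)."""
    return heapq.nsmallest(
        k, range(len(points)), key=lambda i: math.dist(query, points[i])
    )


def majority_vote(features, labels):
    """Stand-in local model: predicts the most common label among the neighbors."""
    winner = Counter(labels).most_common(1)[0][0]
    return lambda query: winner


def classify_lazily(query, points, labels, k=1000, fit_local=majority_vote):
    """Fit a model on the query's k nearest neighbors only, then classify the query."""
    idx = nearest_neighbors(query, points, k)
    local_x = [points[i] for i in idx]
    local_y = [labels[i] for i in idx]
    if len(set(local_y)) == 1:  # a one-class neighborhood needs no model
        return local_y[0]
    return fit_local(local_x, local_y)(query)


# Toy data: two well-separated clusters in a 2-D feature space.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = [0, 0, 0, 1, 1, 1]
prediction = classify_lazily((0.2, 0.1), points, labels, k=3)
```

To get closer to the setup described above, `fit_local` could wrap an RBF-kernel SVM, for instance scikit-learn's `SVC(kernel="rbf")`, returning its `predict` applied to the query. The one-class shortcut is there because an SVM cannot be trained on a neighborhood that contains a single label.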
Results
Fig 2: performance versus training set size:
There is an upward trend in performance, which does not seem to plateau. The performance of this system is not as good as that of Ke et al., but because the aim was to look at relative performance across scales, this is not of much importance. Due to speed limitations, the system was only tested with 256 examples of each class (512 test images at each training set size, per the raw results), which may contribute to noise in the performance numbers.
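To put a rough number on that noise, the sketch below computes the binomial standard error of an accuracy estimate. It assumes the 512 classified test images reported in the raw results behave like independent trials with a common success rate, which is a simplification.

```python
import math


def accuracy_standard_error(accuracy, n_tests):
    """Binomial standard error of an accuracy estimated from n_tests trials."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_tests)


# 512 test images at roughly 63% accuracy, as in the raw results.
se = accuracy_standard_error(0.63, 512)
```

Under that assumption the standard error comes to about two percentage points, which is similar in size to the accuracy gap between the smallest and largest training sets, so a larger test sample would be needed to measure the trend precisely.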
Raw results:
2048 (2K) training examples:
True positive: 162, true negative: 154, false negative: 94, false positive: 102
Precision: 0.6136, recall: 0.6328, accuracy: 0.6172, specificity: 0.6016
16384 (16K) training examples:
True positive: 164, true negative: 159, false negative: 92, false positive: 97
Precision: 0.6284, recall: 0.6406, accuracy: 0.6309, specificity: 0.6211
65536 (64K) training examples:
True positive: 161, true negative: 160, false negative: 95, false positive: 96
Precision: 0.6265, recall: 0.6289, accuracy: 0.6270, specificity: 0.6250
262144 (256K) training examples:
True positive: 161, true negative: 162, false negative: 95, false positive: 94
Precision: 0.6314, recall: 0.6289, accuracy: 0.6309, specificity: 0.6328
524288 (512K) training examples:
True positive: 169, true negative: 159, false negative: 87, false positive: 97
Precision: 0.6353, recall: 0.6602, accuracy: 0.6406, specificity: 0.6211
1048576 (1M) training examples:
True positive: 172, true negative: 156, false negative: 84, false positive: 100
Precision: 0.6324, recall: 0.6719, accuracy: 0.6406, specificity: 0.6094
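The derived rates follow directly from the four confusion counts. As a check, this small sketch recomputes them for the 2K row using the standard definitions; the function name is illustrative.

```python
def confusion_metrics(tp, tn, fn, fp):
    """Standard rates derived from a binary confusion matrix."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "accuracy": (tp + tn) / (tp + tn + fn + fp),
        "specificity": tn / (tn + fp),
    }


# Counts from the 2048 (2K) training examples row.
metrics_2k = confusion_metrics(tp=162, tn=154, fn=94, fp=102)
```

The same call reproduces the other rows when given their counts.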
References
Footnotes
[1] Flickr's "interestingness" metric is proprietary, but it seems to correlate with photos that we perceive as good.
[2] MATLAB's svmtrain function, using a Gaussian radial basis function as the kernel.
Acknowledgments
Special thanks to James Hays, for giving this project direction and entertaining many hours of questioning.