CSCI 1430 Project 2
Texture Representation: The texture measurement at a pixel in the image is represented by a vector of filter responses centered at that pixel. The input image is filtered separately with each filter of a pre-generated filter bank, composed of various rotations and scales of a Sobel filter convolved with a Gaussian filter, as seen below (brightnesss increased by .5 for visibility).
Once the texture vector for each pixel has been calculated, k-means is applied to group the pixel textures into k image texture clusters, referred to as textons. In this way, each pixel in the image may be assigned to one of k textons, so that gradient calculations may be performed on the image.
To calculate gradients of texture or brightness in the image, this algorithm makes use of a set of "half-disk masks." For each orientation and scale of mask there is a pair of mirror-image disks, which will allow for a comparison of the mask area across the line of symmetry for each pair. Each pair of half-disk masks is applied to the texture and brightness channels, and the values that fall under each half of the disks are formed into histograms, which are then compared using the chi-squared distance method (for histograms g and h, one for each side of each mask pair, and k bins in each histogram): chi_sqr(g,h)=.5*sumi=1:K( (gi-hi)2 / (gi+hi) ) However, for efficiency, this particular algorithm changes the order of operations, such that each image is first separated into k matrices, one for each bin. These are essentially membership matrices: if a pixel at location x,y in the image belongs to bin b, then there will be a 1 in position x,y in the b'th matrix and a 0 at x,y in all k-1 other matrices. This way, the histogram may be precomputed and reused for each gradient calculation. Since the half disk masks have values of 1 or 0 as well, applying the filter will essentially count the number of hits for that bin on one side and then on the other, and the resulting matrices can be fed into the chi-squared distance calculation. Thus, for each mask pair, a gradient
matrix is produced, representing the difference in channel values
between each side of the mask pair. These gradient values can be
combined in many ways to produce an overall matrix of gradient
measurements across all masks and all included image channels, a
matrix that essentially represents the "probability of a boundary"
existing at each of the pixels in the image. This matrix is then
combined with the baseline Sobel and Canny results to produce the
final pb-Lite response.
The left figure shows statistics for several baseline edge detection algorithms and the pB-Lite algorithm when benchmarked against the BSDS500 Dataset. The pb-Lite output was produced by taking the squared means of the sum of the texture and brightness gradients at each pixel and multiplying them element-wise with the sum of the Canny and Sobel results. The baseline Sobel produced an F-score of .45, the baseline Canny .58, and pb-Lite yielded a score of .58, but it clearly appears from the figure that pb-Lite outperforms both baseline algorithms. The shorter length of the pb-Lite line is expected, as edges are pushed farther to the extremes of existence or non-existence by the pB-Lite algorithm. A pB-Lite implementation using the raw means of the texture and brightness gradients multiplied element-wise with the sum of the Sobel and Gaussian results yielded an F-score of .59, but the results were less distinct, as can be seen in the figure to the right.
|