CSCI1430 Project 2 : Boundary Detection

Michael Price (mprice)



A penguin.


The goal of the project is to determine the boundaries in an image, given as a starting point the output of two standard edge-detection algorithms (Sobel and Canny).


Algorithm


Our version of pb ('pb-lite') takes as input an image along with a preliminary set of edges computed by the baseline Sobel and Canny methods. Together, these baseline images constitute an upper bound on the edges we can detect: our algorithm weights this output at each pixel based on texture, brightness, and color gradient information. This weighting suppresses false positives present in the original baseline output.

To compute a gradient on one of these channels, the channel is first discretized into a number of histogram bins. Then the gradient image can be found as follows:
Each pixel of the gradient image is a chi-squared distance between two smaller histograms taken over opposite halves of a circular disc centered at that pixel. This distance is proportional to the likelihood of an edge at the given histogram scale and orientation of the split of the disc. These gradient images are computed for several histogram-circle sizes and orientations, and ultimately a single gradient image is formed as the average of these scale- and orientation-specific gradients.
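A minimal sketch of this computation in Python (assuming NumPy/SciPy; the function and mask names are illustrative, not taken from our actual code). The key trick is that convolving a per-bin indicator image with a half-disc mask counts that bin's members in the half-disc around every pixel at once:

```python
import numpy as np
from scipy.signal import convolve2d

def chi_squared_gradient(bins_map, num_bins, left_mask, right_mask):
    """Gradient image from chi-squared distances between half-disc histograms.

    bins_map: 2-D integer array of per-pixel bin ids in [0, num_bins).
    left_mask, right_mask: binary masks for opposite halves of a disc.
    """
    chi_sqr = np.zeros(bins_map.shape, dtype=float)
    for b in range(num_bins):
        # Indicator image for this bin; convolution with a half-disc mask
        # yields, at each pixel, the histogram count for bin b in that half.
        tmp = (bins_map == b).astype(float)
        g = convolve2d(tmp, left_mask, mode='same')
        h = convolve2d(tmp, right_mask, mode='same')
        # Accumulate this bin's term of the chi-squared distance.
        chi_sqr += 0.5 * (g - h) ** 2 / (g + h + 1e-10)
    return chi_sqr
```

This is repeated for each disc scale and each orientation of the split, and the results are averaged.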


Our half-circle masks at one scale/orientation.


Ultimately this process is performed on brightness, texture information, and color channels in YCbCr space. The resulting gradients are combined via a weighted average, and this output is used to weight the Sobel/Canny baseline response.
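Concretely, the combination step can be sketched as follows (the weights shown are illustrative assumptions, not the values we actually used):

```python
import numpy as np

def combine(baseline, brightness_grad, texture_grad, color_grad,
            weights=(0.4, 0.4, 0.2)):
    """Weight the Sobel/Canny baseline by a weighted average of the
    feature gradients. `weights` is a hypothetical, tunable choice."""
    w_b, w_t, w_c = weights
    mean_grad = w_b * brightness_grad + w_t * texture_grad + w_c * color_grad
    # A baseline edge with no gradient support is suppressed outright.
    pb = baseline * mean_grad
    return pb / (pb.max() + 1e-10)  # normalize to [0, 1]
```

Because the combination is multiplicative, the feature gradients can only suppress baseline edges, never introduce new ones; this is what makes the baseline an upper bound.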

The above describes the basic algorithm, but a major question has been left unanswered: how do we quantize 'texture information' or divide it into discrete bins?
We do not quantize the texture information in a way that allows meaningful distances to be computed between different samples; however, for our chi-squared histogram-distance approach, this is not necessary. It is sufficient to define a set of discrete bins without any discernible inter-bin relationships. In other words, it's enough to assign an 'id' number to each set of seemingly related samples.

We do this by obtaining a number of gradient responses at each pixel over different scales and orientations, and then clustering this data using K-means. To collect these responses, we convolve the image with a filter bank consisting, at each of several orientations and scales, of first- and second-derivative-of-Gaussian filters as well as a center-surround (difference-of-Gaussian) filter. The responses to these filters are kept as separate dimensions of a high-dimensional vector for each pixel, and this data is clustered to give us our "texton id" numbers. From there, the process of computing a texture gradient proceeds as outlined above.
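A simplified sketch of the texton-id computation (assuming NumPy/SciPy; for brevity it uses only oriented first-derivative-of-Gaussian responses, while the full bank would append second-derivative and center-surround responses to the same per-pixel vector):

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.cluster.vq import kmeans2

def texton_map(gray, sigmas=(1.0, 2.0), n_orient=8, n_textons=32):
    """Cluster per-pixel filter responses into texton ids."""
    responses = []
    for sigma in sigmas:
        # x/y derivatives of the Gaussian-smoothed image; an oriented
        # derivative is their cosine/sine combination (steerability).
        gx = gaussian_filter(gray, sigma, order=(0, 1))
        gy = gaussian_filter(gray, sigma, order=(1, 0))
        for k in range(n_orient):
            theta = k * np.pi / n_orient
            responses.append(np.cos(theta) * gx + np.sin(theta) * gy)
    # One high-dimensional vector per pixel, one dimension per filter.
    data = np.stack(responses, axis=-1).reshape(-1, len(responses))
    np.random.seed(0)  # deterministic cluster initialization
    _, labels = kmeans2(data, n_textons, minit='points')
    return labels.reshape(gray.shape)
```

The resulting integer labels have no ordering, but for the chi-squared histogram comparison none is needed.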


Our filter bank at one scale/orientation (scaled up and filtered for visualization).
From left to right: first derivative of gaussian, second derivative of gaussian, center-surround



Visualization of the texton map of a penguin using the complete filter bank.



Visualization of the various gradients for a penguin.
Top: Brightness, Texton. Bottom: Cb, Cr.
We use a reduced bin resolution for Cb and Cr as compared to brightness.


I observed that the Cb and Cr gradients sometimes reinforce each other and sometimes weaken each other, presumably depending on the colors involved. I therefore decided to merge the channels by taking, for each pixel, the maximum of the two:

If not useful, it at least looks pretty cool...


The result is used directly as a gradient.
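This per-pixel maximum merge is a one-liner with NumPy (names are illustrative):

```python
import numpy as np

def merge_chroma_gradients(cb_grad, cr_grad):
    """Merge the Cb and Cr gradients by per-pixel maximum, so an edge
    visible in either chroma channel survives rather than being
    averaged away by the other."""
    return np.maximum(cb_grad, cr_grad)
```

Compared to averaging, the maximum never lets a flat channel dampen an edge that the other channel sees clearly.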


Visualization of the final probability distribution of edges for the penguin.



Results


I was able to beat the baseline Canny method with the default project requirements (first-derivative-of-Gaussian filters, no color information), using 16 orientations, 2 scales, and 32 texton clusters.
My results were significantly better with more data, though I only ran this experiment once (and unfortunately never recorded the area under the PR curve, just the graph).


Edges found by baseline Canny method.



Edges found by pb-lite; note the suppression of false positives (and the awesome eye).



Baseline Canny.



pb-lite.



Results from running pb-lite with only first-derivative-of-Gaussian filters and no color information.
Console output reports an area of 0.45 under the PR curve (versus 0.44 for Canny, a modest win).


Results from running pb-lite with only first-derivative-of-Gaussian filters and no color information on 200 images from the BSDS200 data set. Here we unambiguously beat the baseline methods.



Left: Default (derivative of gaussian) filters. Right: Complete filter bank. (10 images)
For 10 images, it seems that the extra filters do worse...



Left: Default (derivative of gaussian) filters. Right: Complete filter bank. (200 images)
For 200 images, they don't seem to hurt, though it's unclear if they help.



Left: No color information. Right: Combined CbCr. (10 images)
Color information certainly seems to help.


Note: Area_PR generally tended to be reported as around 0.45 for most of these experiments, even when the curves themselves looked noticeably different.