The goal of this assignment is to implement a boundary detection algorithm that improves upon classical
edge detection algorithms like Sobel and Canny. These classical algorithms determine boundaries by
finding intensity discontinuities in an image. More sophisticated algorithms use sets of features that
incorporate additional information like texture and color. The algorithm in this assignment, pb-lite, is
a simplified version of the pb edge detection algorithm by
Arbelaez, Maire, Fowlkes, and Malik (TPAMI 2011).
The main steps in the pb algorithm are low-level feature extraction, multiscale cue combination with non-maximum suppression, and spectral clustering. My pb-lite algorithm extracts texture, brightness, and color features. The image textures are extracted by filtering with a set of derivative-of-Gaussian filters, then by running k-means to obtain a texton map. The brightness and color features are extracted by converting the image into CIELAB color space.
Next, gradients for each of the four feature channels are calculated using half-disc masks at varying scales. A simple chi-square distance is used for multiscale cue combination. The pb-lite algorithm does not do spectral clustering.
A more detailed description of the main steps of the algorithm follows.
In order to measure the texture properties of each image, I use a filter bank containing derivative-of-Gaussian filters. These filters are effective because they simultaneously find changes in brightness while smoothing out noise. I experimented with adding center-surround filters to capture a broader range of textures, but these filters did not improve results, so they were discarded. Shown below is the set of filters used for measuring results.A set of derivative-of-gaussian filters with 16 orientations and 2 scales. The top row uses a Gaussian with sigma = 1, the bottom row uses sigma = 2.
In order to efficiently compute gradients over different orientations and scales, I use a set of half-disc masks. For each feature, the image is split up into bins, then each bin is convolved with each pair of masks. The chi-square distance is then computed between the convolutions with the left and right masks. Below is the set of masks used for measuring results.A set of half-disc masks with 8 pairs of orientations and 3 scales. The top row has radius 5, the middle row radius 10, and the bottom row radius 20.
The first step in texton map generation is to filter an image with each filter in the filter bank. These responses give a feature vector encoding texture at each pixel. The next step is to simplify this representation into a more usable one. This is done by running k-means over the set of response vectors. k-means clusters the similar responses together into k groups, or textons. k was set to 64 for measuring results.Visualizations of texton maps from two of the test images. There are k = 64 textons, each of which is assigned a unique color.
A unified gradient calculation is used over each of the four feature channels. For each channel, responses are first grouped into bins. For the CIELAB color channels, I first iterate over all test images to find the min and max values for each channel, then evenly split each channel into 32 bins. For the other channels, the boundaries are known so binning is simple.
The gradient is calculated by comparing responses to left and right half-disc pairs at each pixel. The similarly between these distributions corresponds to the magnitude of the gradient at that pixel. The more similar the distributions, the smaller the gradient. This calculation is done by finding the elements in a given bin, convolving these counts with the left and right masks, then using chi-square distance to compare the left and right responses. The final chi-square metric is half of the sum over all bins of this calculation.Visualizations of mean responses in each channel for a penguin image.
|Texton||L (lightness)||A (color channel 1)||B (color channel 2)|
Running gradient calcutations for each of the four feature channels gives four sets responses per pixel. These response vectors are all simply averaged together to get a single number representing gradient at each pixel. The final pb measure is calculated by element-wise multiplying the baseline (Canny) result with these mean responses.
The pb-lite algorithm performs better than Canny, even when only texton and brightness gradients are used. The addition of CIELAB color information significantly also improves the results. Qualitatively, pb-lite improves upon Canny by suppressing of texture edges. This is particularly noticeable in the test image with two rhinos. The algorithm likely could be improved further by using a more sophisticated method of combining cues at different scales and orientations.
The quantitative results and test image outputs are reported below.
F-score comparison between Sobel, Canny, and pb-lite on 10-image test set.
|Baseline Sobel||Baseline Canny||pb-lite without color||pb-lite with color|
|Area under PR curve||0.21||0.50||0.51||0.54|
PR curves for 10-image test set. Left curve is pb-lite without color gradients, right curve is with color gradients.
F-score comparison between Sobel, Canny, and pb-lite on 200-image test set.
|Baseline Sobel||Baseline Canny||pb-lite with color|
|Area under PR curve||0.28||0.53||0.55|
PR curves for 200-image test set. Uses pb-lite with color gradients.
|Original||Sobel||Canny||pb-lite with color gradients|