Boundary (or edge) detection algorithms find the edges between different regions of an image. Many boundary detection algorithms, such as the Sobel and Canny edge detectors used as baselines in this project, look for intensity discontinuities in the image. However, a textured surface (such as a furry animal) can create such discontinuities and cause these algorithms to generate edges within a single region. The pb-lite algorithm (pb standing for probability of boundary) uses texture and brightness gradients to decrease the probability of reporting such false positive edges.
Algorithm
Step 1: A filter bank and a set of half-disc masks are constructed.
Step 1a: The filter bank is generated. First, a set of Gaussian filters is created with different sigmas (standard deviations of the distribution). Next, each Gaussian filter is convolved with a Sobel filter. The filter bank is then populated with copies of these convolved filters rotated to various orientations.
Below is a Sobel filter:
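For reference, the standard 3x3 horizontal Sobel kernel is shown below (the vertical kernel is its transpose); the exact kernel used in this project may differ in orientation or sign convention.

$$
S_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix}
$$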
Below is an example of a filter bank, using 16 orientations (uniformly spaced through 360 degrees) and 2 sigmas (sigmas = [1, 2]). The first and second rows correspond to sigma = 1, and the third and fourth rows correspond to sigma = 2. As sigma increases, the filter size increases. The lighter portions of each image show positive values, while the darker portions show negative values.
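As a rough illustration of Step 1a, the sketch below builds such a filter bank with NumPy and SciPy. The function names and defaults (make_filter_bank, sigmas, n_orient) are illustrative assumptions, not the exact code used in this project.

```python
import numpy as np
from scipy import ndimage
from scipy.signal import convolve2d

def gaussian_kernel(sigma, size=None):
    """Square 2D Gaussian kernel (size defaults to ~6*sigma, forced odd)."""
    if size is None:
        size = int(6 * sigma) | 1
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return g / g.sum()

def make_filter_bank(sigmas=(1, 2), n_orient=16):
    """Oriented filters: each Gaussian is convolved with a Sobel kernel, then rotated."""
    sobel_x = np.array([[-1, 0, 1],
                        [-2, 0, 2],
                        [-1, 0, 1]], dtype=float)
    bank = []
    for sigma in sigmas:
        base = convolve2d(gaussian_kernel(sigma), sobel_x, mode='same')
        for i in range(n_orient):
            angle = 360.0 * i / n_orient
            bank.append(ndimage.rotate(base, angle, reshape=False, order=1))
    return bank  # len(sigmas) * n_orient filters
```

Because the Gaussian kernel grows with sigma, the filters for sigma = 2 come out larger than those for sigma = 1, matching the figure described above.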
Step 1b: The set of half-disc masks is generated. First, a set of half-disc pairs is created with different radii. The full set of half-disc masks is then populated with these pairs rotated to various orientations, as was done with the filter bank.
Below is an example of a set of half-disc masks, using the first 8 orientations that were used by the filter bank and 3 radii (radii = [5, 10, 20]). (Because we are using pairs of half-discs, only half of the orientations are required.) The first and second rows correspond to a radius of 5 pixels, the third and fourth rows correspond to a radius of 10, and the fifth and sixth rows correspond to a radius of 20.
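A corresponding sketch for Step 1b is below; as above, the names and defaults are illustrative.

```python
import numpy as np
from scipy import ndimage

def make_half_disc_pairs(radii=(5, 10, 20), n_orient=8):
    """Binary half-disc mask pairs at several radii and orientations."""
    pairs = []
    for r in radii:
        ax = np.arange(-r, r + 1)
        xx, yy = np.meshgrid(ax, ax)
        disc = (xx**2 + yy**2) <= r**2
        half = disc & (yy < 0)                      # upper half of the disc
        for i in range(n_orient):
            angle = 180.0 * i / n_orient            # pairs only need to cover 180 degrees
            left = ndimage.rotate(half.astype(float), angle,
                                  reshape=False, order=0) > 0.5
            right = left[::-1, ::-1]                # the opposing half-disc (180-degree flip)
            pairs.append((left, right))
    return pairs  # len(radii) * n_orient mask pairs
```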
Step 2: For a given image, a texton map is created. This is done by convolving the image with each filter of the filter bank. The result of each convolution is another image, where the value at each pixel location is the response of that filter at the corresponding pixel in the original image. The set of convolutions can therefore be thought of as a feature vector for each pixel. These feature vectors are then clustered (in this case, with the k-means clustering algorithm). Each cluster represents pixels with similar textures and is called a texton. A texton map can then be generated, where the value at each pixel location is the index of the texton to which that pixel was assigned.
Below is an example of a grayscaled image and two texton maps. The first texton map shows the result of using k = 64 (64 texton clusters), while the second shows the result of using k = 16. Each pixel is colored according to the texton it belongs to, so two pixels with the same color belong to the same texton. The colors themselves are arbitrary, however, and there is no meaningful relationship between different colors.
[Figure: grayscaled input image; texton map with k = 64; texton map with k = 16.]
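To make Step 2 concrete, here is a minimal sketch of the texton-map computation, assuming the filter bank above and scikit-learn's KMeans; the names and parameter choices are illustrative.

```python
import numpy as np
from scipy import ndimage
from sklearn.cluster import KMeans

def texton_map(gray_img, filter_bank, k=64):
    """Cluster per-pixel filter responses into k textons."""
    h, w = gray_img.shape
    # One response image per filter; stack into an (h*w, n_filters) feature matrix.
    responses = np.stack(
        [ndimage.convolve(gray_img, f, mode='reflect') for f in filter_bank],
        axis=-1)
    features = responses.reshape(h * w, -1)
    labels = KMeans(n_clusters=k, n_init=4, random_state=0).fit_predict(features)
    return labels.reshape(h, w)   # texton index per pixel
```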
Step 3: Using the texton map and the grayscaled image, texture and brightness gradients are generated. Gradient vectors are computed by filtering the image with the set of half-disc masks. For each pair of masks, the two filtered images are compared. At each pixel location, the filtered values summarize the distribution of the surrounding pixels that fall under each half-disc. If the two distributions are similar, the distance between them is small, and the gradient (which indicates a change in the distribution) is also small. If the distributions differ, the distance is large and so is the gradient. By using a set of rotated half-disc masks of various sizes, gradients in a variety of directions and at a variety of scales can be detected. The presence of a gradient indicates the presence of an edge, because the distributions on either side of the pixel are different.
Step 3a: The texture gradients are generated by filtering the texton map with the set of half-disc masks. A large gradient indicates a change in texton assignments around that pixel, which may indicate an edge.
Step 3b: The brightness gradients are generated by filtering the grayscaled image with the set of half-disc masks. A large gradient indicates a change in brightness (grayscale values) around that pixel, which may indicate an edge.
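A sketch of the gradient computation in Steps 3a and 3b is below. It assumes the two half-disc distributions are compared with a chi-square distance over histogram counts (a common choice in pb-lite implementations; the exact distance used here may differ), and that the brightness image is first quantized into a small number of bins (e.g., with np.digitize) so it can be treated the same way as the texton map.

```python
import numpy as np
from scipy import ndimage

def half_disc_gradients(value_map, mask_pairs, n_bins):
    """Per-pixel gradient for each mask pair via a chi-square histogram distance.

    value_map holds small integer labels (texton indices, or binned brightness).
    Returns an array of shape (h, w, len(mask_pairs)).
    """
    h, w = value_map.shape
    grads = np.zeros((h, w, len(mask_pairs)))
    for p, (left, right) in enumerate(mask_pairs):
        chi_sq = np.zeros((h, w))
        for b in range(n_bins):
            binary = (value_map == b).astype(float)
            # Count of label b inside each half-disc, centered at every pixel.
            g = ndimage.convolve(binary, left.astype(float), mode='reflect')
            hcount = ndimage.convolve(binary, right.astype(float), mode='reflect')
            chi_sq += 0.5 * (g - hcount) ** 2 / (g + hcount + 1e-8)
        grads[:, :, p] = chi_sq
    return grads
```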
Step 4: The pb-value is then computed from the texture and brightness gradients (each pixel's texture gradient vector and brightness gradient vector are concatenated into a single gradient vector). In this algorithm, I used either the mean or the maximum of the concatenated gradient vector. The pb-value is proportional to the probability that the pixel lies on a boundary. By taking a baseline edge detection result (such as the Canny or Sobel output) and multiplying each of its pixels by the corresponding pb-value, the true edges (which should have high pb-values) remain, while the false positives (which should have low pb-values) are reduced and possibly eliminated.
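The sketch below shows Step 4. The mean/max combination and the multiplication by the baseline follow the description above; normalizing the pb map to [0, 1] is my assumption, and the names are illustrative.

```python
import numpy as np

def pb_lite(tex_grads, bright_grads, canny_baseline, combine='max'):
    """Combine per-pixel gradients into a pb-value and weight the Canny baseline.

    tex_grads and bright_grads have shape (h, w, n_gradients);
    canny_baseline is an (h, w) edge map.
    """
    # Concatenate each pixel's texture and brightness gradient vectors.
    all_grads = np.concatenate([tex_grads, bright_grads], axis=-1)
    pb = all_grads.max(axis=-1) if combine == 'max' else all_grads.mean(axis=-1)
    # Normalization to [0, 1] is an assumption, not stated in the writeup.
    pb = (pb - pb.min()) / (pb.max() - pb.min() + 1e-8)
    return pb * canny_baseline   # suppress edges with a low probability of boundary
```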
Parameters
In the results below, the output of the pb-lite algorithm is compared to a "ground truth" dataset of human annotations to see whether the algorithm detects what humans consider edges and ignores what humans do not. The metrics used to evaluate the algorithm at the various parameter settings are the F-score and the precision-recall graph. Given a set of detected edges, precision is the proportion of detected edges that are true edges, while recall is the proportion of all true edges that are detected. The F-score is the harmonic mean of precision and recall. A good algorithm has both high precision and high recall, and therefore a higher F-score (F-scores range from 0 to 1).
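In symbols, with TP, FP, and FN counting true positive, false positive, and false negative boundary pixels:

$$
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F = \frac{2PR}{P + R}.
$$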
For the baselines, the Sobel images had an F-score of 0.45 and the Canny images had an F-score of 0.58, indicating that the Canny baseline is considerably better than the Sobel baseline. This can also be seen in the precision-recall graphs below.
There were numerous free parameters that could be taken into account: k (the number of clusters in the k-means algorithm, which would be the number of distinct textures in the texton map), the number of orientations for the filter bank, the range of sigmas (standard deviations) used for the Gaussian filters, the range of radii used for the half-disc masks, and the method for combining the gradients and baselines into the pb-value.
I first considered different methods of combining the gradients and baselines into the final pb-value. I tried four methods: (1) the mean of the gradient vector with the sum of the Sobel and Canny images as the baseline, (2) the maximum of the gradient vector with the sum of the Sobel and Canny images as the baseline, (3) the mean of the gradient vector with the Canny image alone as the baseline, and (4) the maximum of the gradient vector with the Canny image alone as the baseline.
Means and maximums were chosen because they are good indicators of the pb-value. If the gradient vector has a high maximum, then at least one of the mask orientations indicates a large gradient at that pixel, possibly indicating an edge. If the gradient vector has a high mean, then several of the mask orientations indicate a large gradient at that pixel, which is also a strong indication of an edge. For the baseline, the Sobel and Canny images were combined at first. However, because the Canny images are generally better than the Sobel images (the Canny algorithm is better at detecting true edges and suppressing false positives), the Canny images were also tried as the only baseline.
Using the Sobel and Canny baselines with the means of the gradients gave an F-score of 0.59, only slightly better than the Canny baseline itself. Using the Sobel and Canny baselines with the maximums of the gradients gave an F-score of 0.60, slightly better still. Using the Canny baseline alone with the means and maximums gave F-scores of 0.60 and 0.61 respectively, and the precision-recall curves of these Canny-only variants indicated much better performance than the small F-score differences suggest. This indicated that using the Canny baseline alone was better than using the Canny and Sobel baselines together, and it also suggested that using the maximums of the gradient vectors was somewhat better than using the means. While I continued testing with both means and maximums, subsequent tests used only the Canny baseline and disregarded the Sobel baseline.
[Figure: precision-recall curves for the four combination methods: Sobel+Canny baseline with gradient means, Sobel+Canny baseline with gradient maximums, Canny-only baseline with gradient means, and Canny-only baseline with gradient maximums.]
The number of clusters was another parameter to consider. If k were too small, too few clusters would exist, and pixels that should be in different textons would be placed in the same one. This would cause some texture gradients to be missed, which could lead to missed edges. However, if k were too large, too many clusters would exist, and pixels that should be in the same texton would be placed in different ones. This would cause some pixels to have texture gradients where they should not, which could lead to false positive edges. I used k = 16, 32, 64, and 128 (k = 64 was the value used above). With these values, the F-scores remained the same as above, except for the maximums method at k = 16, whose F-score decreased to 0.60. Decreasing k further would probably have worsened performance, but it did not appear that increasing k would improve it. I kept k = 64 for subsequent tests, as it appeared to be a reasonable value.
The number of orientations in the filter bank could also be considered. Increasing this number would increase the orientations of both the filter bank and the half-disc masks, which would increase the number of features in the feature vector for texton clustering and the number of gradients. This would seem to improve performance, while decreasing the number of orientations would worsen it by reducing the number of features and gradients available. I used 8, 16, and 32 orientations (16 orientations was the value used above). In testing, changes in the number of orientations did not change the F-scores of the algorithm. I continued to use 16 orientations for subsequent tests.
NOTE: below, when I discuss ranges of sigmas and radii, I indicate a range by saying [a, b]. This means, for example, that the filter bank had a set of filters that were rotations of a filter with sigma = a, and another set of filters that were rotations of a filter with sigma = b.
The range of sigmas for the Gaussian filters could also be considered. Using larger sigmas would create broader filters that would take more surrounding pixels into account. Using more filters would increase the number of features in the texton feature vector, which would seem to improve the clustering (because examples with more features are generally easier to cluster than examples with fewer). I attempted to use multiple ranges for sigmas of the filter bank. I used [1], [1, 2], and [1, 2, 3]. (I used [1, 2] for the tests above.) In testing, the F-scores remained the same, and so it does not appear that using more sigmas or larger sigmas improved the algorithm. (It is possible that the values obtained by filtering with larger Gaussian filters were similar to the values obtained with smaller filters, and so the larger filters did not actually add useful information.) I continued using [1, 2] as my range of sigmas.
The range of radii for the half-disc masks could also be considered. Using larger radii would take more of the surrounding pixels into account, and so would seemingly improve the algorithm, while smaller radii would take fewer pixels into account. I tried several ranges of radii: [5, 10], [5, 10, 20], and [5, 10, 30]. (I used [5, 10, 20] in the tests above; I did not want to use larger radii because that drastically increased the computation time required by the algorithm.) These changes had an interesting effect on performance. Using [5, 10], the algorithm's performance degraded significantly: both the means and maximums had F-scores of 0.60, but visually (see the precision-recall curves below) the performance dropped much closer to that of the Canny baseline. Also, while using [5, 10, 30] did not improve the F-score, it did change the precision-recall curve. Below, one can see that the precision at low recall actually increases, probably due to the larger half-disc. However, the precision did not improve at higher recall. For this reason, I continued using [5, 10, 20] as my range of radii.
[Figure: precision-recall curves for the half-disc radii experiments, comparing radii of [5, 10], [5, 10, 20], and [5, 10, 30].]
After testing the parameters, it appears that the most important factors were the method of combining the baselines and the gradients (the Canny baseline alone being better than the Canny and Sobel baselines together, and the maximums of the gradients being slightly better than the means), the value of k (which could not be too small), and the range of radii of the half-disc masks (having more radii, including larger ones, improved performance). However, for k and the range of radii, changing them from the values I had been using mostly decreased the F-scores rather than increasing them. For this reason, I continued using the parameter values I had started with.
Results
Below are some images to which the Sobel, Canny, and pb-lite edge detection algorithms have been applied, to illustrate differences in the results of the various algorithms.
[Figure: input image of a bird, with the Sobel, Canny, and pb-lite edge detection results.]
For this image, the Sobel result finds many of the bird's edges, but it does not completely outline the bird. The Canny result more completely outlines the bird, but it also finds numerous edges in the background of the image and the interior of the bird. (A human annotator would treat the bird as a single region, and would not outline the face or beak.) The pb-lite result improves on the Canny result by reducing the background edges and the interior false positives (such as the face), while leaving the bird's outline largely intact.
[Figure: input image and the Sobel, Canny, and pb-lite results for a second bird image.]
Once again, the Sobel result finds many of the bird's edges, but the outline is incomplete. The Canny result more fully detects the edges, but it also detects numerous edges in the background (such as on the rocks). The pb-lite result suppresses these background edges while keeping the bird's edges intact. Also, note that the Sobel result marks the bird's eye as an edge, a false positive that a human annotator would ignore since it is inside the bird. The Canny result largely reduces those interior false positive edges, and the pb-lite result reduces them even more.
[Figure: input image and the Sobel, Canny, and pb-lite results for a third bird image.]
Once again, the Sobel image provides an incomplete outline of the bird in the image. For this image, both the Canny and pb-lite results succeed in outlining the bird quite well.
[Figure: input image and the Sobel, Canny, and pb-lite results for a bird with its back in shadow.]
However, there are cases in which the pb-lite result is worse than the Canny baseline. Here, the bird's back is in shadow. The Sobel result largely fails to detect those edges of the bird's back. The Canny result finds those edges, but they are quite faint. Unfortunately, the pb-lite result suppresses these faint edges even more, even eliminating some of them. This means that it treats them as false positives, when they are actually true edges. This image also displays how Sobel and Canny differ on false positives. Note that the Sobel result has edges around the water droplets on the bird (which a human annotator would ignore) and the Canny result suppresses those edges.