![]() |
![]() |
This project is a simplified version of the work done in the Arbelaez et al. paper Contour Detection and Hierarchical Image Segmentation. It uses texture information to improve the performance of boundary detection algorithms by suppressing so-called "false positives" created where the image has a high gradient but no actual boundary exists.
To measure texture features we must first generate a filter bank. The filter bank that I used for this project is a collection of oriented derivative-of-Gaussian and difference-of-Gaussian filters. I used not only the first derivative of the Gaussian but also the second, to create a richer texture feature representation. The filter bank is shown below at 3 scales and 8 orientations.
![]() |
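A minimal sketch of how such a bank can be generated, assuming NumPy and SciPy (the function names, kernel size, and scale values here are illustrative choices of mine, not the report's actual code):

```python
import numpy as np
from scipy import ndimage

def gaussian_derivative_filter(size, sigma, order, angle_deg):
    """Oriented first- or second-derivative-of-Gaussian kernel."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    if order == 1:
        kernel = -xx / sigma**2 * g                      # d/dx of the Gaussian
    else:
        kernel = (xx**2 / sigma**4 - 1 / sigma**2) * g   # d^2/dx^2 of the Gaussian
    kernel = ndimage.rotate(kernel, angle_deg, reshape=False)
    return kernel - kernel.mean()                        # zero-mean: flat regions respond with 0

def build_filter_bank(scales=(1.0, 2.0, 4.0), n_orient=8, size=21):
    """3 scales x 2 derivative orders x 8 orientations = 48 filters."""
    return [gaussian_derivative_filter(size, sigma, order, i * 180.0 / n_orient)
            for sigma in scales for order in (1, 2) for i in range(n_orient)]
```

Rotating a single x-derivative kernel is the usual way to get the oriented versions; using both derivative orders doubles the bank size but captures bar-like as well as edge-like texture elements.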
The masks that I used to generate the histograms for the gradient calculation are shown below at 3 scales and 8 orientations. In addition to the half-disc masks, I added a set of concentric circle masks, intended to measure how the texture near the center pixel of the mask differs from the texture in the surrounding pixels. The inner radius was chosen so that the area of the outer ring is equal to the area of the inner circle.
![]() |
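The two kinds of mask pairs can be sketched as follows (a hypothetical NumPy implementation consistent with the description above, including the inner radius of r/√2 that equalizes the two areas):

```python
import numpy as np

def half_disc_pair(radius, angle_rad):
    """Two complementary half-disc masks at the given orientation."""
    ax = np.arange(2 * radius + 1) - radius
    xx, yy = np.meshgrid(ax, ax)
    disc = xx**2 + yy**2 <= radius**2
    # Split the disc by the oriented line through its center.
    side = (np.cos(angle_rad) * xx + np.sin(angle_rad) * yy) > 0
    return disc & side, disc & ~side

def concentric_pair(radius):
    """Inner disc and outer ring of (approximately) equal area."""
    ax = np.arange(2 * radius + 1) - radius
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    inner = r2 <= radius**2 / 2.0            # inner radius r/sqrt(2)
    outer = (r2 <= radius**2) & ~inner       # ring: area pi*r^2/2, same as inner
    return inner, outer
```

With an inner radius of r/√2 the inner disc has area πr²/2, exactly half the full disc, so the surrounding ring matches it; on the pixel grid the two counts agree only approximately.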
Once we have the filter bank and masks we can calculate the texture gradient. The first step is to represent each pixel in the image by a feature vector, which is simply the response of that pixel and its surrounding area to each of the filters in the filter bank. Once we have the feature vectors, we use k-means to group pixels with similar responses to the filter bank; each cluster is a texton. The per-pixel texton labels are then used with the masks to generate pairs of histograms. Finally, we compare those histograms using the chi-squared distance and assign the result as the value of the gradient. In an attempt to speed up this calculation I used an integral image table as described in the appendix of the paper, but the speed increase was not significant.
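The pipeline in this paragraph might look like the following sketch, assuming NumPy and SciPy (names are mine; the report's actual implementation, including the integral-image variant, is not reproduced here):

```python
import numpy as np
from scipy import ndimage
from scipy.cluster.vq import kmeans2

def texton_map(image, filter_bank, k=16):
    """Cluster per-pixel filter responses into k texton labels."""
    responses = np.stack([ndimage.convolve(image, f) for f in filter_bank], axis=-1)
    _, labels = kmeans2(responses.reshape(-1, len(filter_bank)), k, minit='++', seed=0)
    return labels.reshape(image.shape)

def chi_squared_gradient(labels, k, mask_pairs):
    """Chi-squared distance between the texton histograms under each mask pair."""
    grad = np.zeros(labels.shape)
    for left, right in mask_pairs:
        chi = np.zeros(labels.shape)
        for bin_id in range(k):
            member = (labels == bin_id).astype(float)
            # Convolving the bin-membership image with a mask counts, at every
            # pixel simultaneously, how many pixels of that bin fall under the
            # mask -- this replaces building an explicit histogram per pixel.
            g = ndimage.convolve(member, left.astype(float))
            h = ndimage.convolve(member, right.astype(float))
            chi += 0.5 * (g - h) ** 2 / (g + h + 1e-10)
        grad = np.maximum(grad, chi)
    return grad
```

Taking the maximum over mask pairs is one reasonable way to keep a single gradient image per scale; keeping each pair separate and combining later would work too.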
The most important decision at this step is how many texton groups to create during k-means for the histogram computation. Below is a table visualizing the texture gradient of an input image for several values of k (8, 16, and 32).
Original | k=8 | k=16 | k=32 |
---|---|---|---|
![]() | ![]() | ![]() | ![]() |
Looking at the above images, you can see that even with 8 texton bins the major edges are apparent, and as you increase the number of bins more edges become apparent. Some of those edges, however, are not apparent in the original image, especially in the water of this particular image. This is because k-means does not take the spatial location of each pixel into account: if there are too many bins, k-means is forced to split pixels into separate bins even when they have similar feature vectors. I found that for the test set of ten images I was working with, k=16 was sufficient to capture the most salient edges while minimizing false positives.
For my implementation of pb-lite I combined the luminance and color gradient calculations into a single step. First I create a color-indexed image from the RGB channels. Choosing the number of colors in this image is equivalent to choosing the value of k in the texture gradient step. The first column of the table below contains indexed images with different numbers of colors.
As you can see, the problem with this technique is that artificial edges are created even between very similar colors. The solution is to dither the indexed images, which smooths out the transitions, as seen in the second column. The third column displays the color gradient calculated from the indexed images. I found that 16 colors is sufficient for the gradient calculation.
Colors | Indexed | Indexed and Dithered | Color Gradient |
---|---|---|---|
2 Colors | ![]() | ![]() | ![]() |
4 Colors | ![]() | ![]() | ![]() |
8 Colors | ![]() | ![]() | ![]() |
16 Colors | ![]() | ![]() | ![]() |
Once we have calculated the texture and color gradients at multiple scales and orientations, we must combine them into a single measure. The Arbelaez et al. paper suggests that a linear combination of the gradients across scales is proportional to the probability of there being a boundary. I simply took the average gradient across the scales for each orientation. To combine the gradients from multiple orientations, the paper suggests taking the maximum gradient at each pixel across all of the orientations. I found that taking the L2-norm of the gradients across the orientations gave slightly better results.
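Assuming the per-scale, per-orientation gradients are stacked into one array, the combination reduces to a few lines of NumPy (a sketch, not the report's code; the equal weighting of the texture and color cues is my assumption):

```python
import numpy as np

def combine_gradients(gradients):
    """gradients: array of shape (n_scales, n_orient, H, W)."""
    per_orient = gradients.mean(axis=0)        # average across scales
    return np.linalg.norm(per_orient, axis=0)  # L2-norm across orientations

def pb_lite(texture_grads, color_grads, canny_baseline):
    # Equal weighting of the two cues is an assumption on my part.
    pb = 0.5 * (combine_gradients(texture_grads) + combine_gradients(color_grads))
    return pb * canny_baseline                 # elementwise suppression by the Canny edge map
```

Replacing `np.linalg.norm(..., axis=0)` with `per_orient.max(axis=0)` recovers the paper's max-over-orientations variant for comparison.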
The final stage in the pb-lite process is edge thinning, or non-maximum suppression. The strategy I used to accomplish this was to simply perform an elementwise multiplication of the gradient with the Canny baseline. The precision-recall curve below shows the effectiveness of this approach.
![]() |
You can see that this effectively increased the accuracy of the Canny edges by further suppressing false positives. It cannot, however, increase recall relative to the Canny baseline, because it does not add any new edge information. The results for the 10 test images are shown next to the baseline images below.
Sobel Baseline | Canny Baseline | pb-lite |
---|---|---|
![]() | ![]() | ![]() |
![]() | ![]() | ![]() |
![]() | ![]() | ![]() |
![]() | ![]() | ![]() |
![]() | ![]() | ![]() |
![]() | ![]() | ![]() |
![]() | ![]() | ![]() |
![]() | ![]() | ![]() |
![]() | ![]() | ![]() |
![]() | ![]() | ![]() |