Project 2: pb-lite boundary detection

Nathan Malkin, October 2011
CS 143: Computer Vision

This project as a Twitter post

We use texture and brightness gradients to enhance edge detection. #bieber

Bieber's edges

The baseline: classical edge detection

A multitude of different edge detection algorithms exists. Among the better-known ones (and the ones used as the baseline for this project) are the Sobel and Canny edge detection algorithms.

Sobel edge detection

The Sobel operator calculates the gradient of the image at each point: the difference between light and dark pixels. It is primarily local in scope, meaning that it relies only on information from the pixels immediately surrounding the current pixel.
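As a concrete illustration, here is a minimal Python sketch of this step, assuming OpenCV and NumPy are available (the file name is a placeholder):

    import cv2
    import numpy as np

    # Load a grayscale image as floats ('image.png' is a placeholder path).
    img = cv2.imread('image.png', cv2.IMREAD_GRAYSCALE).astype(np.float64)

    gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)  # horizontal gradient
    gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)  # vertical gradient

    magnitude = np.hypot(gx, gy)         # gradient strength at each pixel
    edges = magnitude / magnitude.max()  # normalize to [0, 1]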

Canny edge detection

The Canny algorithm has multiple stages but, at its core, consists of filtering and edge tracing.

The filter applied to the image is a Gaussian filter. It reduces noise and provides smoothing.

The next major feature of Canny edge detection is hysteresis. The algorithm "traces" edges using two thresholds: strong gradient responses seed edges, and weaker responses are kept only if they connect to a strong one, which keeps detected edges continuous.

The result is an improvement on the Sobel method: edge detection is no longer just local, instead incorporating some reasoning about the continuity of (potential) edges.
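For reference, a hedged sketch of running Canny with OpenCV; the two hysteresis thresholds below are illustrative, not values from this project:

    import cv2

    # cv2.Canny expects an 8-bit input image.
    img = cv2.imread('image.png', cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(img, threshold1=100, threshold2=200)
    # Pixels above threshold2 seed edges; pixels between the two
    # thresholds survive only if connected to a strong edge.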

The problem

Though the level of edge detection provided by the Canny method is quite good, it still does not match the level of edge detection achieved by human analysts. The algorithms are good at identifying prominent edges, but these can occur not only on the edge of the objects of interest, but also within them: (perceived) edges caused by the texture of the objects. Of these, only the former is (generally) of interest to us.

What if, during our edge detection process, we could ignore all the edges that occur as part of a texture, leaving only those that are actually object boundaries? This is our aim with this project.

pb-lite edge detection

This project attempts to solve the problems described above by augmenting the edge detection provided by the Canny algorithm with information about textures and brightness. This method is based on the one described in the paper by Arbeláez et al. (2011).

An overview of the pipeline

Using masks

So we've decided to use texture information to help us with edge detection. How exactly will that work?

Recall the general principle behind Sobel (and, to some extent, Canny) edge detection. Given a pixel, we look at the pixels surrounding it. If we notice a sharp change (a large gradient) in intensity (brightness), we interpret that as a boundary between objects and, thus, an edge.

We will use texture data in a similar manner. For each pixel, we will compare the texture on one side with the texture on the other. If the textures are similar (even if we observe a difference in brightness), we guess that this isn't an edge. But if the textures are different, then, we suspect, this really is an edge.

To be able to compare the textures (and brightness) at potential pixel boundaries, we therefore generate a set of masks. Masks come in pairs (displayed sequentially in the image below): if one masks the left side, its pair masks the right side.

To account for boundaries occurring at different scales, we use several (three) different scales of masks. And since boundaries can occur in any direction, we use multiple (eight) orientations of our masks.

(The masks used in this project, and in the Arbeláez et al. paper, are all half-discs.)

Masks
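A possible way to generate such mask pairs in Python; only the counts (three scales, eight orientations) come from the write-up, while the radii below are illustrative:

    import numpy as np

    def half_disc_pair(radius, theta):
        # One mask pair of the given radius, split along orientation
        # theta (in radians).
        y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
        disc = x ** 2 + y ** 2 <= radius ** 2
        side = x * np.cos(theta) + y * np.sin(theta)  # signed distance to the dividing line
        left = disc & (side < 0)
        right = disc & (side > 0)
        return left.astype(np.float64), right.astype(np.float64)

    # Three scales and eight orientations; the radii are placeholders.
    masks = [half_disc_pair(r, np.pi * k / 8)
             for r in (5, 10, 20)
             for k in range(8)]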

The filter bank

We've chosen a pixel (we'll choose many more as we go on) and masked off its sides. How do we go about finding the texture around it?

To aggregate regional information about both texture and brightness, we use derivative of Gaussian filters. We create these by convolving a simple Sobel filter with a Gaussian kernel. Just as we rotated the masks, we rotate our filters to account for the different possible orientations.

The filter bank
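One plausible construction of such a bank in Python, assuming OpenCV and SciPy; the particular scales, orientation count, and kernel size are placeholders:

    import cv2
    import numpy as np
    from scipy import ndimage

    def dog_filter_bank(sigmas=(1.0, 2.0), n_orient=16, ksize=31):
        # Convolve a Sobel kernel with a Gaussian, then rotate the
        # result to each orientation.
        sobel = np.array([[-1, 0, 1],
                          [-2, 0, 2],
                          [-1, 0, 1]], dtype=np.float64)
        bank = []
        for sigma in sigmas:
            g = cv2.getGaussianKernel(ksize, sigma)
            gauss = g @ g.T                        # 2D Gaussian kernel
            base = cv2.filter2D(gauss, -1, sobel)  # derivative of Gaussian
            for k in range(n_orient):
                angle = 180.0 * k / n_orient
                bank.append(ndimage.rotate(base, angle, reshape=False))
        return bank  # len(sigmas) * n_orient filters (2 * 16 = 32 here)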

Applying the filters

Once we have the filters (as described above), we apply them to the image; i.e., we filter the image with each filter in the bank.

This creates D versions of the image; or, equivalently, D responses at each pixel, where D is the number of filters (in our case 32: the number of scales times the number of orientations). These 32 values represent each pixel's texture information. However, this is high-dimensional data and, thus, hard to deal with.
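Continuing the sketch above (with 'img' the grayscale image and 'bank' the filter list):

    import cv2
    import numpy as np

    # Stack one filtered copy of the image per filter in the bank.
    responses = np.stack([cv2.filter2D(img, -1, f) for f in bank], axis=-1)
    # responses has shape (H, W, D), i.e. D filter responses per pixel.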

We need to bring the texture information down to a more manageable space.

K-means

To do this, we use the k-means algorithm from machine learning. It treats each pixel in the image as a point in 32-dimensional space (where each dimension is the response to a particular filter) and organizes them into K discrete clusters. (K is a free parameter; in our project, we use K=64.)

Now, rather than being represented by some 32-dimensional value, the texture of each pixel is represented by a single value (between 1 and 64): the number of its cluster (aka the texton id).
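A minimal sketch of this step using scikit-learn's KMeans (the solver settings are our assumptions):

    from sklearn.cluster import KMeans

    # Cluster the per-pixel responses from the previous step into
    # K = 64 textons.
    H, W, D = responses.shape
    X = responses.reshape(-1, D)  # one D-dimensional point per pixel
    kmeans = KMeans(n_clusters=64, n_init=4, random_state=0)
    texton_ids = kmeans.fit_predict(X).reshape(H, W) + 1  # texton ids in 1..64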

Computing texture gradient

This is where the masks we created at the beginning come in. Now that we have the texton id for each pixel, we can compare their distributions using the principles discussed above. If the distributions are different, we suspect an edge.

To compare the distributions, we construct their histograms and use the chi-square measure:

\chi^2(g, h) = \frac{1}{2} \sum_{i=1}^{K} \frac{(g_i - h_i)^2}{g_i + h_i}

where g and h are histograms with the same binning scheme, and i indexes through these bins. Note that the numerator of each term is simply the squared difference between corresponding histogram bins.
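A direct translation of this formula into Python; the small epsilon guarding against empty bins is our addition:

    import numpy as np

    def chi_sqr(g, h, eps=1e-10):
        # Chi-square distance between two histograms with the same bins.
        g = np.asarray(g, dtype=np.float64)
        h = np.asarray(h, dtype=np.float64)
        return 0.5 * np.sum((g - h) ** 2 / (g + h + eps))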

Algorithm note
To avoid iterating over pixels, we speed up the histogram generation and the chi-square computation: for each histogram bin, we mask the image (setting pixels equal to the bin value to 1 and all others to 0), convolve the left and right masks with this binary image, and update the chi-square measure with the result.
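A sketch of this trick for a single mask pair, assuming the 'texton_ids' image and one 'left'/'right' half-disc pair from the earlier sketches:

    import numpy as np
    from scipy.signal import convolve2d

    def texture_gradient(texton_ids, left, right, num_bins=64, eps=1e-10):
        # Chi-square gradient for one half-disc pair, with no explicit
        # loop over pixels: convolving a per-bin indicator image with a
        # mask counts that bin's pixels under the mask at every location.
        chi = np.zeros(texton_ids.shape, dtype=np.float64)
        for b in range(1, num_bins + 1):
            indicator = (texton_ids == b).astype(np.float64)
            g = convolve2d(indicator, left, mode='same')   # left-side bin count
            h = convolve2d(indicator, right, mode='same')  # right-side bin count
            chi += 0.5 * (g - h) ** 2 / (g + h + eps)
        return chi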

This procedure generates a 2D matrix of gradient values. When it is repeated for all scales and orientations, a 3D matrix of gradient values is produced.

Computing brightness gradient

Using the same procedure that generated the texture gradient, we can compute the brightness gradient.
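Sketched in the same Python terms (the choice of 16 brightness bins is our assumption, not a value from the write-up):

    from sklearn.cluster import KMeans

    # Brightness is handled like texture: cluster intensities into bins,
    # then reuse the half-disc gradient computation from above.
    kmeans = KMeans(n_clusters=16, n_init=4, random_state=0)
    bright_ids = kmeans.fit_predict(img.reshape(-1, 1)).reshape(img.shape) + 1
    bg = texture_gradient(bright_ids, left, right, num_bins=16)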

This can then be combined with the texture gradient to generate the final, "pb-lite" image.

Putting the parts together

After the gradients have been computed, the last step is to put together all this information into the "pb-lite" image.

The method used in this project was to find the mean texture gradient and mean brightness gradient at each pixel (averaging over all scales and orientations), multiply them by each other and by the value from the Canny edge detection method, and normalize the result to lie between 0 and 1.

The effect is to augment the Canny edge detection by emphasizing edges in places where the texture and brightness gradients "corroborate" the edges found by the Canny method.
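In the terms of the earlier sketches (with 'tg' and 'bg' the texture and brightness gradient stacks over all scales and orientations, and 'canny' the Canny output scaled to [0, 1]), this amounts to:

    # Average each gradient stack over its scales/orientations axis,
    # combine with the Canny response, and normalize.
    pb = tg.mean(axis=-1) * bg.mean(axis=-1) * canny
    pb = (pb - pb.min()) / (pb.max() - pb.min() + 1e-10)  # to [0, 1]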

Results

Using the technique and parameters described above, our pb-lite method was able to achieve a (slight) improvement over the baseline Canny edge detection method. Specifically, its ODS F-score was found to be 0.59, compared to 0.58 for the Canny method.

Precision-recall curve with default values

Tweaking the parameters affected the curve, but was not found to result in a significant improvement of the F-score. For example, the image below shows the results with an increased number of histogram bins (256 instead of 32). The associated F-score was 0.60.

precision-recall curve with more histogram bins

Extension

The technique described above was extended by adding color information into the mix. Just as we used k-means to cluster textures, we used k-means to cluster colors (using 3 dimensions instead of 32).
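In code, the only change from the texton clustering sketch is the dimensionality of the points ('rgb' here stands for the color image):

    from sklearn.cluster import KMeans

    # Each pixel is now a 3-dimensional (R, G, B) point.
    H, W, _ = rgb.shape
    kmeans = KMeans(n_clusters=64, n_init=4, random_state=0)
    color_ids = kmeans.fit_predict(rgb.reshape(-1, 3)).reshape(H, W) + 1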

As seen in the precision-recall curve below, the result beat the Canny baseline for most of the length of the curve. However, its F-score was 0.58 (the same as the Canny baseline).

precision-recall curve with color

An alternative method (using the red, green, and blue channels separately, rather than clustering them) was also tried; it resulted in an even lower F-score of 0.55.