Before I can generate a texton map for an image, I need to create the filter bank that will produce it. The filter bank consists of a set of filters at different orientations and scales. Each filter is produced by convolving a Gaussian filter with a Sobel filter; the result approximates an oriented derivative of Gaussian. Here is the filter bank I used:
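The construction above can be sketched in Python (the original was MATLAB; the scale, orientation count, and filter size parameters here are illustrative, not the exact values used):

```python
import numpy as np
from scipy import signal
from scipy.ndimage import rotate

def make_dog_filter_bank(scales=(1.0, 2.0), n_orientations=8, size=15):
    """Bank of oriented derivative-of-Gaussian (DoG) filters.

    Each base filter is a 2D Gaussian convolved with a Sobel kernel,
    then rotated to each orientation.
    """
    sobel = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    bank = []
    for sigma in scales:
        ax = np.arange(size) - size // 2
        xx, yy = np.meshgrid(ax, ax)
        gauss = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
        gauss /= gauss.sum()
        # Gaussian * Sobel approximates the derivative of a Gaussian
        base = signal.convolve2d(gauss, sobel, mode='same')
        for i in range(n_orientations):
            angle = 180.0 * i / n_orientations
            bank.append(rotate(base, angle, reshape=False, order=1))
    return bank
```

With the defaults this yields 2 scales × 8 orientations = 16 filters, matching the layout described below.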
I used two scales with eight orientations each. The top four rows are the smaller scale filters and the bottom four rows are the larger scale filters.
Next, I create a collection of half-disc mask pairs, which are used for calculating the gradients. After building the filter bank, the masks are not too challenging to produce, since they are essentially binary-valued versions of the filters in the bank. I created them by starting with the convolved Gaussian and Sobel filters and then using im2bw to convert each filter's values to either 1 or 0. The threshold for deciding 1 or 0 was obtained by running graythresh on the filter. Here is my collection of half-disc masks:
Adjacent masks are complementary pairs. The radii of the masks increase down the chart.
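For illustration, here is a sketch of equivalent mask pairs in Python. Note that it builds the half-discs geometrically rather than thresholding the filters with im2bw/graythresh as described above; both approaches yield the same complementary binary shapes:

```python
import numpy as np
from scipy.ndimage import rotate

def make_half_disc_pairs(radii=(5, 10, 15), n_orientations=8):
    """Complementary half-disc mask pairs at several radii/orientations.

    Direct geometric construction (a stand-in for the im2bw/graythresh
    thresholding described in the text).
    """
    pairs = []
    for r in radii:
        ax = np.arange(-r, r + 1)
        xx, yy = np.meshgrid(ax, ax)
        disc = (xx**2 + yy**2) <= r**2
        half = disc & (yy < 0)  # upper half of the disc
        for i in range(n_orientations):
            angle = 180.0 * i / n_orientations
            left = rotate(half.astype(float), angle,
                          reshape=False, order=0) > 0.5
            right = rotate(half.astype(float), angle + 180,
                           reshape=False, order=0) > 0.5
            pairs.append((left, right))
    return pairs
```

Rotating the same half-disc by an extra 180 degrees guarantees each pair is complementary, mirroring the adjacent-pair layout in the chart.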
Now that we have a filter bank, we can create a texton map. The procedure to generate a texton map for a given image is roughly as follows. For each filter in the bank, filter the image and store the result. Then, pass all the results into kmeansML. For any given pixel in the image, there are now n filter response values for that pixel, where n is the number of filters in the bank. These n response values locate the pixel in n-dimensional space. The kmeansML function does its best to cluster these pixels into k distinct categories based on their location in n-space, where k is a parameter that can be specified (in this case k=64). In the end, each pixel is assigned a value from 1 to 64 which represents its texton id. In theory, pixels with similar texture will have similar texton ids. The visualization of a texton map might look like this:
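The procedure above can be sketched as follows, with SciPy's kmeans2 standing in for kmeansML (the filter bank and k are parameters):

```python
import numpy as np
from scipy import signal
from scipy.cluster.vq import kmeans2

def texton_map(image, filter_bank, k=64):
    """Assign each pixel a texton id by clustering its filter responses.

    kmeans2 is a stand-in for the kmeansML routine used in the
    original MATLAB implementation.
    """
    h, w = image.shape
    # One n-dimensional response vector per pixel (n = number of filters)
    responses = np.stack(
        [signal.convolve2d(image, f, mode='same') for f in filter_bank],
        axis=-1).reshape(-1, len(filter_bank))
    # Cluster pixels in n-space into k texton categories
    _, labels = kmeans2(responses, k, minit='++', seed=0)
    return labels.reshape(h, w) + 1  # texton ids 1..k
```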
Now we can compute gradient maps for various properties of the image. Let's start with finding the gradient for the texton map we just created. The gradient measures how much the texture distribution is changing at a given pixel. We will use our half-disc masks to do this. At each pixel and for each pair of masks, filter the image with the mask pair centered at that pixel. If there is a large texture shift, the left and right masks will produce very different values and the gradient will be high. Otherwise, if the texture is fairly constant, both masks will have similar results and the gradient will be low. By using all the pairs of masks at different orientations, we get a good reading of each pixel's gradient in many directions. We use the chi-squared distance, χ²(g, h) = ½ Σᵢ (gᵢ − hᵢ)² / (gᵢ + hᵢ), between the two masks' responses to measure the gradient.
We can speed up this process by making use of the linear nature of the formula. Here is the algorithm I used:
```
create a 3D result matrix that is m*n*o, where m*n is the image
dimension and o is the number of masks

for all the scales of masks:
    for all the orientations of masks:
        chi_sqr_dist = m*n array of 0's
        for all the bin values i:
            convert the image to a binary image having 1's where the pixel == i
            convolve the binary image with the left mask of the pair
            convolve the binary image with the right mask of the pair
            chi_sqr_dist += the chi-squared distance between the two
                            mask responses at each pixel
        set the corresponding layer of the result matrix to chi_sqr_dist

return result
```

The result of the above algorithm is the texton gradient of the image for each mask pair. The algorithm is general enough that it can compute the gradient for brightness values and color values too.
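A minimal Python version of this algorithm, assuming the id map (texton ids, or binned brightness/color values) takes values 1..n_bins and the masks come in (left, right) pairs:

```python
import numpy as np
from scipy import signal

def chi_square_gradient(id_map, mask_pairs, n_bins):
    """Per-pixel chi-squared gradient of a discrete-valued map,
    one layer of the result per half-disc mask pair."""
    h, w = id_map.shape
    grad = np.zeros((h, w, len(mask_pairs)))
    eps = 1e-10  # avoid division by zero where both counts are empty
    for o, (left, right) in enumerate(mask_pairs):
        chi = np.zeros((h, w))
        for i in range(1, n_bins + 1):
            # binary image: 1 where the pixel belongs to bin i
            binary = (id_map == i).astype(float)
            g = signal.convolve2d(binary, left.astype(float), mode='same')
            hh = signal.convolve2d(binary, right.astype(float), mode='same')
            # accumulate the chi-squared distance term for this bin
            chi += 0.5 * (g - hh) ** 2 / (g + hh + eps)
        grad[:, :, o] = chi
    return grad
```

Convolving the per-bin binary images with the masks is what exploits the linearity mentioned above: each convolution counts the bin's pixels under the half-disc everywhere at once, instead of histogramming each window separately.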
The final pb-lite edge map is produced as follows:

1. Combine the texture gradients for all mask pairs into one average texture gradient.
2. Combine the brightness gradients for all mask pairs into one average brightness gradient.
3. Average the combined texture and brightness gradients into one composite gradient.
4. Use imadjust to normalize and strengthen the composite gradient.
5. Filter the composite gradient with a Gaussian to smooth out noise.
6. Element-wise multiply the composite gradient with the sum of the baseline data.
7. Return the result.

The result of this algorithm is a map that combines the strong edges from the texture and brightness gradients with the edges produced by the baseline Sobel and Canny detectors.
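A sketch of this combination step in Python, with a simple min-max contrast stretch standing in for MATLAB's imadjust and scipy's gaussian_filter for the smoothing:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pb_lite(tex_grad, bright_grad, sobel_baseline, canny_baseline,
            sigma=1.0):
    """Combine mean gradients with the baseline edge maps.

    tex_grad / bright_grad are h x w x o stacks (one layer per mask
    pair); the baselines are h x w edge maps. sigma is illustrative.
    """
    # Steps 1-3: average over mask pairs, then average the two maps
    composite = 0.5 * (tex_grad.mean(axis=2) + bright_grad.mean(axis=2))
    # Step 4: stretch to [0, 1] (imadjust stand-in)
    lo, hi = composite.min(), composite.max()
    composite = (composite - lo) / (hi - lo + 1e-10)
    # Step 5: Gaussian smoothing to suppress noise
    composite = gaussian_filter(composite, sigma)
    # Step 6: gate by the summed baseline edge maps
    return composite * (sobel_baseline + canny_baseline)
```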
| Original | PB-Lite | Composite |
| --- | --- | --- |

*(Result images for five sample photos, one row per photo.)*
Upon evaluating my pb-lite implementation, I was fairly satisfied. Qualitatively, my procedure appears to do much better than Sobel and slightly better than Canny. My pb-lite is more confident about edges and does not mistake texture variation within an object for a boundary as often as Canny does. Here are the results of the automated evaluator when run on a sample of 8 photos:
I was pleased with the qualitative results of my pb-lite, though I wish the F-score were a little better. In the future, I would try different ways of combining the texture and brightness gradients with the baseline data; I feel my current implementation still weights the baseline data too heavily. It would also be interesting to try other kinds of gradients, such as a color gradient.