pb-lite Boundary Detection
Benjamin Leib
bgleib
This project implements boundary detection using a simplified version of the pb (probability of boundary) algorithm. It combines several methods to produce the final output: a Canny baseline, a texton gradient image, and a brightness gradient image. The most interesting of these is the texton gradient. It is calculated by first filtering the input image with every filter in a bank of blurred Sobel filters (i.e., a Sobel filter convolved with a Gaussian) at various scales and orientations. My implementation uses two scales, with Gaussian sigmas of 1 and 2 and sizes 6x6 and 12x12, and 16 different orientations, for a total of 32 filters in the bank. See the figure below for a representation of these filters.
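A filter bank like the one described could be built roughly as follows. This is a sketch, not the project's actual code; the function name and the trick of obtaining the Gaussian kernel from an impulse response are my own choices.

```python
import numpy as np
from scipy import ndimage

def make_blurred_sobel_bank(sigmas=(1, 2), n_orientations=16):
    """Build a bank of oriented blurred-Sobel filters:
    2 scales x 16 orientations = 32 filters."""
    sobel_x = np.array([[-1, 0, 1],
                        [-2, 0, 2],
                        [-1, 0, 1]], dtype=float)
    bank = []
    for sigma in sigmas:
        size = int(6 * sigma)  # 6x6 for sigma=1, 12x12 for sigma=2
        # Recover a Gaussian kernel by filtering a unit impulse
        impulse = np.zeros((size, size))
        impulse[size // 2, size // 2] = 1.0
        gauss = ndimage.gaussian_filter(impulse, sigma)
        base = ndimage.convolve(gauss, sobel_x)  # blurred Sobel at 0 degrees
        for i in range(n_orientations):
            angle = 180.0 * i / n_orientations
            # Rotate in place so every filter at this scale has the same size
            bank.append(ndimage.rotate(base, angle, reshape=False))
    return bank

bank = make_blurred_sobel_bank()
print(len(bank))  # 32
```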
The result of convolving each of these filters with the input image is a vector of 32 responses for each pixel. These vectors are then fed into a k-means clustering implementation with k=32. Each cluster is taken to represent a texture, and each pixel's cluster assignment is its texton. This generates an image like the one shown below, where each color corresponds to one of the textures.
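The clustering step might look like the sketch below, which uses a minimal hand-rolled k-means for illustration (the report does not say which k-means implementation was used; the function name and iteration count are assumptions).

```python
import numpy as np
from scipy import ndimage

def texton_map(image, filter_bank, k=32, n_iter=10, seed=0):
    """Assign each pixel a texton label by k-means clustering its
    vector of filter responses. Returns an integer label image."""
    h, w = image.shape
    # One response image per filter -> an (h*w, n_filters) feature matrix
    responses = np.stack(
        [ndimage.convolve(image, f) for f in filter_bank], axis=-1
    ).reshape(-1, len(filter_bank))
    rng = np.random.default_rng(seed)
    centers = responses[rng.choice(len(responses), k, replace=False)]
    for _ in range(n_iter):
        # Assign each pixel to its nearest cluster center
        dists = np.linalg.norm(responses[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of its assigned pixels
        for j in range(k):
            if np.any(labels == j):
                centers[j] = responses[labels == j].mean(axis=0)
    return labels.reshape(h, w)
```

In practice a library implementation (e.g. scipy or scikit-learn k-means) would be the usual choice; the point here is only the shape of the data flow, from 32 response images to one label image.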
This texton map is then used to find the texture gradient at each pixel. This is done by generating histograms of the local distribution over textures, where "local" is defined by a binary half-disk mask centered at the pixel. The histogram for a mask and that for its pair, which is the same mask rotated 180 degrees, are then compared via chi-square distance, and the result of that comparison is the texture gradient at that pixel. This is repeated for each of a set of mask pairs at different scales and orientations: specifically 8 orientations and 3 scales (disk radii of 5, 10, and 20 pixels), for a total of 24 mask pairs, which are shown in the image below. The final result is 24 images representing texture gradients at the different scales and orientations.
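A common way to implement this efficiently is to note that convolving a per-bin indicator image with a half-disk mask yields that bin's histogram count at every pixel simultaneously. The sketch below assumes that trick; the function names and the epsilon in the denominator are my own.

```python
import numpy as np
from scipy import ndimage

def half_disk_masks(radius, n_orientations=8):
    """Generate (mask, 180-degree-rotated mask) pairs of binary half-disks."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    disk = x**2 + y**2 <= radius**2
    pairs = []
    for i in range(n_orientations):
        theta = np.pi * i / n_orientations
        side = x * np.sin(theta) - y * np.cos(theta)
        # Split the disk along a line through the center at angle theta
        pairs.append((disk & (side > 0), disk & (side < 0)))
    return pairs

def chi_square_gradient(label_map, n_bins, left_mask, right_mask):
    """Per-pixel chi-square distance between the histograms of the two
    half-disk neighborhoods."""
    chi = np.zeros(label_map.shape, dtype=float)
    for b in range(n_bins):
        indicator = (label_map == b).astype(float)
        # Convolution counts occurrences of bin b inside each half-disk
        g = ndimage.convolve(indicator, left_mask.astype(float))
        h = ndimage.convolve(indicator, right_mask.astype(float))
        chi += 0.5 * (g - h) ** 2 / (g + h + 1e-10)
    return chi
```

Looping `chi_square_gradient` over the 24 mask pairs produces the 24 texture-gradient images described above.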
A similar gradient is also calculated for the raw grayscale intensities of the image, except that instead of each texture being a histogram bin, each bin covers a range of 8 intensities within the full 0-255 range (32 bins in all). This generates an additional 24 images representing intensity gradients at the different scales and orientations.
We now have the Canny baseline and 24 gradient images each for texture and intensity. These are combined into the final output image via the per-pixel equation output = Canny + (mean(texture gradients) * mean(intensity gradients)). After normalizing this merged image so that all pixels lie in the range [0,1], we have our final output.
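The combination step is a few lines of array arithmetic; a sketch under the assumption that the gradients are stacked along the first axis (the function name is my own):

```python
import numpy as np

def combine_outputs(canny, texture_grads, intensity_grads):
    """Merge the cues as output = Canny + mean(tg) * mean(ig),
    then normalize the result to [0, 1]."""
    tg = np.mean(texture_grads, axis=0)  # average over the 24 scale/orientation images
    ig = np.mean(intensity_grads, axis=0)
    out = canny + tg * ig
    out = out - out.min()
    peak = out.max()
    return out / peak if peak > 0 else out
```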
Below is the precision-recall curve for my results on the 10 test images. As you can see, I outperform both of the baselines, though, as expected, I do not outperform gPb, which is the state of the art.
See below for a few example results of my algorithm.