pb-lite Report

Vazheh Moussavi (vmoussav)



Project Description

The goal of this project is boundary detection via a scaled-down version of pb ("probability of boundary"), as described by Arbelaez et al. 2011. The pipeline of the project is as follows:
The "filter bank" refers to a set of oriented gaussian derivative filters, which are easily approximated by convolving a gaussian kernel with a simple gradient filter such as a sobel:
Convolving each member of the filter bank with the grayscale image gives a response at each pixel, a feature vector we can use to distinguish textures. Running k-means on these responses assigns each pixel a texture id, or "texton". Once we have our texton map, we take it and the grayscale (brightness) image and convert both into sets of binary images by some binning scheme (we bin the texton map by id number, and brightness by evenly-spaced intervals on [0,255]). We convolve every binary image with a set of half-disc pairs at different scales and orientations (mirrored discs not shown):
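The texton and binning steps above can be sketched roughly as follows (my own illustrative helpers, not the project's actual code; I use SciPy's `kmeans2` for clustering):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def texton_map(responses, k=32):
    """responses: (H, W, F) stack of filter responses -> (H, W) texton ids.

    Each pixel's F-dimensional response vector is clustered with k-means;
    the cluster index serves as the pixel's texton id.
    """
    h, w, f = responses.shape
    _, labels = kmeans2(responses.reshape(-1, f), k, minit='++')
    return labels.reshape(h, w)

def binary_images(label_map, n_bins):
    """Expand an integer label map into one boolean indicator image per bin."""
    return [(label_map == b) for b in range(n_bins)]
```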
These act as neighborhood counts for each unique bin, implicitly constructing, for each pixel and scale-orientation pair, a histogram over the bins. The gradient of the image can be approximated by how different the mirrored responses are, which we measure using the chi-squared distance over the histogram bins: chi^2(g, h) = 1/2 * sum_i (g_i - h_i)^2 / (g_i + h_i). We then collapse the distances in some way---a mean over everything is reasonable---and combine the result with an edge detector baseline of either Canny or Sobel, best done as an AND operation, giving us our pb-lite.
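The convolution trick here is that, for a fixed half-disc pair, convolving each binary bin image with the two masks yields the left and right histogram counts for every pixel at once, so the chi-squared map falls out of a loop over bins. A minimal sketch (my own illustrative function, with an epsilon added to avoid division by zero):

```python
import numpy as np
from scipy import ndimage

def chi_sq_gradient(binaries, left_mask, right_mask, eps=1e-10):
    """Per-pixel chi-squared distance between the histograms gathered by
    a mirrored pair of half-disc masks.

    binaries: list of boolean images, one per bin.
    left_mask, right_mask: the two halves of one disc (same shape).
    """
    chi = np.zeros(binaries[0].shape, dtype=float)
    for b in binaries:
        # g and h hold, at every pixel, the count of this bin inside
        # each half-disc neighborhood.
        g = ndimage.convolve(b.astype(float), left_mask)
        h = ndimage.convolve(b.astype(float), right_mask)
        chi += 0.5 * (g - h) ** 2 / (g + h + eps)
    return chi
```

In the full pipeline this would be repeated over every disc scale and orientation, with the results collapsed by the chosen combination scheme.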

Basic Results

Let's use the penguin as an example:
Here's what the texton map looks like (I used k=32 clusters):
And the texture (left) and brightness (right) gradients:
Combining with the Canny baseline as stated before, here is what we get for the pb:

Final Product

(see "Extensions" below for further details on the methods used)
I ran my implementation over the whole data set with the LAB color space, the maximum-based multiscale cue combination, and the regular filter bank, and got an ODS F-score of 0.626246.

Extensions

multi-scale cue combination

In the paper, the authors mention a more intelligent way of combining the texture/brightness/etc. cues across scales and orientations. They take a linear combination (I took the mean) over each cue and scale, giving one measure per orientation, of which we take the maximum. This makes sense since we are measuring the gradient and thus only care about the orientation that gives the biggest response. Trying this, my F-score improved from my basic implementation's 0.596372 to 0.608087. Here is the resulting PR curve over the 10-image dataset.
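Assuming the oriented gradients are stacked into a single array (an assumption about layout, not the project's actual data structure), the combination reduces to two NumPy reductions:

```python
import numpy as np

def combine_cues(grads):
    """Mean over cues and scales per orientation, then max over orientations.

    grads: array of shape (n_cues, n_scales, n_orient, H, W).
    Returns an (H, W) combined gradient map.
    """
    per_orient = grads.mean(axis=(0, 1))   # -> (n_orient, H, W)
    return per_orient.max(axis=0)          # -> (H, W)
```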

LAB color

In LAB color space, Euclidean distance between colors more closely matches human perceptual difference. Using the cardinal image as an example, we can see that the second channel ("A") separates the image very nicely, which justifies splitting the channels this way.
I got an F-score of 0.607095.
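For reference, the sRGB-to-LAB conversion follows the standard CIE formulas (linearize sRGB, map to XYZ, then apply the LAB nonlinearity); the sketch below is my own, assuming a D65 white point:

```python
import numpy as np

def srgb_to_lab(rgb):
    """Convert sRGB values in [0, 1] (last axis = channels) to CIE LAB."""
    rgb = np.asarray(rgb, dtype=float)
    # Undo the sRGB gamma to get linear RGB.
    lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # Linear RGB -> XYZ (sRGB primaries, D65).
    m = np.array([[0.4124, 0.3576, 0.1805],
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    xyz = lin @ m.T
    white = np.array([0.95047, 1.0, 1.08883])  # D65 reference white
    t = xyz / white
    # LAB nonlinearity: cube root above the threshold, linear below.
    f = np.where(t > 0.008856, np.cbrt(t), 7.787 * t + 16.0 / 116.0)
    L = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)
```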

alternative filter bank

I also tried a filter bank more similar to that used in the paper:
I got an F-score of 0.605854.

earth mover's distance

I tried the Earth Mover's Distance in place of chi-squared. Because it must be computed at each pixel, it was very slow to evaluate and hard to experiment with. As a result, I only tried a few cost matrices (with different weighting schemes) and only got worse results, the best achieving an F-score of 0.576686.
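One special case worth noting: for 1-D histograms of equal mass with a unit linear ground distance, EMD has a closed form as the L1 distance between the cumulative sums, avoiding a transport solver entirely. (General cost matrices like the ones tried above do not admit this shortcut.) A minimal sketch:

```python
import numpy as np

def emd_1d(g, h):
    """EMD between two same-mass 1-D histograms under unit ground distance:
    the summed absolute difference of their cumulative distributions."""
    diff = np.asarray(g, dtype=float) - np.asarray(h, dtype=float)
    return np.abs(np.cumsum(diff)).sum()
```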