PB-Lite

Based on a paper by: Arbelaez, Maire, Fowlkes, and Malik (2011)

Purpose:


We attempt to improve boundary detection in images. We begin with a baseline of the Sobel and Canny edge detection algorithms. The Sobel filter approximates the image gradient, a simple but effective means of finding edges. Canny's algorithm improves on this with additional steps such as non-maximum suppression. While these are excellent edge detectors, we hope to improve boundary detection by considering additional image data.
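As a point of reference for the baseline, here is a minimal NumPy sketch of a Sobel gradient-magnitude edge detector (not the project's actual code; in practice one would use library routines such as cv2.Sobel and cv2.Canny):

```python
import numpy as np

# Sobel kernel for the horizontal derivative; its transpose gives the vertical one.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)

def convolve2d(img, kernel):
    """Naive 'same' convolution with edge padding (educational, not fast)."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    flipped = kernel[::-1, ::-1]  # true convolution flips the kernel
    out = np.zeros(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * flipped)
    return out

def sobel_magnitude(gray):
    """Approximate the gradient magnitude, normalized to [0, 1]."""
    gx = convolve2d(gray, SOBEL_X)
    gy = convolve2d(gray, SOBEL_X.T)
    mag = np.hypot(gx, gy)
    return mag / mag.max()
```

Canny builds on the same gradient but adds Gaussian smoothing, non-maximum suppression, and hysteresis thresholding, which is why it makes a stronger baseline.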

Algorithm:


Main Idea

To improve upon Canny and Sobel, we consider additional image information to compute a per-pixel probability-of-boundary score. The general strategy is to judge each pixel's similarity to its surroundings, using filtering operations to decide whether an edge is part of a true object boundary or just a texture edge. The two main steps are generating a texton map and evaluating the chi-squared distance between neighborhoods.

Texton Map

To generate a texton map, we first create a filter bank of derivative-of-Gaussian filters rotated to different orientations. To build a good bank, we use several different sigmas and several different orientations. We measure each filter's response at every pixel in the image and combine them into a per-pixel vector of filter responses. We then use K-Means to cluster these responses into K texture IDs. Taking this texture information into account when measuring chi-squared distance helps account for the fact that neighboring pixels may differ greatly in color and intensity yet look very similar to a human because they belong to the same texture. Here is a sample filter bank and texton map based on K=64.
Filter Bank:



Original Image & Texton Map:

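The texton-map step can be sketched as follows. This is an illustrative NumPy-only version (the oriented derivative-of-Gaussian is written analytically, and a tiny K-Means replaces a library implementation); the function names and parameter defaults are my own, not the project's:

```python
import numpy as np

def oriented_dog(sigma, theta, size=15):
    """First derivative of a 2D Gaussian along direction theta (analytic form)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)  # coordinate along theta
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    f = -xr / sigma**2 * g
    return f / np.abs(f).sum()  # normalize so responses are comparable across sigmas

def make_filter_bank(sigmas=(1.0, 2.0), n_orient=8, size=15):
    """Derivative-of-Gaussian bank over several sigmas and orientations."""
    return [oriented_dog(s, k * np.pi / n_orient, size)
            for s in sigmas for k in range(n_orient)]

def filter_responses(gray, bank):
    """Per-pixel response vector: one value per filter, stacked as (H*W, n_filters)."""
    H, W = gray.shape
    resp = np.empty((H, W, len(bank)))
    for n, f in enumerate(bank):
        k = f.shape[0]
        padded = np.pad(gray, k // 2, mode="edge")
        for i in range(H):
            for j in range(W):
                resp[i, j, n] = np.sum(padded[i:i + k, j:j + k] * f)
    return resp.reshape(-1, len(bank))

def kmeans_labels(X, K, iters=10, seed=0):
    """Minimal Lloyd's K-Means; returns a cluster id per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for k in range(K):
            pts = X[labels == k]
            if len(pts):
                centers[k] = pts.mean(0)
    return labels

def texton_map(gray, bank, K=64):
    """Cluster filter-response vectors into K texture ids, one per pixel."""
    return kmeans_labels(filter_responses(gray, bank), K).reshape(gray.shape)
```

In a real run, K-Means would typically be fit once over responses pooled from many images so that texton IDs are consistent across the dataset.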
Chi-Squared Distance

We measure similarity to neighbors by calculating the chi-squared distance between the pixels on either side of a half disc centered at each pixel. We use multiple scales and orientations for these discs to get a more complete picture of a pixel's neighborhood: the more orientations we use, the more directions along which we judge similarity, and the more scales we use, the more sizes of locality we examine. We compute chi-squared values for both the texton map and brightness. Here is the set of masks we use for this computation:



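The half-disc comparison can be sketched like this. It is a simplified NumPy version under my own naming, assuming the input has already been quantized into integer bins (texton IDs, or clipped brightness levels):

```python
import numpy as np

def half_disc_pair(radius, theta):
    """Binary half-disc masks: a disc split in two by a line at angle theta."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1].astype(float)
    disc = x**2 + y**2 <= radius**2
    side = (x * np.cos(theta) + y * np.sin(theta)) > 0
    return disc & side, disc & ~side

def local_sum(img, mask):
    """Sum of img values under the mask centered at each pixel (naive sliding window)."""
    k = mask.shape[0]
    padded = np.pad(img, k // 2, mode="edge")
    out = np.empty(img.shape)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * mask)
    return out

def chi_sqr_gradient(label_map, n_bins, radius, theta):
    """Chi-squared distance between the two half-disc histograms at every pixel."""
    left, right = half_disc_pair(radius, theta)
    chi = np.zeros(label_map.shape)
    for b in range(n_bins):
        binary = (label_map == b).astype(float)
        g = local_sum(binary, left)   # count of bin b in one half
        h = local_sum(binary, right)  # count of bin b in the other half
        chi += 0.5 * (g - h) ** 2 / (g + h + 1e-10)  # epsilon avoids divide-by-zero
    return chi
```

Filtering the per-bin indicator image with each half-disc mask is the standard trick for computing all the local histograms at once; the full gradient is obtained by repeating this over every radius and orientation and for each feature (textons, brightness).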
Combining Results

Finally, once we have calculated the texton map and run both brightness and the texton map through the chi-squared test, we must combine the results in a meaningful way. Since we used many different masks, we average all of the values together, reducing the chi-squared scores to one number per pixel for each image characteristic. We then average the brightness and texton results and multiply the per-pixel values by a Canny baseline. Last, we normalize the image so that our probabilities lie between 0 and 1: a value of 1 means we are confident the pixel lies on a boundary, and 0 means we are confident it does not. To evaluate against human-annotated images, we simply pick a confidence threshold above which we declare a boundary. We can then measure how many human-annotated edges we recognized and how many false edges we produced that humans did not mark. The boundaries detected from the previous penguin image are:

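The combination step described above reduces to a few lines. A minimal sketch (function names are my own; `chi_features` stands for the per-mask chi-squared maps from both texton and brightness):

```python
import numpy as np

def pb_lite(chi_features, canny_baseline):
    """Average chi-squared gradient maps, gate by Canny, normalize to [0, 1].

    chi_features:   list of (H, W) chi-squared gradient maps (all masks, all features).
    canny_baseline: (H, W) Canny edge strength, expected in [0, 1].
    """
    mean_grad = np.mean(np.stack(chi_features), axis=0)  # average over masks/features
    pb = mean_grad * canny_baseline                      # element-wise product
    pb -= pb.min()
    if pb.max() > 0:
        pb /= pb.max()                                   # probabilities in [0, 1]
    return pb

def boundary_mask(pb, t=0.5):
    """Binary boundary decision at confidence threshold t."""
    return pb >= t
```

Sweeping `t` from 0 to 1 is exactly what traces out the precision-recall curve used in the evaluation.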
Results:


To assess our results, we run our algorithm on multiple images and compare against a set of human-annotated images. The following graph compares pb-lite (the black line) with other methods. The y-axis is precision: how many of the edges we marked are correct human-annotated edges. The x-axis is recall: what fraction of human-marked pixels we also marked as edges. Each method traces a curve rather than a single point because we score it at different confidence thresholds, i.e., different probabilities of boundary at which we declare an edge. The two graphs below show the results from using just texton and brightness gradient data (left) and using texton, brightness, and RGB channel data (right). As the graphs show, pb-lite is significantly better than Canny edge detection. Interestingly, adding RGB channel information hardly improved our curve at all; the biggest difference is a slight improvement at the top left of the graph.
Check out this rough-around-the-edges, Chrome-only way of visualizing the texton map and pb-lite boundaries on top of an original image:

Result Visualizer

Another, more quantitative way to assess performance is the F-score. Below are results using just intensity, and additionally with RGB information. The results are nearly identical, as the curves above suggest.
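For reference, the F values reported below as F( P, R ) are the harmonic mean of precision and recall (ODS uses one threshold for the whole dataset, OIS the best threshold per image):

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, the Canny row's F( 0.66, 0.51 ) works out to 2(0.66)(0.51)/1.17, which is approximately 0.58, matching the table.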

Results with just intensity:


Sobel:
Boundary
ODS: F( 0.38, 0.55 ) = 0.45 [th = 0.09]
OIS: F( 0.38, 0.55 ) = 0.45
Area_PR = 0.21


Canny:
Boundary
ODS: F( 0.66, 0.51 ) = 0.58 [th = 0.15]
OIS: F( 0.70, 0.50 ) = 0.59
Area_PR = 0.50


PB-Lite:
Boundary
ODS: F( 0.66, 0.55 ) = 0.60 [th = 0.11]
OIS: F( 0.63, 0.58 ) = 0.60
Area_PR = 0.52

Results with RGB:

Sobel:
Boundary
ODS: F( 0.38, 0.55 ) = 0.45 [th = 0.09]
OIS: F( 0.38, 0.55 ) = 0.45
Area_PR = 0.21


Canny:
Boundary
ODS: F( 0.66, 0.51 ) = 0.58 [th = 0.15]
OIS: F( 0.70, 0.51 ) = 0.59
Area_PR = 0.50


PB-Lite:
Boundary
ODS: F( 0.66, 0.55 ) = 0.60 [th = 0.09]
OIS: F( 0.64, 0.56 ) = 0.60
Area_PR = 0.51


Output Images

The first column is the original picture, the second is Canny, and the third is pb-lite.