CS 143: Pb-Lite: Boundary Detection

Evan Wallace

Boundary detection is an important problem in computer vision. It is distinct from edge detection, which finds transitions between light and dark pixels in an image. Boundary detection instead finds the semantic boundaries between what humans would consider different objects or regions of an image. For example, a zebra has many internal edges between its black and white stripes, but humans wouldn't consider those edges part of the zebra's boundary. A complete solution would require high-level semantic understanding of the scene, which computers do not yet have, so most efforts focus on learning an approximate boundary detection algorithm from training data.

This project implements a simplified version of the advanced boundary detector presented in the 2011 paper Contour Detection and Hierarchical Image Segmentation, by Arbeláez, Maire, Fowlkes, and Malik. The name "pb-lite" comes from "probability of boundary": the output of the algorithm is, for each pixel, the probability that it lies on a boundary. The algorithm starts with Canny edge detection and "turns off" edges that aren't semantic boundaries.

Algorithm

This algorithm starts by running Canny edge detection on the original image. To improve the chance that Canny finds actual semantic boundaries, the Canny edge detector here is actually the average of multiple Canny edge detectors run at different scales and thresholds, which gives lower weight to edges that only appear at finer scales. The Canny thresholds ranged from 0.1 to 0.7 in steps of 0.1, and the Canny sigmas from 1 to 4.
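For concreteness, here is a minimal sketch of the averaged detector (the variable img, assumed to be a grayscale double image in [0, 1], is a placeholder, not the project's actual code):

% Average Canny edge maps over the thresholds and sigmas listed above
canny_avg = zeros(size(img));
count = 0;
for thresh = 0.1:0.1:0.7
  for sigma = 1:4
    canny_avg = canny_avg + double(edge(img, 'canny', thresh, sigma));
    count = count + 1;
  end
end
canny_avg = canny_avg / count;  % equal weight to every parameter setting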


The original image


Canny edge detection

The boundary probability estimate combines a texture gradient and a brightness gradient. Texture is measured by assigning each pixel a texton id, so that areas of the image with similar texture have similar distributions of texton ids. Texton ids are computed by first building a per-pixel vector whose elements are the responses of the image to each filter in a filter bank, then clustering those vectors using k-means (with k = 64). This means textons are unique to the image they originated from; comparing textons across images doesn't make sense. The filter bank consists of gaussian filters with sigmas 1 and 2 convolved with the Sobel operator, then rotated to 8 evenly spaced orientations.
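A sketch of this texton pipeline might look like the following (the filter sizes and the rotation interpolation method are assumptions; kmeans is from the Statistics Toolbox):

% Build the filter bank: gaussians with sigmas 1 and 2 convolved with
% the Sobel operator, each rotated to 8 evenly spaced orientations
sobel = fspecial('sobel');
filters = {};
for sigma = [1 2]
  base = conv2(fspecial('gaussian', 6 * sigma + 1, sigma), sobel, 'full');
  for angle = 0:22.5:157.5
    filters{end + 1} = imrotate(base, angle, 'bilinear', 'crop');
  end
end

% Stack per-pixel filter responses and cluster them into k = 64 textons
responses = zeros(numel(img), numel(filters));
for i = 1:numel(filters)
  response = conv2(img, filters{i}, 'same');
  responses(:, i) = response(:);
end
texton_map = reshape(kmeans(responses, 64), size(img));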


The filter bank


The texton map

The texture gradient computes the rate of change in the local distribution of textons. It is the sum of many directional gradients over several scales and orientations, all computed in circular neighborhoods. Each directional gradient divides the circular neighborhood into two half-disks and uses the chi-squared distance to compare the texton distributions g and h under each half-disk: chi_sqr = 0.5 * sum_i((g_i - h_i)^2 / (g_i + h_i)). This can be optimized in MATLAB using binary masks and convolution:

% left_mask and right_mask are binary masks for the current scale and orientation
% they are opposites: when added up, they form a binary mask for a whole disk
chi_sqr = zeros(size(image));
for i = 1:k
  % binary image marking the pixels with texton id i
  temp = double(image == i);
  % counts of texton id i under each half-disk, centered at every pixel
  g = conv2(temp, left_mask, 'same');
  h = conv2(temp, right_mask, 'same');
  chi_sqr = chi_sqr + 0.5 * (g - h).^2 ./ (g + h + eps);
end

This uses a temporary image and convolution to compare texton ids under a half-disk mask with just one MATLAB function. Note the use of epsilon in the denominator to avoid dividing by zero if neither half-disk contains any of the current texton id. Masks were created with radii 5, 10, and 20 pixels and were rotated by one of 8 evenly spaced directions.
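Here is one way such half-disk masks could be generated (the function name and the sign convention for the dividing line are assumptions):

% Build a pair of opposing half-disk masks for one radius and orientation;
% added together, the two masks form a binary mask for the whole disk
function [left_mask, right_mask] = half_disk_masks(radius, angle)
  [x, y] = meshgrid(-radius:radius, -radius:radius);
  disk = (x.^2 + y.^2) <= radius^2;
  side = (x * cos(angle) + y * sin(angle)) > 0;
  left_mask = double(disk & side);
  right_mask = double(disk & ~side);
end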


The masks


The texture gradient

The brightness gradient computation is almost the same as the texture gradient computation, but with binned grayscale brightness values in place of texton ids. Brightness values from 0 to 255 were grouped into bins of 8 values each, giving 32 bins.
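The binning step itself is a one-liner (the variable gray, assumed to be a uint8 grayscale image, is a placeholder):

% Map each brightness value 0..255 to a bin id 1..32 (8 values per bin);
% the chi-squared gradient code above is then reused with k = 32
binned = floor(double(gray) / 8) + 1;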


The brightness gradient

The texture and brightness gradients are then combined with the Canny edge detection. Combination methods are discussed in the results section.
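As a sketch, one such combination, pb = canny • (tg + bg) from the results section, might look like this in MATLAB (the normalization steps are assumptions):

% Scale each term to [0, 1], then gate the gradients with the Canny map
tg = tg / max(tg(:));
bg = bg / max(bg(:));
pb = canny_avg .* (tg + bg);
pb = pb / max(pb(:));  % so the output reads as a probability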


The result

Results

This algorithm was run on 10 images from the data set used in the paper.










Results were benchmarked using the MATLAB code provided with the paper, which compares the generated pb image to boundaries manually annotated by a number of human subjects. The graphs below plot recall, the fraction of actual boundaries that were detected, against precision, the fraction of detected boundaries that are actual boundaries. A good algorithm has both high precision and high recall. The dotted lines show the performance of Canny edge detection and of the full pb implementation from the paper; the black line shows the performance of this pb-lite implementation, and is the only curve that changes between graphs.


pb = (sobel + canny) • (tg + bg)


pb = canny • (tg + bg)


pb = canny • bg

I tried a few different methods of combining edge detectors and gradients. Sobel edge detection significantly decreased performance, which makes sense since Canny edge detection adds more advanced features like non-maximum suppression. Surprisingly, the texture gradient didn't end up helping at all; for some images, it would incorrectly turn off actual boundaries. Mouse over the image below to compare the texture gradient with Canny edge detection. Notice how the main horizontal edge behind the penguin is masked out (i.e. black) in the texture gradient image.


The texture gradient (mouse over for Canny edges)

The best combination I found was simply to multiply Canny by the brightness gradient. Measured by the ODS F-score (the harmonic mean of precision and recall at the best fixed threshold over the whole dataset), this improved on Canny alone from 0.58 to 0.61. This is unfortunate, however, since both Canny and the brightness gradient operate only on pixel brightnesses and take no advantage of other information. The method can only improve over Canny in precision, since it merely "turns off" edges from Canny that aren't actual boundaries; it can't hallucinate boundaries that Canny missed and therefore can't improve recall.