CS143 Introduction to Computer Visions

Project 2: pb-lite: Boundary Detection (by Margaret Kim, mk20)

Objective

To be able to detect edges or boundaries of different images using pb-lite, a simplified version of an algorithm discussed in Contour Detection and Hierarchical Image Segmentation by Arbelaez, Maire, Fowlkes, and Malik.

Boundary Detection

In the field of computer visions, boundary detection is an important, well-studied problem as it is related to feature detection and recognition. Many scholars have created algorithms to solve this distinct problems. The classical ones include Canny and Sobel, which only check for intensity discontinuities. The more recent algorithm, pb (probability of boundary), considers texture and color gradients along with intensity, thus detecting the boundaries much more accurately than its predecessor algorithms.

Overview of the Algorithm

The general overview of the pb-lite algorithm is:

Low-level feature extraction: brightness, color, and textons
Multiscale cue combination with non-maximum suppression
Spectral clustering

The following image is the pb-lite pipeline chart.

For this assignment, we trivialized the second and third steps and focused on the low-level feature extraction. Thus, our main tasks were to:

Create a filter bank of multiple scales and orientations
Create half-disc masks of multiple scales and orientations
For every image:
- Create a texton map by filtering and clustering the responses with kmeans.
- Compute per pixel texture gradient (tg) and brightness gradient (bg) by comparing local half-disc distributions.
- Output a per-pixel boundary score based on the magnitude of these gradients combined with a baseline edge detector (Canny or Sobel).
Compare output of the algorithm with the results of using other algorithms (Canny and Sobel).

Details

Creating Texture Gradients (tg)

To describe the tg, or how quickly the texture is changing at that point, first we need to represent texture as a local distribution of textons. Textons are discrete texture elements generated by clustering filter bank responses. To produce these textons, we used the following filter responses.

Next, we filtered the input image with each element of your filter bank, which resulted in a vector of filter responses centered on each pixel. In this case, as we have a 16 orientations with 2 scales, a 32-dimensional vector at every pixel describing the texture properties. We simplified this representation by replacing each 32-dimensional vector with a discrete texton id. We will do this by clustering the filter responses at all pixels in the image in to K textons using kmeans, where K is in the range of 1 to 64.

Here's an example of a texton map.

Example Texton Map

To actually compute tg, we compare the distributions in left/right half-disc pairs centered at a pixel.

Masks

The gradient increases as the similarity between the distributions decrease. Because our half-discs span multiple scales and orientations, we will end up with a series of local gradient measurements encoding how quickly the texture or brightness distributions are changing at different scales and angles.

We will compare texton distributions with the chi-square measure. It is defined as follows:

chi_sqr(g,h)=.5*sumi=1:K( (gi-hi)2 / (gi+hi) )

where g and h are histograms with the same binning scheme, and i indexes through these bins.

The following is the pseudo code to calculate the tg.

//img is the texton map in this case tg = matrix of size n*m*p //n = height of img //m = width of img //p = number of all left-right mask pairs for every left-right mask pair chi_sqr_dist=img*0 for i=1:num_bins tmp = 1 where img is in bin i and 0 elsewhere g_i = convolve tmp with left_mask h_i = convolve tmp with right_mask update chi_sqr_dist end store chi_sqr_dist appropriately in tg end

Creating Brightness Gradients (bg)

This is the same process as producing the texture gradients. However instead of a texton map, we change the image to grayscale, scale the grayscale image so that each pixel's value ranges from 0 to 255, and use that as the "brightness map".

Multiscale cue combination

To approximate the per-pixel probability of boundary, I combined the results of the canny edge dectector with the tg and bg.

For each pixel of tg and bg, I took an average of the stored vector of gradients. Then I multiplied the mean of tg and bg to the canny edge detector value.

pb = canny*tg*bg

The last step was to normalize the pb values of all pixels.

Comparative Results

The last part of this assignment is to evaluate the performance of the pb-lite vs canny vs sobel vs gpb (the best edge detector algorithm so far).

Comparison Curve

Looking at the above chart, pb-lite is clearly not any better than gpb, but it is better than two baselines (sobel and canny) for the most part.

Some Pictoral Results

Top-left: Original image Top-right: Pb-lite Bottom-left: Canny Bottom-right: Sobel

Airplane

Penguin

Cardinal Bird