CSCI1430 Project 2: pb-lite Boundary Detection

Soravit Beer Changpinyo (schangpi)

1. Introduction

Boundary detection is an important problem in computer vision. After decades of research, it is still considered "vision-hard." Classical edge detectors such as those of Canny and Sobel focus on detecting intensity discontinuities. More recent detectors also consider other types of discontinuities, including those in color and texture. One reason these recent detectors outperform the classical ones lies in their ability to suppress edges arising from texture. In this project, we implement a simplified version of the edge detector described in Arbelaez et al. 2011. The goal of the project is to beat the classical baseline performances.

2. Algorithm

Here are the main steps of the algorithm: (1) low-level feature extraction, (2) multiscale cue combination with non-maximum suppression, and (3) spectral clustering.

Most of the work is already done for us. The remaining work is to build representations of brightness, color, and texture gradients. To create these representations, we first pre-define a filter bank as well as half-disc masks at multiple scales and orientations. Then, for each image, we filter the image and cluster the responses with k-means to derive a texton map. We apply the half-disc masks to the texton map to obtain local half-disc distributions and compare them with the chi-square distance; the result is the texture gradient (tg). We do the same for brightness to derive the brightness gradient (bg), applying the half-disc masks to the brightness representation of the image instead of the texton map. Lastly, we compute a per-pixel boundary score by combining the magnitudes of these gradients with the baseline boundary detector. The details of the algorithm are in the following subsections.

2.1 Filter bank and half-disc masks

The filter bank that I used in this project is a collection of oriented derivative-of-Gaussian filters, just as described in the handout. I created these filters by convolving a Sobel filter with a Gaussian kernel (with radius = 6*sigma) and then rotating the result. I created the half-disc masks by first making a disk kernel, thresholding it into a binary mask, cutting it into half discs, and then rotating the result. The filter bank and the half-disc masks were created at multiple scales. They are shown below, along with a sketch of how they could be generated:
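The following is a minimal MATLAB sketch of this construction, assuming the Image Processing Toolbox; the function name makeFiltersAndMasks, the way half-disc pairs are stored, and the example parameters are illustrative choices, not the exact project code.

```matlab
% Sketch: oriented derivative-of-Gaussian filters and half-disc mask pairs.
function [filters, masks] = makeFiltersAndMasks(sigmas, radii, numOrient)
    angles = (0:numOrient-1) * 180 / numOrient;
    filters = {};
    for s = sigmas
        hsize = 2 * ceil(6 * s) + 1;                        % radius = 6*sigma
        gauss = fspecial('gaussian', hsize, s);
        dog   = conv2(gauss, fspecial('sobel'), 'same');    % derivative of Gaussian
        for a = angles
            filters{end+1} = imrotate(dog, a, 'bilinear', 'crop');
        end
    end
    masks = {};
    for r = radii
        disk = double(fspecial('disk', r) > 0);             % binary disk kernel
        [x, y] = meshgrid(-r:r, -r:r);
        for a = angles
            half = disk .* double(cosd(a) * x + sind(a) * y > 0);  % cut into a half disc
            masks{end+1} = {half, disk - half};              % left/right half-disc pair
        end
    end
end
```

For example, makeFiltersAndMasks([1 2], [5 10 15], 8) would give a bank at two scales and eight orientations; the actual scales and orientation counts used in the project follow the handout.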

2.2 Texton map and Local texton distributions

For each image, I created a texton map by first filtering all the pixels with the filter bank and then clustering the filter responses into K = 64 textons using k-means. To represent the local texture distribution, I created a local texton histogram, which counts how many times each texton is observed.
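A sketch of this step in MATLAB, assuming im is a grayscale image of type double, filters is the cell array of kernels from the sketch above, and kmeans is the Statistics Toolbox function:

```matlab
% Sketch: build a texton map by filtering the image and clustering the responses.
K = 64;                                        % number of textons
[h, w] = size(im);
responses = zeros(h * w, numel(filters));      % one row of filter responses per pixel
for i = 1:numel(filters)
    r = imfilter(im, filters{i}, 'replicate');
    responses(:, i) = r(:);
end
labels = kmeans(responses, K, 'EmptyAction', 'singleton');
textonMap = reshape(labels, h, w);             % each pixel gets a texton id in 1..K
```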

2.3 Texture gradient (tg) and Brightness gradient (bg)

To get a tg for each image and for each half-disc pair, I compared the local texture distributions in the left and the right half discs using the chi-square distance. To get a bg, I compared the local brightness distributions instead. The number of bins for bg is 256/8 = 32.
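One way to sketch this comparison in MATLAB: for every bin, the local half-disc counts can be obtained by filtering a per-bin indicator image with the half-disc masks, and the chi-square terms accumulated per pixel. The function name and the eps regularizer are my own illustrative choices.

```matlab
% Sketch: chi-square gradient between left/right half-disc histograms.
% map: per-pixel integer labels (texton ids for tg, binned intensities for bg).
function grad = chiSquareGradient(map, numBins, leftMask, rightMask)
    grad = zeros(size(map));
    for b = 1:numBins
        binImg = double(map == b);                          % indicator image for bin b
        g = imfilter(binImg, leftMask,  'replicate');       % counts in left half disc
        h = imfilter(binImg, rightMask, 'replicate');       % counts in right half disc
        grad = grad + 0.5 * ((g - h).^2) ./ (g + h + eps);  % per-pixel chi-square term
    end
end
```

For the brightness gradient, map would be the intensity image quantized into the 32 bins, e.g. min(floor(double(im) / 8) + 1, 32) for 0-255 intensities.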

2.4 pb score

I assigned a pb score to each pixel by combining the baseline edges with the texture and brightness gradients. The formula that I used is pb = canny .* (max(tg) + max(bg))/2, where max(x) denotes the maximum value over all scales and orientations; that is, the pb score is simply the product of the baseline edge strength and the average of the maximum gradient values.
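In MATLAB, with the gradients stored as h x w x N stacks over all scales and orientations (the stacking and the final normalization are my assumptions), this combination looks roughly like:

```matlab
% Sketch: per-pixel pb score, following the formula above.
% tg, bg: h x w x N gradient stacks; cannyPb: baseline Canny edge map.
tgMax = max(tg, [], 3);                  % max over all scales and orientations
bgMax = max(bg, [], 3);
pb = cannyPb .* (tgMax + bgMax) / 2;     % product of baseline and averaged gradients
pb = pb / max(pb(:));                    % normalize to [0, 1] (an extra assumption)
```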

3. Extra Credit

3.1 Filter bank

I tried several richer filter banks using publicly available code. Surprisingly, nothing beat the performance of our filter bank with circular filters added (I even tried this on the 200 test images). Thus, I simply made our filter bank richer by adding circular filters, as sketched below.
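As a sketch of what adding circular filters could look like: I read "circular" as rotationally symmetric, center-surround style kernels; the specific choice of Gaussian and Laplacian-of-Gaussian filters and the scales below are assumptions for illustration.

```matlab
% Sketch: augment the oriented DoG bank with rotationally symmetric filters.
% The exact circular filters and scales are illustrative assumptions.
for s = [1 2 4]
    hsize = 2 * ceil(3 * s) + 1;
    filters{end+1} = fspecial('gaussian', hsize, s);    % smooth, center-only
    filters{end+1} = fspecial('log', hsize, s);         % center-surround (LoG)
end
```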

3.2 Color gradient

Noticing that RGB features give results similar to brightness features, I decided to convert each image into the Lab color space. Each channel is rescaled to the 0-255 range. Note that the Lab space is also used in Arbelaez et al. 2011. Since the L channel already represents the brightness of an image, the old brightness gradient is dropped. Color gradients are shown in the next section along with texture and brightness gradients.
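A minimal sketch of this conversion, assuming the Image Processing Toolbox's rgb2lab (older releases would go through makecform/applycform) and assuming a per-image min/max rescaling of each channel:

```matlab
% Sketch: convert to Lab and rescale each channel to 0-255 before binning.
lab = rgb2lab(im2double(rgbImage));       % L in [0,100]; a, b roughly in [-128,127]
channels = cell(1, 3);
for c = 1:3
    ch = lab(:, :, c);
    ch = (ch - min(ch(:))) / (max(ch(:)) - min(ch(:)));  % rescale to [0,1] (assumption)
    channels{c} = uint8(round(ch * 255));                % 0-255 scale
end
% channels{1} replaces the old brightness map; channels{2} and channels{3} give two
% color maps, each fed through the same half-disc chi-square machinery.
```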

4. Results

Note: pb-lite-extra denotes the algorithm that uses a new filter bank and incorporates color gradients.

4.1 Texture, Brightness, and Color Gradients

We can see that adding color features tremendously helps improve the performance in some cases.
From top to bottom, left to right: original, texture, brightness (L), color1 (a), color2 (b), combined feature strength.

4.2 Sample results

It is quite clear that pb-lite is better than Canny, and there is a small additional improvement with pb-lite-extra.
From left to right: original, Sobel, Canny, pb-lite, pb-lite-extra.

4.3 F-Score

The F-score values are shown below. They correspond well with the above results.
On 10 sample images:

          Sobel   Canny   pb-lite   pb-lite-extra
ODS       0.45    0.58    0.61      0.62
IDS       0.45    0.59    0.61      0.62
Area_PR   0.21    0.50    0.53      0.51

On 200 test images:

          Sobel   Canny   pb-lite
ODS       0.51    0.60    0.63
IDS       0.54    0.61    0.64
Area_PR   0.28    0.53    0.53

5. Discussion

We successfully created a boundary detector that beat the baseline performances. Here are some observations and challenges:

First, the fact that there are many free parameters makes this approach difficult. For example, assigning the pb score is probably one of the most interesting and challenging tasks in this project. I found that using the mode instead of the mean to represent each type of gradient can improve ODS by 0.01. This is not surprising, because edge pixels show discontinuities only at some orientations and scales. Moreover, I believe that if we could learn how much each type of gradient contributes to the pb score, we could improve the results significantly. Due to time constraints, I only tried a few linear combinations. However, it is even possible that the best formula for combining features is not a weighted-average model. Furthermore, it occurs to me (from looking at the gradients of several images) that the weights should not be the same for every image, i.e., we should learn them per image.

Second, another thing that makes this approach difficult is the fact that clustering is not perfect. (1) The k-means algorithm requires knowing the number of clusters in advance. One might explore other clustering algorithms that are more flexible (such as nonparametric models). (2) Even if we know the number of clusters, the accuracy of k-means is still not perfect, especially when it deals with high-dimensional data. One could try probabilistic k-means to address this problem.

Third, other richer features still need to be explored. According to the results above, color features improve the performance of the detector. However, texture, brightness, and color features are not sufficient for detecting image boundaries, as we can see from the poor performance on some of the sample pictures. In fact, I believe there is no perfect set of features, as humans themselves cannot perfectly agree on the results of this task.

6. Acknowledgment

Many thanks to Vazheh Moussavi for suggestions on the extra credit, and to Evan Wallace for answering many questions.