Name: Chen Xu
login: chenx
In the basic part, I follow the given pipeline and use multiscale features: brightness, texture, and the output of a Canny edge detector. To obtain the texture map, I create a filter bank of odd-symmetric Gaussian derivatives by convolving a Gaussian kernel with a Sobel kernel and rotating the result to different angles; this is repeated for several scales of the Gaussian kernel. Half-disc mask pairs are created for different orientations and radii. Convolving the grayscale image with the filter bank yields a feature vector for each pixel, and k-means clusters these vectors to form the texton map (tmap). The tmap and the brightness image are then compared against the mask pairs using the chi-square distance. Finally, I multiply the simple mean of these features by the Canny output.
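The filter-bank construction described above can be sketched in Python (an illustrative reimplementation, not the code actually used; the function names and the NumPy/SciPy tooling are my own assumptions):

```python
import numpy as np
from scipy import ndimage, signal

def gaussian_kernel(sigma, size):
    # 2-D Gaussian, normalized to sum to 1.
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return g / g.sum()

def oriented_odd_filter(sigma, angle_deg, size):
    # Odd-symmetric Gaussian derivative: convolve the Gaussian with a
    # Sobel kernel, then rotate it to the desired orientation.
    sobel = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    base = signal.convolve2d(gaussian_kernel(sigma, size), sobel, mode='same')
    return ndimage.rotate(base, angle_deg, reshape=False, order=1)

# 16 orientations x 2 scales, matching the dimensions given in this report.
filter_bank = [[oriented_odd_filter(s, 180.0 * k / 16, int(6 * s + 1))
                for s in (1.0, 2.0)] for k in range(16)]
```

Each filter integrates to roughly zero (the Sobel kernel is antisymmetric), so filtering a constant region gives no response, which is the behavior we want from an edge-sensitive filter.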
Specifically, the filter bank is a 16-by-2 cell matrix, corresponding to 16 orientations and 2 scales, and the masks form an 8-by-3-by-2 cell matrix, corresponding to 8 orientations in [0, 180] and 3 radii [5 10 20]. The chi-square distance is computed with a single for loop, as suggested on the project webpage. The Gaussian kernel size is chosen as 6 * sigma + 1, so the values at the kernel edges are close to zero. The results show that pb-lite beats the other two baselines, Canny and Sobel. Fig. 1 compares the Canny baseline with pb-lite, and Fig. 2 compares pb-lite before and after improvement.
![]()
![]()
![]()
Fig. 1 From left to right: original image, baseline (Canny) output, pb-lite output.
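The chi-square gradient computation with the half-disc mask pairs can be sketched as follows (again an illustrative Python reimplementation with hypothetical names, not the actual submission; the single loop runs over histogram bins, with the per-bin counts obtained by filtering):

```python
import numpy as np
from scipy import signal

def half_disc_masks(radius, angle_deg):
    # A pair of binary half-disc masks at the given radius and orientation.
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    disc = xx**2 + yy**2 <= radius**2
    theta = np.deg2rad(angle_deg)
    side = (np.cos(theta) * xx + np.sin(theta) * yy) > 0
    return (disc & side).astype(float), (disc & ~side).astype(float)

def chi_square_gradient(label_map, n_bins, left, right, eps=1e-10):
    # chi2(x,y) = 0.5 * sum_i (g_i - h_i)^2 / (g_i + h_i), where g_i and h_i
    # count occurrences of bin i inside each half disc. The counts are
    # obtained by convolving the per-bin indicator image with the two masks.
    chi2 = np.zeros(label_map.shape)
    for i in range(n_bins):
        ind = (label_map == i).astype(float)
        g = signal.convolve2d(ind, left, mode='same')
        h = signal.convolve2d(ind, right, mode='same')
        chi2 += (g - h) ** 2 / (g + h + eps)
    return 0.5 * chi2
```

On a label map with a vertical boundary, the gradient computed with vertically split half discs peaks at the boundary, since the two half-disc histograms differ most there.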
![]()
![]()
Fig. 2 Left: basic pb-lite, right: pb-lite after improvement.
Many things can be done to improve on the basic result. To earn extra credit, I improved the algorithm with the following steps:
In the basic part, I use 6 * sigma + 1 as the kernel size, which makes the values at the kernel edges nearly zero. But when I reduce the size to 3 * sigma + 1, performance improves, especially when recall is less than 0.1 (Fig. 3). So I decided to use a kernel size of 3 * sigma + 1.
![]()
![]()
Fig. 3 Left: kernel size = 3 * sigma + 1, right: kernel size = 6 * sigma + 1.
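The effect of the two truncation choices can be illustrated numerically (a small sketch, not code from the project): a kernel of size 6 * sigma + 1 cuts the Gaussian off where it has essentially decayed, while 3 * sigma + 1 truncates it where the tail is still substantial.

```python
import numpy as np

def gaussian_1d(sigma, size):
    # 1-D Gaussian, normalized to sum to 1.
    ax = np.arange(size) - size // 2
    g = np.exp(-ax**2 / (2 * sigma**2))
    return g / g.sum()

sigma = 2.0
wide = gaussian_1d(sigma, int(6 * sigma + 1))    # 13 taps
narrow = gaussian_1d(sigma, int(3 * sigma + 1))  # 7 taps

# Edge-to-peak ratios: exp(-4.5) ~ 0.011 for the wide kernel versus
# exp(-1.125) ~ 0.32 for the narrow one, so the narrow kernel is a
# noticeably truncated Gaussian.
edge_ratio_wide = wide[0] / wide.max()
edge_ratio_narrow = narrow[0] / narrow.max()
```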
In the basic part, I simply average all the features for each pixel, which is a linear combination of the features with a constant weight of 1/n. To improve this, I average the features within each orientation and then take the maximum over orientations at each pixel, using the equation mPb(x,y) = max_theta mPb(x,y,theta). This improves the performance considerably (Fig. 4), and now the smaller kernel size gives better performance at lower recalls.
![]()
![]()
Fig. 4 Taking the maximum of the per-orientation means improves the performance.
Left: kernel size = 3 * sigma + 1, right: kernel size = 6 * sigma + 1.
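The two combination rules can be written side by side (a minimal sketch with an assumed array layout, not the code actually used):

```python
import numpy as np

def mpb_from_oriented_gradients(grads):
    # grads: array of shape (n_features, n_orientations, H, W) holding the
    # chi-square gradients for every feature channel and orientation.
    # Basic version: plain mean over all features and orientations.
    mean_all = grads.mean(axis=(0, 1))
    # Improved version: average over features within each orientation, then
    # take the maximum across orientations:
    # mPb(x,y) = max_theta mPb(x,y,theta).
    mpb_theta = grads.mean(axis=0)   # shape (n_orientations, H, W)
    mpb_max = mpb_theta.max(axis=0)  # shape (H, W)
    return mean_all, mpb_max
```

Since the plain mean is the average over orientations of the per-orientation means, the max-over-orientations output is pointwise at least as large, so weak responses at the dominant orientation are no longer diluted by the other orientations.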
In the first step, I enrich the filter bank: 8 odd-symmetric Gaussian derivatives, corresponding to 8 orientations in [0, 180]; 2 even-symmetric Gaussian derivatives at 2 orientations, obtained by convolving the Gaussian kernel with the Sobel filter twice; and one difference-of-Gaussian filter, Gaussian(u, sigma) - Gaussian(u, 0.25 * sigma). Fig. 5 shows the improved filter bank.
Fig. 5 Improved filter banks: 8 odd- and 2 even-symmetric Gaussian derivative filters and one Difference of Gaussian Filter.
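The difference-of-Gaussian filter from this step can be sketched as follows (an illustrative reimplementation following the formula in the text, Gaussian(sigma) - Gaussian(0.25 * sigma); function names are my own):

```python
import numpy as np

def gaussian_2d(sigma, size):
    # 2-D Gaussian, normalized to sum to 1.
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return g / g.sum()

def dog_filter(sigma, size):
    # Difference of Gaussians: wide Gaussian minus a Gaussian that is
    # four times narrower, as stated in the report.
    return gaussian_2d(sigma, size) - gaussian_2d(0.25 * sigma, size)

dog = dog_filter(2.0, 13)
```

Because both Gaussians are normalized, the filter sums to zero; the narrow Gaussian dominates at the center, so the filter is negative there and positive in a surrounding ring, giving a rotationally symmetric blob detector that complements the oriented derivatives.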
The performance improvement is substantial; at lower recalls, the performance is even better than gPb (Fig. 6).
![]()
Fig. 6 Richer filter bank: performance at lower recalls is better than gPb (kernel size = 6 * sigma + 1).
In the second step, I use an even richer filter bank, with 8 even-symmetric Gaussian derivative filters instead of 2 (Fig. 7). However, the performance drops (Fig. 8). So a richer filter bank does not always give better performance; the filter bank must be chosen carefully.
Fig. 7 Improved filter banks: 8 odd- and even-symmetric Gaussian derivative filters and one Difference of Gaussian Filter.
![]()
![]()
Fig. 8 Performance with the richer filter bank of Fig. 7.
Left: kernel size = 3 * sigma + 1, right: kernel size = 6 * sigma + 1.
I compare the performance of two color models: (1) RGB and (2) HSV. With RGB, the three channels (red, green, blue) are processed separately as three additional feature channels. With HSV, because the V channel represents brightness (the same as the grayscale image), only the H and S channels are used as additional feature channels.
![]()
![]()
![]()
Fig. 9 Left: HSV; middle: RGB (both with a kernel size of 6 * sigma + 1); right: RGB with the filter bank from "Richer filter bank, step 1".
The improvement from color is not as pronounced as that from the previous methods. Perhaps color information should not simply be used as separate channels; better ways should be investigated.
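Extracting the H and S channels used in the HSV variant can be sketched with a minimal NumPy-only conversion (illustrative only; the actual submission would more likely use a built-in rgb2hsv routine):

```python
import numpy as np

def rgb_to_hs(img):
    # Minimal RGB -> (H, S) conversion for an image in [0, 1]. V is dropped
    # because it duplicates the brightness channel already in the pipeline.
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    v = img.max(axis=-1)                 # value = max channel
    c = v - img.min(axis=-1)             # chroma
    s = np.where(v > 0, c / np.maximum(v, 1e-12), 0.0)
    # Hue via the standard piecewise definition over the max channel,
    # scaled to [0, 1).
    safe_c = np.maximum(c, 1e-12)
    h = np.zeros_like(v)
    h = np.where(v == r, ((g - b) / safe_c) % 6, h)
    h = np.where(v == g, (b - r) / safe_c + 2, h)
    h = np.where(v == b, (r - g) / safe_c + 4, h)
    h = np.where(c == 0, 0.0, h / 6.0)   # hue undefined for gray; use 0
    return h, s
```

The resulting H and S maps would then be treated exactly like the brightness channel: half-disc chi-square gradients are computed on each and added to the feature set.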