For this assignment, I followed the recommended pipeline to arrive at my pb-lite implementation. While I achieved F-scores that beat the baseline methods on both datasets I tested, running on the larger dataset produced an improvement in the relative performance of my pb-lite algorithm.
To create the filter bank, I first precomputed a Sobel kernel. Then, for each sigma value, I computed a Gaussian kernel and convolved it with the Sobel kernel. I then looped over each pre-defined orientation and used imrotate to rotate the convolution result by the specified number of degrees. To ensure consistency in the size of the masks, I passed the 'crop' argument to imrotate so that the resulting image would not be bigger than the original before rotation.
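A minimal sketch of this step is below; the particular sigma and orientation values are assumptions for illustration, not necessarily the ones I used:

```matlab
% Sketch of the filter bank: derivative-of-Gaussian filters at several
% scales and orientations (sigmas/orientations are assumed values).
sobel = fspecial('sobel');
sigmas = [1 2];
orientations = 0:22.5:337.5;          % 16 assumed orientations
filterBank = {};
for s = sigmas
    g = fspecial('gaussian', 2 * ceil(3 * s) + 1, s);
    base = conv2(g, sobel, 'same');   % Gaussian convolved with Sobel
    for theta = orientations
        % 'crop' keeps the rotated filter the same size as the original
        filterBank{end+1} = imrotate(base, theta, 'bilinear', 'crop');
    end
end
```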
I made the masks in much the same way I created the filter bank. I started by using fspecial('disk', radius) to create a disk mask, then set every value greater than 0 (i.e., inside the disk) to 1 for half of the disk and 0 otherwise. After creating a pair of half-disks for a given radius, I looped over the number of orientations and rotated each disk by the amount specified in the supplied orientations vector.
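A rough sketch of this step follows; the radii, orientation values, and the exact way I split the disk into halves are assumptions here:

```matlab
% Sketch of half-disk mask creation (radii and orients are assumed values,
% as is the row-wise split of the disk into opposing halves).
radii = [5 10 15];
orients = 0:22.5:157.5;      % opposing halves cover the remaining 180 deg
masks = {};
for r = radii
    d = double(fspecial('disk', r) > 0);  % binary disk, (2r+1) x (2r+1)
    top = d;  top(r+1:end, :) = 0;        % keep the upper half of the disk
    bot = d;  bot(1:r+1, :)  = 0;         % keep the lower half of the disk
    for theta = orients
        masks{end+1} = {imrotate(top, theta, 'nearest', 'crop'), ...
                        imrotate(bot, theta, 'nearest', 'crop')};
    end
end
```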
Computing the texton map was fairly straightforward. I simply filtered my image with each of the filters in the filter bank, reshaped each filtered result into a 1 x N vector, and stored it as an entry in the matrix I would eventually run k-means on. After obtaining a result from k-means, I reshaped the cluster assignments back into the original image dimensions to arrive at a texton id for each pixel.
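A sketch of this step is below, using MATLAB's built-in kmeans as a stand-in for the clustering I actually ran (MATLAB's kmeans expects one observation per row, so the responses are arranged pixels-by-filters here; numTextons = 64 is an assumed value):

```matlab
% Sketch of the texton map computation; img is assumed to be a grayscale
% image of class double, and numTextons is an assumed cluster count.
[rows, cols] = size(img);
responses = zeros(rows * cols, numel(filterBank));
for i = 1:numel(filterBank)
    fimg = imfilter(img, filterBank{i}, 'replicate');
    responses(:, i) = fimg(:);        % one response column per filter
end
numTextons = 64;
ids = kmeans(responses, numTextons, 'MaxIter', 100);
textonMap = reshape(ids, rows, cols); % a texton id for each pixel
```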
To compute the gradients, I first created a 3-d matrix of size image_x by image_y by num_bins, storing an indicator histogram for each bin value as a 2-d "slice" of the 3-d matrix. I then filtered this 3-d matrix of histograms twice: once with each half-disk filter of a given pair. After computing the two filtrations for a given pair of half-disks, I computed the chi-square distance over all of the bins, storing the result as a "slice" in my 3-d gradient matrix.
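A minimal sketch of this step, assuming the half-disk mask pairs from above and an assumed numBins (imfilter with a 2-d kernel filters each 2-d slice of a 3-d array, which matches the approach described):

```matlab
% Sketch of the chi-square gradient over one map (texton or brightness
% ids); numBins and the masks cell array are assumed values from above.
[rows, cols] = size(map);
binStack = zeros(rows, cols, numBins);
for b = 1:numBins
    binStack(:, :, b) = double(map == b);  % indicator "slice" for bin b
end
grad = zeros(rows, cols, numel(masks));
for m = 1:numel(masks)
    gi = imfilter(binStack, masks{m}{1});  % bin counts under one half-disk
    hi = imfilter(binStack, masks{m}{2});  % ... and under its opposing half
    % chi-square distance summed over bins, one "slice" per mask pair
    grad(:, :, m) = 0.5 * sum((gi - hi).^2 ./ (gi + hi + eps), 3);
end
```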
After computing the texture and brightness gradients, I simply took the mean over the third dimension of both resulting matrices to "flatten" them into two 2-d matrices. I added the two gradient means together to form a single matrix, then took the element-wise product of this matrix and the canny baseline. I concluded by returning a normalized version of the pb image, using the normalization code provided by the TAs.
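A sketch of the combination step, where tg and bg stand for the 3-d texture and brightness gradient matrices and cannyPb for the provided canny baseline; the min-max normalization shown is a generic stand-in for the TA-provided code:

```matlab
% tg/bg: 3-d texture and brightness gradient matrices; cannyPb: the
% canny baseline probability map supplied with the assignment.
tgMean = mean(tg, 3);                 % flatten texture gradient to 2-d
bgMean = mean(bg, 3);                 % flatten brightness gradient to 2-d
pb = (tgMean + bgMean) .* cannyPb;    % element-wise product with baseline
% Generic min-max normalization, standing in for the TA-provided code:
pb = (pb - min(pb(:))) ./ (max(pb(:)) - min(pb(:)));
```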
| 10 test images in directory: testset/ | 60 test images in directory: testset_60/ |
| --- | --- |
| ![]() | ![]() |
| Sobel F-score: 0.45, Canny F-score: 0.58, My pb-lite F-score: 0.60 | Sobel F-score: 0.52, Canny F-score: 0.58, My pb-lite F-score: 0.61 |
We can see from the charts above that the relative performance of my pb-lite algorithm depends on the size of the dataset. For the 10 test images provided by the TAs, my pb-lite algorithm beats the Sobel and Canny baselines, but its curve dips slightly below the Canny baseline at the lowest recall value. For the 60-image dataset (generated after I killed the multithreaded process that was building the output against the entire 200-image BSDS500 gallery), my pb-lite algorithm consistently outperforms both baselines (as well as the paper's canny baseline). The input images and output are included in the data/ directory of the submission. Below are some images from both test sets that provide a visual comparison of the three boundary detection algorithms.
| Image 112056.jpg | | | |
| --- | --- | --- | --- |
| ![]() | ![]() | ![]() | ![]() |
| Original image | Sobel baseline | Canny baseline | My pb-lite |
| Image 35049.jpg | | | |
| --- | --- | --- | --- |
| ![]() | ![]() | ![]() | ![]() |
| Original image | Sobel baseline | Canny baseline | My pb-lite |
| Image 106005.jpg | | | |
| --- | --- | --- | --- |
| ![]() | ![]() | ![]() | ![]() |
| Original image | Sobel baseline | Canny baseline | My pb-lite |
| Image 130014.jpg | | | |
| --- | --- | --- | --- |
| ![]() | ![]() | ![]() | ![]() |
| Original image | Sobel baseline | Canny baseline | My pb-lite |
| Image 208078.jpg | | | |
| --- | --- | --- | --- |
| ![]() | ![]() | ![]() | ![]() |
| Original image | Sobel baseline | Canny baseline | My pb-lite |
Overall, moving to the larger dataset increased my pb-lite F-score's margin over the Canny baseline by 50% (0.61 vs. 0.58, up from 0.60 vs. 0.58). I suspect this is because pb-lite had more of an opportunity to "prove" itself as an improvement over the baselines when given more data to work on. In the future, it would be interesting to try out different ways of combining gradient information (instead of just taking the mean over the chi-square distances computed for each mask pair) as well as different filters when computing the texton map. Using the MATLAB version of kmeans may also lead to improvements.
Shoutout to vmoussav for advising me on some simple optimizations and explaining F-scores to me.