Edge detection is one of the most basic and important tasks in all of computer vision. Despite this, it remains a challenging problem that has yet to be perfected. In this project, we analyze one approach to edge detection, in particular a modification of the pb algorithm called pb-lite. The main idea is that at every pixel we compare how similar the neighbors on one side are to the neighbors on the other. This is done by taking a series of half-disc filters at various rotations and sizes and applying them to each pixel. That pixel's edge candidacy is then based on the chi-squared distance between the distribution of pixels on one side of the filter and the distribution on the other. To avoid detecting edges inside of textures, we first create a texton map by applying a filter bank to every pixel and clustering the filtered responses together. Ideally, patches with the same texture should receive roughly the same texton mapping.
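The per-pixel comparison can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the function names and the way the half-disc masks are passed in (as boolean arrays over a patch) are assumptions for the example.

```python
import numpy as np

def chi_squared_distance(g, h, eps=1e-10):
    """Chi-squared distance between two normalized histograms."""
    return 0.5 * np.sum((g - h) ** 2 / (g + h + eps))

def half_disc_gradient(texton_patch, left_mask, right_mask, num_bins):
    """Edge candidacy for the center pixel of a texton-map patch:
    compare the texton histogram under one half-disc mask with the
    histogram under the opposing half-disc."""
    left = np.bincount(texton_patch[left_mask], minlength=num_bins).astype(float)
    right = np.bincount(texton_patch[right_mask], minlength=num_bins).astype(float)
    # Normalize counts to distributions before comparing.
    left /= max(left.sum(), 1.0)
    right /= max(right.sum(), 1.0)
    return chi_squared_distance(left, right)
```

A uniform texture gives a distance of zero (both half-discs see the same texton distribution), while a patch split between two textures along the disc's diameter gives a distance near one.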
For my base algorithm, I made a few small changes from the recommendation in the setup. Most importantly, I did NOT simply average over all my masks and multiply that with a base Canny edge detector. First, I grouped my masks by orientation and averaged within each orientation. I then took, at each pixel, the max over these per-orientation averages. The advantage is that the resulting probability becomes a confidence that there is an edge in one specific direction, not just the average over all of them. Previously, a small change in every direction (a radial gradient) would be called an edge, because each orientation contributes some probability, whereas a strong edge in a single direction would only contribute from one orientation, giving it a lower mean. In addition, this probability is not simply multiplied by the base Canny output; it is first added to it and to the Sobel output with multiplicative offsets (optimized by hand). This allows edges that pb believes in to surface even when Canny and Sobel do not, and reinforces edges that all three agree on. Also of note, I decreased the number of textons from 64 to 32 and removed the mask with a radius of 20. This sped up my code by a combined factor of nearly five while sacrificing only a marginal amount of accuracy.
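The combination step described above can be sketched like this. The structure (average within each orientation, max across orientations, then an additive blend with Canny and Sobel) follows the text; the weights `alpha` and `beta` are illustrative placeholders, not the hand-tuned values from the report.

```python
import numpy as np

def combine_responses(gradients, orientations, canny, sobel,
                      alpha=1.0, beta=0.5):
    """Combine per-mask gradient maps into a final pb score.

    gradients:    list of HxW gradient maps, one per (scale, orientation) mask
    orientations: orientation label for each map, parallel to `gradients`
    alpha, beta:  illustrative offsets (the report's values were hand-tuned)
    """
    # Average the gradient maps within each orientation...
    per_orientation = []
    for o in sorted(set(orientations)):
        maps = [g for g, oo in zip(gradients, orientations) if oo == o]
        per_orientation.append(np.mean(maps, axis=0))
    # ...then take the per-pixel max across orientations, so a strong
    # edge in one direction is not washed out by averaging over all.
    pb = np.max(per_orientation, axis=0)
    # Add (rather than multiply) the Canny and Sobel baselines, so pb
    # can surface edges the baselines miss and reinforce shared ones.
    return pb + alpha * canny + beta * sobel
```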
Let's now look at how well this algorithm performs. Below is an image of the precision-recall curves for our pb filter as well as the base Canny and Sobel filters.
The Canny filter starts off with an F-score of 0.4509, the Sobel is at 0.5770, and our pb curve improves this to 0.6115. This is not an amazing gain, but it is still a noticeable difference. The larger texton map and mask bank improved this by only about 0.003, which was not enough to merit the extra time. Let's see what these different methods actually gave us, starting from this base image.
We get the following results with Sobel, Canny, and pb (respectively).
As you can see, Sobel detects most of the penguin's edges, along with a lot of background noise edges. Canny detects the penguin and the sky/ground border in their entirety, but also picks up a lot of noise on the ground. Our pb image looks similar to the Canny output (as expected), but the pb term adds some noise, most of which is low probability, while the other noise appears to have gotten lighter, i.e. less probable.
In our first simple attempt to improve performance, we add a color gradient and fold that knowledge into the pb score. We calculate the gradient in each color channel, then take the max gradient at each pixel across all three channels. This is meant to capture the fact that changing colors should not depend on which channels the change occurs in, i.e. (0, 0, 0) should be the same distance from (255, 0, 0) as from (255, 255, 255). The resulting precision-recall curve looks like this.
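A minimal sketch of this per-channel-max color gradient, assuming a plain finite-difference gradient rather than whatever gradient operator the project actually used:

```python
import numpy as np

def color_gradient(img):
    """Per-pixel gradient magnitude of an HxWx3 image, taking the max
    over the 3 channels so a change of a given size scores the same
    no matter which channel it occurs in."""
    gy, gx = np.gradient(img.astype(float), axis=(0, 1))
    mag = np.sqrt(gx ** 2 + gy ** 2)  # HxWx3 per-channel magnitude
    return mag.max(axis=2)            # max across channels
```

Under this scoring, a jump from (0, 0, 0) to (255, 0, 0) and a jump from (0, 0, 0) to (255, 255, 255) both produce the same gradient value, matching the intent described above.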
This has an F-score of 0.6151, a mild improvement, but not much. However, since the bins are so wide, calculating the color gradients is a very cheap operation, making this improvement not much more expensive. The penguin image run through this version looks like this.
This looks very similar to the original pb, except the noise is a little brighter. This makes sense: adding another gradient filter adds more confidence to edges caused by gradients, such as the false positives in the background. However, by keeping a low threshold for accepting edges, we can ignore this noise while the true edges are reinforced even more.
Finally, we take a look at an attempt to use soft counts for our textons. Normally, we assign clusters by labeling each pixel with its closest texton center. Instead, we take the inverse exponential of the distance from each pixel to each center and say the probability of belonging to that texton is proportional to this value. The inverse exponential is used to mimic the shape of a Gaussian curve. A full GMM would have been used, but it converged too slowly to be usable. The resulting precision-recall curve looks like this.
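The soft-assignment idea can be sketched as below. The `scale` bandwidth parameter is an assumption for illustration; the report does not state how the distances were scaled.

```python
import numpy as np

def soft_texton_assignments(features, centers, scale=1.0):
    """Soft cluster memberships: weight each texton center by
    exp(-distance / scale), then normalize per pixel, approximating
    Gaussian-style responsibilities without fitting a full GMM.

    features: NxD filter responses (one row per pixel)
    centers:  KxD texton centers
    """
    # Pairwise Euclidean distances, shape N x K.
    dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
    weights = np.exp(-dists / scale)
    # Normalize so each pixel's memberships sum to 1.
    return weights / weights.sum(axis=1, keepdims=True)
```

With these soft counts, the half-disc histograms accumulate fractional membership per texton instead of a single hard label per pixel.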
As can easily be seen, this is still better than Canny, but not by as much as our original pb; our F-score has dropped to 0.5824. Upon further analysis, it makes sense that this does not work well. Due to the curse of dimensionality, most of our pixels are roughly the same distance from all of the centers. The closest center is therefore only marginally the closest, so the probability is spread in small amounts across several different textons. After waiting for a GMM to actually fit, the result with full covariances assigned 99% of the probability for most pixels to just one texton. There is therefore no real need for soft counts, since most pixels very definitely belong to just one texton. For reference, here is our penguin using this filter.
As expected, this looks a lot like the normal pb output, except the noisy edges on the ground are much lighter. We can infer that the pb step using soft counts was less effective at detecting edges, since many textons end up with very similar low probabilities across the whole image.