May 17, 2010

- Description
- Adding Patches from the source image
- Weighting by confidence and interest
- Results
- Analysis
- Further Work

The goal of this project is to fill in a masked region of an image. Previously in Computational photography, we looked at the scene completion algorithm, and many people implemented it in project 4. I used a different method in project 4, using a nearest neighbor approach on small image patches that had been converted to PCA space. This project is an extension of that method, with the goal of producing better results.

In order to improve upon my previous approach, I experimented with a number of different changes. I looked at how patch size, and number of bases effects the results. I also tried an advanced weighting scheme which puts emphasis on unusual patches when finding matches. While I was definitely able to get some improved results, there are still many ways in which this could be improved.

Below I will explain in more detail the different approaches I tried, as well as present and analyze those results. For more information on the method I am building on, see Project 4

For this project I only used the first 16 matched images due to constraints in computation time. PCA components were computed from a random selection of patches from the 16 matched images, and unless otherwise specified, 16 bases were used. 16 bases was selected through trial and error, to give good results in a resonable amount of time.

One of the simplest but most significant improvements I made on previous approach was including patches from the source image (only unmasked region). Without patches from the source image, results can be very limited. Below is the nearest neighbor of each patch in the source image (not its neighbors) from the top 10 matched images. Without using source patches this is an upper bound on the results we could achieve:

This is a comparison of the results using patches from the source image (right) and not using them (left).

In project 4 I used an 'onion' approach, filling in the outer layer in a random order, and then moving in. One of the more interesting improvements I made was to introduce a weighting scheme, which affected the order in which patches were completed, as well as the weights given to the errors of neighbor patches. The weighting scheme is similar to that described in Criminisi, et al..

There are two variables which are used to compute the weights. The first is the confidence term. All unmasked patches in the original image are given a confidence of 1. Patches matched after are given a confidence which is equal to the average of the confidence of neighboring patches times an error term:

*Confidence(i) = sum(all neighbors n) Confidence(n) *
Error(patch(i) - patch(n))*

The error term I used was a constant over the weighted difference
of two patches (used in computing the nearest neighbor) plus a
constant: * C / (Error + C) *

The second weighting term is the "Interest" term. To compute the interest weight of a patch I took the distance from the mean (in PCA space) divided by the variance of that component. This favors unusual patches, which vary on components which are typically less variable. Below is an image with the patch brightness adjusted by the interest term at each patch. As can be seen in this example, the interest metric tends to emphasize patches with strong edges and corners, or unusual textures.

The confidence term and the interest term affect both the order in which patches are filled in, and the weight in matching patches. To compute the fill order, my algorithm computes the priority of every fringe patch according to the formula:

*Priority = Sum(over all neighbors n) Confidence(n) * Interest(n)*

This results in more interesting patches being completed first, as well as avoiding filling in areas with little confidence until more evidence (neighbors) can be used. This also has the favorable side affect of preferring patches with more neighbors.

The second way in which the confidence and interest terms are used is to weight the patches in the matching process. Where the previous method computed the nearest neighbor for a patch based on its neighbors, this method weights the distance of neighboring patches as follows:

*Weight(p) = Sum(all neighbors n) (dot(patch(p) - patch(n), (variance)^C) * confidence(n) * interest(n)*

Where C is a constant, and *(dot(patch(p) - patch(n),
(variance)^C)* represents the difference in PCA space, with each
component weighted by its variance. This algorithm favors more
informative neighbors to find a matching patch

Below is a comparison of the previous method to the weighted method:

Below I included a selection of the results using differnt size patches, and the methods described above.

8 pixel patches | 16 pixel patches | 32 pixel patches | 64 pixel patches |

Further results using an image patch size of 16 pixels, 16 bases, and 16 matched images

In general this method does not perform as well as scene completion. However, I think with some further work it could potentially perform better in certain situations. The method works particularly well when there is a lot of structure in the scene. I think with further work it could be made to work well at filling in texture.

One of the advantages scene completion has over this method is that due to the large patch size used (the whole masked region), you are garanteed to get a coherent region even if it does not match the surrounding region well. One way in which this advantage could be incorperated into the patch based method would be to include more neighbors, and use patches farther away when computing matches. To keep the regions coherent, a bias could be introduced which encourages neighboring patches from a single image to be used together in the result image.

There are also a lot of parameters in my algorithm, ranging from adjustments to the weighting terms, to patch size, number of PCA bases, etc. Due to computational and time limitations I decided on most of these values through trial and error. However, a more principled approach to selecting parameters could potentially yield significantly better results.

**Improving the confidence decay function** I am not
particularly happy with the confidence decay function. It is a hack,
and I would like to replace it with a more principled decay function
based on the quality of the match found.

**Variable Patch size** One observation from experimenting with
different patch sizes was that different sizes work better in
different situations. I would like to try to implement a method that
uses variable patch sizes. Two ways this could be done are to find
matches with several different patches, (i.e. 16, 32, 64), and then
determine which gave the best results. The other method would be to
factor in confidence, for example using larger patches when confidence
is low.

**Cleaning up the results** There are a number of ways the
results could be cleaned up, such as using better blending between
patches (such as graph cut), and searching around a patch for the best
possible offset.