Project 4 Writeup

David Dufresne (ddufresn)
March 13, 2010

This project was an attempt at a multiple-image scene completion algorithm. The input was 51 images, each partially blocked by a mask. I created an "important region" by dilating the mask and subtracting the original mask from the result. This leaves an area that is not covered by the mask but lies close to it. Each of these images had up to 120 matches. For each match image, I ran a recursive Gaussian pyramid search to find the best offset of the match image. The metric for this search was the sum of squared differences under the important region. Given prior knowledge that the images were already roughly aligned, I added the offset's distance from the center to this metric in order to bias the search towards smaller shifts. I then ran the graph cut algorithm on the incomplete image and the match, constraining pixels under the mask to come from the shifted match image. Pixels at least 100 pixels from the mask were constrained to come from the incomplete image. The top 20 results are returned, where each composite image's score is based on the weight of its graph cut.
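The important region and the biased matching metric can be sketched roughly as follows. This is a minimal illustration and not the code used for the results: it assumes NumPy, grayscale images of equal size, a square dilation element, a wrap-around shift via np.roll, and a brute-force search at a single pyramid level. The names dilate, important_region, biased_ssd, best_offset, and bias_weight are all hypothetical.

```python
import numpy as np

def dilate(mask, radius):
    """Binary dilation with a square (2*radius+1) element (an assumed shape)."""
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def important_region(mask, radius):
    """Pixels close to the mask but not covered by it."""
    return dilate(mask, radius) & ~mask

def biased_ssd(target, match, region, offset, bias_weight):
    """SSD under the region, plus a penalty that grows with the shift size."""
    dy, dx = offset
    # np.roll wraps around the border; a real search would handle edges explicitly.
    shifted = np.roll(match, (dy, dx), axis=(0, 1))
    diff = (target - shifted)[region]
    return float(np.sum(diff ** 2)) + bias_weight * float(np.hypot(dy, dx))

def best_offset(target, match, region, max_shift, bias_weight):
    """Exhaustive search over small shifts (one level of a coarse-to-fine search)."""
    candidates = [(dy, dx)
                  for dy in range(-max_shift, max_shift + 1)
                  for dx in range(-max_shift, max_shift + 1)]
    return min(candidates,
               key=lambda o: biased_ssd(target, match, region, o, bias_weight))
```

In a pyramid search, a function like best_offset would run at the coarsest level first, and the offset it finds would be doubled and refined at each finer level. The value of bias_weight controls how strongly small shifts are preferred and would need tuning against the scale of the SSD term.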

5 Matches Ranked

20 Matches Ranked

It is clear my approach has some serious shortcomings. My implementation of graph cut has a tendency to cut off people's heads. I might have achieved better results if I had formulated better constraint arcs. Also, instead of using edges of infinite weight to constrain pixels to come from one image or the other, I could have used large but finite values that depended on distance from the mask edge. The pyramid image search tended to return offsets where the edge of the match image became visible. In cases where this edge is under the mask, it is reasonable that the algorithm may choose such a drastic shift, especially to match up a horizon line or a building. However, to be aesthetically appealing to a human, the resulting image would need to be cropped. The alternative would be to constrain the image search so that it does not consider offsets that would make an edge visible.
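The last idea, rejecting offsets that would expose the edge of the match image, could look something like the sketch below. It is only an illustration under stated assumptions: NumPy is available, the match image has the same size as the incomplete image, and fill_region is a hypothetical boolean array marking every pixel that the graph cut is allowed to take from the match (for example, everything within 100 pixels of the mask). The function name is also hypothetical.

```python
import numpy as np

def offset_hides_match_edge(fill_region, offset):
    """Return True if the shifted match image covers the whole fill region.

    fill_region: 2-D boolean array, True where match pixels may be used.
    offset: (dy, dx) shift applied to the match image, in pixels.
    """
    h, w = fill_region.shape
    dy, dx = offset
    rows = np.flatnonzero(fill_region.any(axis=1))
    cols = np.flatnonzero(fill_region.any(axis=0))
    if rows.size == 0:
        return True  # nothing to fill, so no edge can show
    # After the shift, the match covers rows [top, bottom) and columns [left, right).
    top, bottom = max(0, dy), min(h, h + dy)
    left, right = max(0, dx), min(w, w + dx)
    return bool(top <= rows[0] and rows[-1] < bottom
                and left <= cols[0] and cols[-1] < right)
```

A search could skip any candidate offset for which this check fails. The check uses the bounding box of the fill region, which is conservative for irregularly shaped masks.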

This algorithm also does not take perspective into account. Thus, some composites where the color and line composition seem appropriate will contain objects seen from different perspectives. Nor does the algorithm have any knowledge of the relative size of objects in the real world, leading to images where small mountains sit beside airplanes, and a giantess peers around a corner at a woman half as big as her head. Overall, the algorithm seems to perform better on landscapes or city scenes with fairly uniform colors.