CS 195-G: Scene Completion

Evan Wallace

The goal of this project was to implement the local context matching phase from the paper Scene Completion Using Millions of Photographs.


This algorithm completes unwanted regions of an image using material from another similar image.  In the example below, the unwanted region is shown in black. 

The original image

The scene to be completed

The similar image to use material from

The trivial composite

The first step in completing an image is to align the two images as best as possible.  The similar image is aligned to the hole using the information just outside the hole.  Alignment error is calculated as the sum of squared differences between the two images weighted by the alignment mask, which is one just outside the hole and zero 50 pixels away from the hole.  Chessboard distance and an image pyramid are used for speed. 

The alignment mask

The masked original image

The masked aligned image

The next step in completing an image is to expand the hole using a graph cut.  Graph cut was restricted to the area where the alignment mask was non-zero (from one to 50 pixels away from the original hole).  Both regular sum of squared differences and sum of squared gradient differences were used as error metrics, and the one with the minimum flow was chosen.  Comparing the gradients instead of the pixels was proposed as a better error metric in the paper, because it cuts through smooth regions where color changes are less noticeable. 

The original hole

The graph cut hole

The smooth graph cut hole

The original hole composited

The graph cut hole composited

The smooth graph cut hole composited

The final step is to hide the seams at the edge of the hole using poisson blending. 

Poisson blending the original hole

Poisson blending the graph cut

Poisson blending the smooth graph cut

This algorithm was tried on three different scales of the similar image (0.81, 0.9, and 1.0 as suggested in the paper) and the one with the best score was chosen.  The score for each completion was one tenth the alignment error plus ten times the graph cut flow. 

The final result


This algorithm does have some disadvantages.  Scale is not considered at all, which sometimes creates mountain sized people or brick sized cars.  Also, since humans are very sensitive to straight lines, a very slight change in perspective can make what would otherwise be very good scene completion look horribly wrong.  In addition, the scene completion algorithm does not make any attempt to preserve whole objects, which often results in disconcertingly headless people. 

Mismatched scales

Mismatched perspective

Partial people


Each row below contains the original image, a simple poisson fill, and the best three scene completions.  The scene completions are more visually pleasing than the poisson fill in virtually all cases because the novel textures provided by scene completion are far better than the absence of texture due to the poisson fill.