scene completion

Project 4: Scene Completion

Scene completion using millions of photographs (Hays, et al.; Website).

Due Date: 11:59pm on Friday, March 12th, 2010



You are required to implement the following:


MATLAB stencil code is available in /course/cs195g/asgn/proj4/stencil/. You're free to do this project in whatever language you want, but the TAs are only offering support in MATLAB. The TAs have supplied a few files to get you started.

For this project you will create an image completion method which fills in regions of missing or unwanted image content (this task is sometimes referred to as inpainting, region filling, or hole filling). There are many possible image completion strategies:

Scene completion is an instance of this last approach, except that the similar scenes are found automatically from the Internet.

Scene completion has two steps: first retrieve images that match based on some feature set ("Scene Matching") and second find the best matching region from these retrieved images ("Local Context Matching"). If you're curious on how either of these steps were done by the people involved, you can read the paper or just email the professor.

In this project, we will provide the similar scenes for each image completion task and your job will be to use those similar scenes to make a convincing composite. You will want to leverage all three of your previous projects to align images, find good seams, and blend images. While the graph cut and Poisson blending methods from previous projects can be used with minimal modifications, the pyramid alignment method will need to be extended to handle the following issues:

In common with project 1 is the fact that the images are likely to be roughly aligned already. You may want to bias the search towards small shifts or avoid searching the full range of possible shifts.

test set

Template Matching

An alternative (although related) strategy is not to align the entire images, but instead to do a sliding window search to calculate distance between the "local context" around the hole and the matching image at different offsets. This method will work (Scene Completion uses this approach), but it will be slow if you search the whole image. If you want to run your code on more and larger images, you may need a faster method.

One way to speed this up, in the case of the SSD metric, is to formulate the sliding window distance computations as a series of correlations. This will allow you to use the fast imfilter() function in Matlab or to perform the correlation in the frequency domain using a Fourier transform.

If you examine the sum of squared differences between images A (the matched scene) and B (the template) at offset i, sum((Ai - B).2), you'll see that it can be multiplied out as the sum( Ai.2 - 2*Ai.*B + B.2 ) for all possible shifts i. Note that all operations are element-wise (e.g. dot products), not matrix multiplications. If we additionally weight each element by a mask, M (where 1 = source, 0 = region to be filled) you get the following: sum( M.*Ai2 - M.*2*Ai.*B + M.*B.2 ). If we assume that B has already been multiplied by the mask (the missing pixels are zero, etc) this is sum( M.*Ai.2 - 2*Ai.*B + B.2 ) The reason why we add M is to prevent the masked, missing region from contributing to the SSD. Since B never changes, the B.2 term will not affect the location of the minimum offset. What you can do from here is treat the aforementioned formula as the sum of a couple of convolutions.

Using the above formulation with imfilter() will still be slow on large images. So it will be hard to get many results on the test images because they are all large (around 600x800). Other approaches may be necessary to speed up your code. You could try doing convolution on a scaled down version of the image and then using the lowest distance there as a starting point. Then you can apply a small sliding window to the image at larger scales. This is the same multiresolution intuition behind the image pyramid alignment. You could also try doing convolution in the frequency domain. This part we leave up to you. You would need to take care with the patch boundaries and padding.

The choice of alignment strategy is up to you. If your algorithm is very slow, you may only be able to examine a few images and that's fine, but this will hurt the quality of your results. If you read the paper, you'll see that the best matching images are typically somewhat aligned already, so you may not need to search through the whole image anyway. As a sanity check, you might first create the algorithm with no alignment process.


You'll find everything you'll need in the project 4 data directory: 51 pairs of source images and masks, and for each pair the top 120 or so matching images from a 6.5 million photo database. Currently, there is not a way for you to query the database with your own images to get matching images. However, if you provide us with query images by Friday, March 5th we will retrieve the matching scenes for you.


For a single image completion task, one can generate many possible compositing results derived from the matching scenes. How do you decide which is best? In Scene completion, a user is presented with the 20 best composites as ranked by the cost of the graph cut, the quality of the image alignment, and the distance of scene match. It may be the case that these 20 results contain a diversity of convincing image completions, or it may be the case that they are all terrible. You can develop your own strategy for ranking / choosing results, but make this clear in your write-up and try to show many of the results, including the top ranked result even if it is not good.

Write up

For this project, just like all other projects, you must do a project report in HTML. In the report you will describe your algorithm and any decisions you made to write your algorithm a particular way. Then you will show and discuss the results of your algorithm. Also discuss any extra credit you did. Feel free to add any other information you feel is relevant.

Extra Credit

You are free to do whatever extra credit you can think of, but here are a few examples and they are worth quite a few points.

Graduate Credit

You are required to implement at least 10 points of the above extra credit. In the case of "Intelligently searching images", simply testing multiple scales of the image or applying gradient filters will not earn you graduate credit. You are required to explore, in depth, ways of improving the scene completion and report how what you did affects the results.

Handing in

This is very important as you will lose points if you do not follow instructions. Every time after the first that you do not follow instructions, you will lose 5 points. The folder you hand in must contain the following:

Then run: cs195g_handin proj4
If it is not in your path, you can run it directly: /course/cs195g/bin/cs195g_handin proj4


Final Advice


Project derived from Hays, et al. (website) with permission.

Good Luck!

*You won't actually get an A+ for the course, just 20 points.