
Project 4: Scene Completion
Scene completion using millions of photographs
(Hays et al.; website).
Due Date: 11:59pm on Friday, March 12th, 2010
Brief
- This handout: /course/cs195g/asgn/proj4/handout/
- Stencil code: /course/cs195g/asgn/proj4/stencil/
- Data: /course/cs195g/asgn/proj4/data/
- Handin: cs195g_handin proj4
- Required files: README, code/, html/, html/index.html
Requirements
You are required to implement the following:
- Scene completion using other images to fill a region
Details
MATLAB stencil code is available in /course/cs195g/asgn/proj4/stencil/. You're free to do this project in whatever language you want, but the TAs are only offering support in MATLAB. We have supplied a few files to get you started.
For this project you will create an image completion method which fills in regions of missing or unwanted image content (this task is sometimes referred to as inpainting, region filling, or hole filling). There are many possible image completion strategies:
- Diffusion-based methods, which propagate information inward from the boundary of the missing region. You can create your own diffusion method with a simple change to your Poisson blending code (a minimal sketch follows this list).
- Texture filling methods, which copy textures from around the missing region. These methods are closely related to pixel-based and patch-based texture synthesis algorithms.
- Multiple image methods, which copy information from additional, related photographs (e.g. an overlapping panorama).
Scene completion is an instance of this last approach, except that the similar scenes are found automatically from the Internet.
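As a concrete illustration of the diffusion option above, the following minimal sketch fills the hole by solving Laplace's equation inside it, using the known pixels as boundary conditions (Poisson blending with a zero target gradient amounts to the same thing). It assumes img is a grayscale double image and mask is a logical matrix that is true inside the hole; the function name is illustrative, and the per-pixel loop is slow but simple:

% Diffusion fill (sketch): solve Laplace's equation inside the hole,
% with the known pixels as Dirichlet boundary conditions.
% img: grayscale double; mask: logical, true = missing pixel.
function filled = diffusion_fill(img, mask)
    [H, W] = size(img);
    N = H * W;
    idx = reshape(1:N, H, W);            % linear index of each pixel
    i = []; j = []; v = []; b = zeros(N, 1);
    for p = 1:N
        if mask(p)
            % Unknown pixel: its value equals the mean of its neighbors.
            [r, c] = ind2sub([H, W], p);
            nbrs = [];
            if r > 1, nbrs(end+1) = idx(r-1, c); end
            if r < H, nbrs(end+1) = idx(r+1, c); end
            if c > 1, nbrs(end+1) = idx(r, c-1); end
            if c < W, nbrs(end+1) = idx(r, c+1); end
            i(end+1) = p; j(end+1) = p; v(end+1) = numel(nbrs);
            for q = nbrs
                i(end+1) = p; j(end+1) = q; v(end+1) = -1;
            end
        else
            % Known pixel: constrain it to its original value.
            i(end+1) = p; j(end+1) = p; v(end+1) = 1;
            b(p) = img(p);
        end
    end
    A = sparse(i, j, v, N, N);
    filled = reshape(A \ b, H, W);
end

For color images, run each channel separately. Diffusion like this works well for thin scratches but blurs large holes, which is exactly the failure mode scene completion is meant to avoid.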
Scene completion has two steps: first, retrieve images that match the query based on some feature set ("Scene Matching"); second, find the best matching region within those retrieved images ("Local Context Matching"). If you're curious about how either of these steps was done in the original work, you can read the paper or just email the professor.
In this project, we will provide the similar scenes for each image completion task and your job will be to use those similar scenes to make a convincing composite. You will want to leverage all three of your previous projects to align images, find good seams, and blend images. While the graph cut and Poisson blending methods from previous projects can be used with minimal modifications, the pyramid alignment method will need to be extended to handle the following issues:
- The incomplete query image will not be the same resolution as the retrieved matching scenes, and the missing region may be larger than some matching scenes. A simple, band-aid solution is to resize every matching scene to the resolution of the query image.
- For this project, the alignment score should not consider the masked (missing) pixels.
- Your project 1 code weighted the entire image equally, but for this project it may help to weight alignment quality more heavily around the missing region. Other metrics explored for project 1, such as edge weighting, may be useful here as well.
- In project 1, the color channels were at the same spatial scale. In this project, you may find better matches by also searching slightly higher- or lower-resolution versions of the matching scenes.
As in project 1, the images are likely to be roughly aligned already, so you may want to bias the search toward small shifts or avoid searching the full range of possible shifts.
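To make the masked scoring above concrete, here is one way the cost of a single candidate shift might look. This is a sketch, not the stencil code; circshift wraparound is ignored for brevity, and all variable names are illustrative:

% Masked, weighted SSD for one candidate shift of a matching scene.
% query: incomplete image (grayscale double); match: candidate scene,
% already resized to the query's resolution; known: logical mask of
% valid query pixels (false inside the hole); weight: per-pixel
% importance, e.g. larger near the hole boundary.
function score = masked_shift_score(query, match, known, weight, dy, dx)
    shifted = circshift(match, [dy, dx]);  % crude shift; ignores wraparound
    w = weight .* known;                   % missing pixels contribute nothing
    d = (query - shifted).^2;
    score = sum(d(:) .* w(:));
end

One plausible weight map is exp(-bwdist(~known)/sigma), which is largest at the hole boundary and decays with distance from it (bwdist is in the Image Processing Toolbox; sigma is a tuning parameter).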

Template Matching
An alternative (though related) strategy is not to align the entire images, but to do a sliding-window search, computing the distance between the "local context" around the hole and the matching image at each offset. This works (Scene Completion uses this approach), but it is slow if you search the whole image. If you want to run your code on more and larger images, you may need a faster method.
One way to speed this up, in the case of the SSD metric, is to formulate the sliding-window distance computation as a series of correlations. This lets you use the fast imfilter() function in MATLAB, or perform the correlation in the frequency domain using a Fourier transform.
Consider the sum of squared differences between image A (the matched scene) and template B at offset i: sum((A_i - B).^2). Multiplying this out gives sum(A_i.^2 - 2*A_i.*B + B.^2) for each possible shift i. Note that all operations are element-wise (the .* and .^ operators), not matrix multiplications. If we additionally weight each element by a mask M (where 1 = source, 0 = region to be filled), we get sum(M.*A_i.^2 - 2*M.*A_i.*B + M.*B.^2). If we assume B has already been multiplied by the mask (so the missing pixels are zero), this simplifies to sum(M.*A_i.^2 - 2*A_i.*B + B.^2). The mask M prevents the masked, missing region from contributing to the SSD. Since B never changes, the B.^2 term is constant across offsets and does not affect the location of the minimum, so it can be dropped. What remains, evaluated over all shifts i, is a couple of correlations: one of A.^2 against the kernel M, and one of A against the kernel B.
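Concretely, the two offset-dependent sums can each be computed with a single imfilter() call. This sketch assumes grayscale doubles: A is the matched scene, B is the local-context template with the hole zeroed out, and M is the corresponding 0/1 mask:

% SSD over all offsets via correlation, using the expansion above.
term1 = imfilter(A.^2, M, 'same');   % sum of M .* A_i.^2 at each offset
term2 = imfilter(A,    B, 'same');   % sum of A_i .* B at each offset
ssd   = term1 - 2 * term2;           % the constant B.^2 term is dropped
[~, best] = min(ssd(:));
[by, bx]  = ind2sub(size(ssd), best);
% Caveat: imfilter zero-pads by default, so offsets where the template
% hangs off the image edge can score artificially well; restrict the
% search to fully contained offsets or penalize the borders.

For large templates, imfilter may itself be slow; the frequency-domain route computes the same correlations with fft2/ifft2.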
Even using the above formulation with imfilter() will be slow on large images, and the test images are all fairly large (around 600x800), so it will be hard to get many results without further speedups. You could run the correlation on a scaled-down version of the image, take the lowest-distance offset there as a starting point, and then apply a small sliding-window search at larger scales (see the sketch below); this is the same multiresolution intuition behind image pyramid alignment. You could also perform the correlation in the frequency domain. We leave the details to you, but take care with patch boundaries and padding.
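A coarse-to-fine version might look like the following. Here ssd_map() is a hypothetical helper wrapping the imfilter computation above, and masked_ssd_at() is a hypothetical helper evaluating the masked SSD at a single offset:

% Coarse-to-fine template matching (sketch).
s = 0.25;                                   % coarse scale factor
ssd_c = ssd_map(imresize(A, s), imresize(B, s), double(imresize(M, s) > 0.5));
[~, best] = min(ssd_c(:));
[cy, cx] = ind2sub(size(ssd_c), best);
cy = round(cy / s); cx = round(cx / s);     % map back to full resolution
r = 8; best_cost = inf;                     % refine in a small window
for dy = -r:r
    for dx = -r:r
        c = masked_ssd_at(A, B, M, cy + dy, cx + dx);
        if c < best_cost
            best_cost = c; by = cy + dy; bx = cx + dx;
        end
    end
end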
The choice of alignment strategy is up to you. If your algorithm is very slow, you may only be able to examine a few images; that's fine, but it will hurt the quality of your results. If you read the paper, you'll see that the best matching images are typically somewhat aligned already, so you may not need to search the whole image anyway. As a sanity check, you might first build the pipeline with no alignment at all.
Data
You'll find everything you need in the project 4 data directory: 51 pairs of source images and masks and, for each pair, the top 120 or so matching images from a 6.5-million-photo database. There is currently no way for you to query the database with your own images, but if you provide us with query images by Friday, March 5th, we will retrieve the matching scenes for you.
Results
For a single image completion task, you can generate many possible composites from the matching scenes. How do you decide which is best? In Scene Completion, a user is presented with the 20 best composites as ranked by the cost of the graph cut, the quality of the image alignment, and the scene match distance. These 20 results may contain a diversity of convincing completions, or they may all be terrible. You can develop your own strategy for ranking and choosing results, but make it clear in your write-up and show many of your results, including the top-ranked result even when it is not good.
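If you adopt a similar weighted ranking, a minimal sketch follows; the three cost vectors are assumed to hold one entry per composite, and the weights are illustrative and will need tuning against your own results:

% Rank composites by a weighted sum of normalized costs.
norm01 = @(x) (x - min(x)) ./ (max(x) - min(x) + eps);  % rescale to [0,1]
score  = 1.0 * norm01(graphcut_cost) ...
       + 0.5 * norm01(alignment_cost) ...
       + 0.5 * norm01(scene_match_dist);
[~, order] = sort(score);    % order(1) indexes the top-ranked composite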
Write up
For this project, just like all other projects, you must write a project report in HTML. In the report, describe your algorithm and any design decisions you made along the way, then show and discuss your results. Also discuss any extra credit you did. Feel free to add any other information you feel is relevant.
Extra Credit
You are free to do whatever extra credit you can think of; here are a few examples, some worth quite a few points.
- (up to +3) Poisson image fill for all test cases. Compare to scene completion in your write-up.
- (up to +5) Intelligent searching of images (multiscale, gradients, mirroring, etc.)
- (up to +5) Change the graph cut seam finding heuristic. E.g. try the advanced metric suggested in Graph Cut Texture Synthesis (encourage cuts along edges) and the metric suggested in Scene Completion (encourage cuts through smooth regions), and compare to the simple SSD cost. Feel free to try other heuristics.
- (up to +10) Region filling using multiple patches from the same image. Determining the synthesis order is non-trivial: you could choose the unknown pixel with the most known neighbors (onion-peel order; see the sketch after this list), but exploring other ordering heuristics will earn more points. Criminisi et al. describe one such heuristic; the paper can feel pretty dense, so ask a TA if you want to implement the algorithm it describes.
- (up to +10) A hybrid approach where multiple image completion strategies are used, e.g. diffusion for thin parts of the mask and scene completion to fill in the rest.
- (up to +10) Stitching together and blending multiple patches from multiple matching scenes to fill a hole.
- (A+ for the course)* Implement, in its entirety, Scene completion using millions of photographs.
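For the region-filling item above, a minimal onion-peel ordering sketch is given below. It assumes img is a grayscale double image and known is a logical mask that is true for already-filled pixels; the synthesis step is deliberately a crude placeholder (a real implementation would copy the best-matching patch instead of averaging neighbors):

% Onion-peel fill order (sketch): repeatedly fill the unknown pixel
% that has the most already-known 4-neighbors.
kernel = [0 1 0; 1 0 1; 0 1 0];
while any(~known(:))
    counts = conv2(double(known), kernel, 'same');  % known-neighbor counts
    counts(known) = -inf;                           % only unknown pixels compete
    [~, p] = max(counts(:));
    % Placeholder synthesis: average of the known 4-neighbors of p.
    nbr_sum = conv2(img .* known, kernel, 'same');
    img(p) = nbr_sum(p) / counts(p);
    known(p) = true;
end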
Graduate Credit
You are required to implement at least 10 points' worth of the extra credit above. For "Intelligent searching of images", simply testing multiple scales of the image or applying gradient filters will not earn graduate credit; you must explore, in depth, ways of improving the scene completion and report on how what you did affects the results.
Handing in
This is very important: you will lose points if you do not follow these instructions. Every time after the first that you do not follow them, you will lose 5 points. The folder you hand in must contain the following:
- README - text file containing anything about the project that you want to tell the TAs
- code/ - directory containing all your code for this assignment
- html/ - directory containing your HTML report for this assignment (including images)
- html/index.html - home page for your results
Then run: cs195g_handin proj4
If it is not in your path, you can run it directly: /course/cs195g/bin/cs195g_handin proj4
Rubric
- +80 pts: Scene Completion by compositing matching scenes
- +20 pts: Write up
- +20 pts: Extra credit (up to twenty points)
- -5*n pts: Lose 5 points for every time (after the first) you do not follow the instructions for the hand in format
Final Advice
- Get started very early on this. If your implementation is slow, you'll never get results.
- This is a relatively recent area of research. Don't expect things to work great, even if you're using reasonable approaches. If you think of something new, it could turn into something publishable!
Credits
Project derived from Hays et al. (website) with permission.