
Project 3: Image Completion
Patch-based image completion in the spirit of Criminisi et al., with graph cut seam finding (Kwatra et al.) to composite texture patches.
Due Date: 11:59pm on Wednesday, March 2nd, 2011
Brief
- This handout: /course/cs129/asgn/proj3/handout/
- Stencil code: /course/cs129/asgn/proj3/stencil/
- Data: /course/cs129/asgn/proj3/data/
- Handin: cs129_handin proj3
- Required files: README, code/, html/, html/index.html
Requirements
You are required to improve the baseline image completion algorithm provided by the stencil code. You must also apply your algorithm to several of your own images to get full credit.
Details
MATLAB stencil code is available in /course/cs129/asgn/proj3/stencil/. You're free to do this project in whatever language you want, but the TAs are only offering support in MATLAB. The stencil code for this project is unusually extensive, offering a (poorly) working image completion algorithm which you will have to improve in specific ways. One of the motivations of this extensive starter code is to give you an algorithm that runs reasonably fast, thus encouraging experimentation with different improvements. There are numerous test cases (photo + mask pairs) provided from the Scene Completion test set, as well as some synthetic test cases, and you can make your own test cases using the interactive function getmask.m.
Image Completion Overview
Image completion is one of the most elementary yet challenging image manipulation operations. Numerous photo editing operations (deleting, moving, or resizing an object; removing imaging artifacts; filling gaps in panoramas) require us to convincingly hallucinate image data. Research on image completion has been active since 2000, sometimes under names such as inpainting and hole filling, and in the last year mainstream tools such as Photoshop have offered their own implementations.

This project will develop a fairly standard greedy image completion pipeline, as in Criminisi et al. The algorithm incrementally fills the incomplete image region using patches of texture found elsewhere in the image. The inner loop of such an algorithm must perform three critical operations (sketched after the list):
- 1) Deciding which region to fill next.
- 2) Deciding which image patch to use to fill that region.
- 3) Compositing the matched patch into the partially filled region.
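To make the structure concrete, here is a minimal sketch of that greedy loop; the function names are hypothetical stand-ins, not the stencil's actual interface:

    % Greedy completion loop; mask is true where pixels are unknown.
    % findFillRegion, findBestPatch, and compositePatch are placeholders.
    while any(mask(:))
        p = findFillRegion(img, mask);              % 1) pick the fill region
        patch = findBestPatch(img, mask, p);        % 2) find a matching patch
        [img, mask] = compositePatch(img, mask, p, patch);  % 3) composite it
    end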
1. Hole-filling Priority
Deciding the fill order is a surprisingly challenging aspect of image completion, because the fill order has a strong influence on the layout of synthesized scene elements. How do we pick which region to fill at any given iteration?

One reasonable constraint is to only fill regions on the border of the hole -- you certainly wouldn't want to try to fill a region that was completely masked (you'd have no known pixels to condition the search on, so it would basically amount to picking a random patch), and you wouldn't want to fill a region that is already completely known. The baseline method in the starter code picks the image region that is most complete but whose center pixel is unknown.
Another baseline, sometimes referred to as "concentric" or "onion peel" ordering, is to fill regions in order of their distance from the initial hole boundary. Because this ordering is a simple function of the initial mask, it doesn't need to be computed in the main loop; it can be precomputed once.
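As a rough sketch, assuming mask is a logical image that is true inside the hole (bwdist is in the Image Processing Toolbox):

    % Distance from each unknown pixel to the nearest known pixel,
    % i.e. to the initial hole boundary. Filling regions in increasing
    % order of 'order' gives the onion-peel schedule.
    order = bwdist(~mask);
    order(~mask) = Inf;   % exclude already-known pixels from the schedule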

More formally, Criminisi et al. define the patch priority as the product of two terms, C(p) and D(p): C(p), the "confidence" term, indicates how complete and trustworthy the neighborhood around p is, and D(p), the "data" term, indicates the degree to which the region around p contains linear structures flowing into the unknown region. For each hole-filling iteration, the priority is computed for all potential image fill regions and the region with the highest priority is selected. See Figure 6 in Criminisi et al. for a visualization of these two terms.
C(p) is defined within the entire unknown region. Criminisi et al. define C(p) so that it encourages concentric ordering, but their method requires storing and updating additional variables. One simple alternative is to define C(p) as the proportion of already-known pixels in the patch centered at each possible p, which is exactly what the starter code does.
D(p) is only defined along the boundary of the unknown region (the fill front). It can be computed as the magnitude of the dot product between the boundary direction and the direction of maximum image gradient. See Figure 5 in Criminisi et al. Measuring the image gradient near the boundary is tricky -- don't let the boundary itself influence your computations. You should only trust measurements over neighborhoods that are entirely known (not under the mask).
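A rough sketch of both terms, assuming img is the input RGB image, mask is true for unknown pixels, and psz is an odd patch width (normalization and the careful gradient handling described above are omitted):

    known = double(~mask);

    % C(p): fraction of known pixels in the patch centered at p --
    % this is what the baseline confidence amounts to.
    C = conv2(known, ones(psz) / psz^2, 'same');

    % D(p): |image gradient . boundary tangent| on the hole boundary.
    % The boundary normal is the gradient of a smoothed mask; the
    % tangent is its 90-degree rotation.
    [gx, gy] = gradient(rgb2gray(im2double(img)));
    [nx, ny] = gradient(conv2(known, fspecial('gaussian', 9, 2), 'same'));
    D = abs(gy .* nx - gx .* ny);
    D(~bwperim(mask)) = 0;

    priority = C .* D;   % fill the region whose center maximizes this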
For this project, graduate students are required to implement some form of image-aware, structure-based hole filling priority and compare it to the baseline priority algorithm. Undergraduate students may do so for extra credit.
2. Search for Matching Patch
Finding the most appropriate matching patch for a query region is fundamentally hard -- it is implicitly a "vision hard" recognition problem. But within the restricted domain of a single photograph, SSD matching can do quite well. The baseline search method implemented in the stencil code is a coarse-to-fine SSD search, just like in project 1. You are not required to improve this aspect of the algorithm, although there are possible improvements to make (see extra credit).
3. Compositing the Patch to Fill the Hole
Each iteration, your image completion algorithm must composite a matching patch and an incomplete query region to fill in missing image data. The simplest way to do this is to copy pixels from the patch into the hole, unmodified, and make no changes to the known pixels in the query region. This is the strategy adopted by Criminisi et al. and the starter code. It sometimes works for easy test cases and small patch sizes, but you can do better! Larger patch sizes and more sophisticated compositing algorithms will produce more convincing results.

If you've taken a graphics course at Brown, you've probably heard of Seam Carving. You may have even implemented it. That algorithm uses dynamic programming to find the minimum error seam of an image along a specific dimension, and then remove it. Dynamic programming has been used in texture synthesis, most famously in Image Quilting, but you're in for a world of hurt if you try to use dynamic programming for image completion. It is difficult to use dynamic programming to find optimal seams when you need to extract closed regions or when the topology of the constraints (which pixels are known and which are missing) is complex.
We take inspiration from the texture synthesis method Graphcut Textures (Kwatra et al.), which extends textures by overlaying patches and finding the minimum error seam with a max-flow/min-cut operation on a specially constructed graph. It might be easier to think of this graph cut operation less as a seam finding process and more as a labeling process. The output of this labeling process tells us, at every overlapped pixel, whether we keep the existing pixel or take the pixel from the matched patch. Depending on the shape of the known and unknown regions, and on how you construct the graph, the solution to this labeling might not be a closed contour, but that's fine.

The graph cut compositing algorithm works by creating a (mostly) planar graph, with a node corresponding to each pixel and weighted edges connecting all 4-way neighbors (Kwatra et al. call these "constraint arcs"). The min-cut through this graph directly implies the labeling / seam.
The neighbor edge weights encode how important it is for two neighbors to share the same label. For instance, if an edge weight is small, that means the min-cut can break the edge and pay little penalty. If the edge weight is large, the min-cut is likely to go between other nodes, instead. When should the weight be small? When a seam passing between two pixels wouldn't be perceptible, because the two images agree at that location.
As with all max-flow/min-cut formulations, there are two special terminal nodes -- the source and the sink. Pixels that must come from the matched patch (e.g. the pixels in the unknown region) are connected to one of these terminal nodes with very high weight, and pixels that must come from the existing, known pixels (e.g. those under the patch boundary) are connected to the other terminal node. For these special cases the label is predetermined, but for all of the other pixels the graph cut algorithm will tell us which label to use.
Kwatra et al. suggest making the edge weight between neighboring pixels s and t proportional to the following sum of squared differences cost (a sketch of computing it appears after the definitions):

M(s, t, A, B) = ||A(s) - B(s)|| + ||A(t) - B(t)||

- s = current pixel coordinate
- t = neighboring pixel coordinate
- A = first image
- B = second image
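A sketch of computing this cost densely for all 4-way neighbor pairs, assuming A and B are same-sized double RGB images (values in [0, 1]) defined over the overlap region:

    % Per-pixel color distance ||A(p) - B(p)||.
    d = sqrt(sum((A - B).^2, 3));

    % M(s,t) for horizontal pairs (s left of t) and vertical pairs
    % (s above t): the sum of the two endpoint distances.
    Mh = d(:, 1:end-1) + d(:, 2:end);
    Mv = d(1:end-1, :) + d(2:end, :);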
You will also need to represent the source and sink edge weights in this problem (sometimes referred to as unary weights). One approach is to use masks marking the pixels that are constrained to come from each image, and to set the terminal weight for those pixels to infinity (or a very large value). Once the problem is formulated, you can solve the max-flow/min-cut problem, and the output is a labeling of the pixels as belonging to one side or the other. This is effectively a new mask for the image that you can use for compositing.
The stencil includes a mex-compiled max-flow/min-cut algorithm that runs fast and requires a specific type of input. The source code for this can be found here: Maxflow. You are not responsible for implementing a max-flow/min-cut algorithm; this course is not cs157.
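As a sketch of how the pieces might fit together: the maxflow call below assumes a common wrapper interface (a sparse n-by-n matrix of neighbor weights and a sparse n-by-2 matrix of terminal weights); the stencil's mex function may expect a different input format, so check its documentation. mustBePatch and mustBeKnown are hypothetical logical masks marking pixels whose labels are predetermined; d, Mh, and Mv are from the sketch above.

    [h, w] = size(d);
    n = h * w;
    idx = reshape(1:n, h, w);

    % Constraint arcs between 4-way neighbors, weighted by M(s,t).
    i = [reshape(idx(:, 1:end-1), [], 1); reshape(idx(1:end-1, :), [], 1)];
    j = [reshape(idx(:, 2:end),   [], 1); reshape(idx(2:end, :),   [], 1)];
    A = sparse(i, j, [Mh(:); Mv(:)], n, n);

    % Terminal arcs: a huge weight pins the label of constrained pixels.
    BIG = 1e6;
    T = sparse([find(mustBePatch(:)); find(mustBeKnown(:))], ...
               [ones(nnz(mustBePatch), 1); 2 * ones(nnz(mustBeKnown), 1)], ...
               BIG, n, 2);

    [flow, labels] = maxflow(A, T);   % labels becomes the new compositing mask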
Other Avenues for Improvement (Possible Extra Credit)
There is a litany of possible improvements over the baseline algorithm beyond the required ones (graph cut compositing for undergrads, plus image-aware prioritization for grad students). Here are some suggestions, broken down by which part of the algorithm they (hopefully) improve.
0. Miscellaneous Improvements
- Vary patch size instead of having it be predefined and constant. The starter code includes a simple version of this -- randomizing the patch size with each iteration. This seems to improve results by suppressing grid artifacts caused by a fixed patch size. But you can go beyond this. Maybe the patch size should be proportional to the hole size? Maybe it should be related to the image content -- are there fine structures which need small patches, or are there large structures which large patches would better preserve?
- Add manual intervention to improve the results. For instance, the interface in Sun et al. lets users draw lines which indicate where the algorithm is allowed to search for matching textures.
1. Improvements to Priority
- Both the baseline and Criminisi et al. priority algorithms are fairly local, greedy heuristics. You could try to explicitly detect, extend, and even connect linear structures in the image and synthesize those first. This might resemble the Sun et al. method mentioned above.
- In addition to implementing the image-aware "data" term described in Criminisi et al., you could implement the decaying "confidence" term as described in Section 3 of the paper. Their notion of confidence is more than just how "complete" a neighborhood is -- it also encodes how much error the matching process incurred.
- Instead of making decisions one patch at a time, you could try to incorporate some form of look-ahead or backtracking.
2. Improvements in Search
- Use textures from other images, not just the input image. Last year we had a Scene Completion project. This year you can get extra credit by using matching scenes in addition to (or instead of) the input image texture. The starter code can point you to pre-computed matching scenes.
- Instead of using SSD distance in RGB color space, you could try other features, such as SIFT or HoG descriptors, which work well for recognition tasks.
- Regardless of feature space, it is often the case that an image simply doesn't contain appropriate texture to fill a hole. Instead of trying to find such texture in other images (Scene Completion), you could enrich your input texture by mirroring the input image and searching at different scales and rotations.
- On the other hand, it can sometimes help to restrict the search. In many natural scenes, it is safe to move texture horizontally but not vertically. Why? Because content along an image row is likely to be at similar depth and scale. You could restrict your search to a limited vertical range, or you could add a more continuous penalty based on y distance.
- The baseline coarse-to-fine search is very greedy -- it finds a best match at low resolution, then searches only that neighborhood at higher resolutions. There is no guarantee that the best matching high-resolution patch will be found. Doing the patch search at full resolution is slow, but you can do it in the Fourier domain, as described in Section 5 of Kwatra et al., although this presents its own difficulties (boundary effects, the cost of the FFT and inverse FFT). You might implement other search strategies based on principles such as locality (useful texture tends to be nearby in image space) and coherence (useful texture tends to be near where your neighboring texture patches came from).
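For intuition on the Fourier-domain idea: the SSD between a template and every candidate location expands into terms that are each a correlation, so all offsets can be scored at once with conv2 (or with fft2/ifft2 for large patches). A grayscale sketch that ignores the unknown-pixel mask -- a real implementation must weight each term by the template's valid-pixel mask:

    % SSD(q) = sum_p (I(q+p) - tmpl(p))^2
    %        = conv(I.^2, box) - 2 * corr(I, tmpl) + sum(tmpl(:).^2)
    box = ones(size(tmpl));
    ssd = conv2(I.^2, box, 'same') ...
        - 2 * conv2(I, rot90(tmpl, 2), 'same') ...   % correlation via conv2
        + sum(tmpl(:).^2);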
3. Improvements in Compositing
- You can help hide seams by feathering -- a localized averaging along the boundary (see the sketch after this list).
- Graph cut seam finding helps, especially in textured regions, but for smooth regions the color differences are impossible to hide. Poisson blending, as you implemented for project 2, can help. You could apply it after each graph cut, but it might be better to apply a single Poisson operation at the very end of the process. You'll need to keep track of all of the still-visible graph cut seams, and force the gradient across all seams to be zero.
- Along those same lines, in each iteration the compositing assumes that the existing source texture and the matched patch texture are internally seamless. But the existing texture can have seams left over from previous compositing operations, and you could keep track of that and encourage the graph cut labeling to cover old seams. See section 3.1 in Kwatra et al.
- You can use a different edge weight metric than the one shown above. The graph cut paper offers an alternative (Equation 5) which encourages transitions along existing edges. The intuition is that we're not very sensitive to the magnitude of an edge, so long as it is reasonable for an edge to exist there. On the other hand, Scene Completion (Section 4) suggests the opposite strategy -- cut through smooth regions, because Poisson blending will be able to hide those differences, whereas Poisson blending leaves artifacts when edges are crossed.
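For example, the feathering suggestion above might look like the following sketch, where labels is the binary compositing mask (e.g. from the graph cut) and patchImg and knownImg are hypothetical names for the two layers being blended:

    % Blur the binary label mask into a soft alpha matte, then blend.
    g = fspecial('gaussian', 15, 3);
    alpha = imfilter(double(labels), g, 'replicate');
    out = bsxfun(@times, alpha, patchImg) + bsxfun(@times, 1 - alpha, knownImg);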
Write up
For this project, just like all other projects, you must write a project report in HTML. In the report, describe your algorithm and any design decisions you made along the way. Then show and discuss the results of your algorithm. Also discuss any extra credit you did. Feel free to add any other information you feel is relevant.
Graduate Credit
As stated above, graduate students must implement an image-aware priority algorithm.
Handing in
This is very important as you will lose points if you do not follow instructions. Every time after the first that you do not follow instructions, you will lose 5 points. The folder you hand in must contain the following:
- README - text file containing anything about the project that you want to tell the TAs
- code/ - directory containing all your code for this assignment
- html/ - directory containing all your html report for this assignment (including images)
- html/index.html - home page for your results
Then run: cs129_handin proj3
If it is not in your path, you can run it directly: /course/cs129/bin/cs129_handin proj3
Rubric
- +70 pts: Graphcut implementation
- +10 pts: Using at least 5 new images you have taken
- +20 pts: Write up
- +15 pts: Extra credit (up to fifteen points)
Final Advice
- Just like Poisson blending, a naive implementation will run slowly because of indexing into a sparse matrix. Test your algorithm on smaller images first.
- That part of the algorithm can be made significantly faster by computing all of the adjacency matrix entries ahead of time and then constructing the sparse matrix in a single call; see sparse(i,j,s,m,n,nzmax) and the sketch after this list.
- Don't spend too much time trying to adjust free parameters of the algorithm (e.g. patch size) to try and improve results. Image completion in real scenes is very hard, and your algorithm will fail most of the time unless you do an exceptional job.
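Concretely, the fast pattern gathers all (row, column, value) triplets first and builds the matrix in one call (toy values shown):

    % Slow: A = sparse(n, n); A(r, c) = v; ... reallocates every assignment.
    % Fast: accumulate triplet vectors, then call sparse once.
    i = [1; 2; 3];
    j = [2; 3; 1];
    s = [0.5; 1.2; 0.7];
    A = sparse(i, j, s, 3, 3);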
Credits
Project inspired by Criminisi et al. and Kwatra et al. Project description and code by James Hays and Pat Doran.