Project 3 Writeup: Graphcut Images
Basia Korel (bkorel)
February 26, 2010
This project entailed compositing along the minimum seam between two images. Specifically we had to implement an algorithm that finds a seam, or a connected path of low energy pixels, that runs between pairs of pixels; the optimal seam represents the smallest difference between two images. The images are then combined along the optimal seam, with all pixels from one side of the seam taken from the source and all pixels on the other side of the seam taken from the target image. We use a technique called Graphcut, based on the Kwarta paper ("Graphcut Textures: Image and Video Synthesis Using Graph Cuts"), to find the optimal seam. The Graphcut algorithm works by representing the image as a connected graph of nodes, such as in the image below. Edges in the graph represent the difference between neighboring pixels from the source and target. Specifically edges are defined by a matching quality cost M between two adjacent pixels s and t that are from source image A and target image B using the following equation:
![M(s,t,A,B) = [A(s)-B(s)]^{2} + [A(t)-B(t)]^{2}](images/EXTERN_0000.png)
In the graph below, the Patch A node represents pixels that must be taken from source image; edges between node A and the pixels neighboring A are assigned an extremely high cost indicating that these neighboring pixels must also come from the source image. Once this graph has been formulated, we can find the optimal seam by solving the max-flow/min-cut graph problem to find the minimum cost cut of the graph that separates nodes A and B.
Implementation
To constrain each image, a user interactively selects masks for both the source and target image. Any pixels under the mask for each image represent pixels in either node A or node B in the above graph. The non-mask neighbors to mask pixels are constrained as well. All the pixel indices that are not under either mask are considered to be in the "overlap" region in which pixel pairs are compared to find the optimal seam.
The max-flow algorithm was provided for this project, thus the main task at hand was to construct two matrices that represent the image graph of connected pixels. The first matrix A is an m*n x m*n adjacency matrix. Each entry in matrix A is the cost metric M assigned to the directed edge between pixels s and t where s is the row index in A, t is the column index for A and M is the value at A(s,t). Entries in A are only added for pixels that are not under either source or target mask. The second matrix T is an m*n x 2 matrix. Entries in this matrix represent the high cost constraints for each pixel under a mask and the neighboring pixels to the mask. Column one represents pixels constrained to the source image and column two contains pixels constrained to the target image.
In my implementation, three vectors are maintained per matrix which are used to populate each matrix. Row and column vectors contain the index values for the matrix and a value vector that contains the entry of the matrix at row and column. This decision was made for efficiency reasons, and just before calling the maxflow algorithm the sparse matrices A and T are constructed using these vectors. The vectors for matrix A are preallocated to a size of m*n*4, which is the max number of entries A will contain given there are no pixels under a mask and each pixel has 4 neighbors. The vectors for matrix T are preallocated to a size of m*n, which is the max number of entries given all the pixels are under a mask. Prior to entering a loop that populates A and T vectors, four M matrices are constructed that contain the sum of squared difference between the source and target at pixel s and the ssd between the source and target at a left/right/top/bottom neighboring pixel. For instance, to construct the M_left matrix, both the source and target matrices are shifted by one column to the right and the first column is padded with 0's (since calling the Matlab function circshift actually does a circular shift and will wrap the last column to be the first). M_left, containing the cost M between a pixel s and a neighbor pixel t, indexed at s, is computed as:
M_left = sum((source - target).^2, 3) + sum((left_src - left_tar).^2, 3);
To populate A and T entries, my implementation iterates through each pixel index. If a pixel i is under a source or target mask, the value i at index i is added to the T_row vector and the value 1 or 2, depending on if i is from the source or target mask, is added to T_col at index i; the same applies for each neighbor index of i. If a pixel i is not under either mask, for each neighboring pixel that is also not under the mask, an entry is added to each A vector: i to the row vector, the neighboring index to the column vector, and M_left(i) as the value in A at the current index (for the left neighbor).
The value vector for T is constructed after this loop; it length is the number of non-zero entries in the rows vector and all the entries contain a max_int value. The sparse matrices A and T are constructed before passing them into the maxflow algorithm. The resulting labels returned from maxflow() contains a labeling of pixels belonging to either source or target image.
One addition I made in my implementation to improve the end result is to automatically constrain all pixels to the target mask for which there are no pixels in the source image. Because the source image is often smaller in size and a black border is padded around the source during resizing, this may result in black regions to appear in the composited image if the maxflow algorithm labels these "missing" pixels to be taken from the source.
Images
Source Image
|
Target Image
|
Composite Image
|
Composite with Seam
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|