CS1290 Final Project: Image Quilting from Matching Scenes

Geoffrey Sun (lbsun)

May 18, 2011

Image quilting was first introduced by Efros and Freeman in [1]. The idea is to synthesize a novel image with a different appearance by transferring texture from a source image via seam finding among source patches. The algorithm works very well for texture synthesis as well as texture transfer. However, most of the results shown in [1] deal with stationary textures or simple images. I consider the problem of transferring texture between natural images, which can be much more complicated due to the diversity in texture appearance, scale, and structure. In this project, I explore how the method in [1] can be applied to structured scenes.

Problem Statement

An image is first downsampled and upsampled (8x) to form a blurry, texture-less image A. This image retains structural information such as edges and contours, as well as colors, but textures are generally removed by this destructive process. Our goal is to inject appropriate texture from a source image B into A such that the result looks realistic and remains semantically similar to the original image: a mountain is still a mountain, and a building still appears as a building. We do not require complete faithfulness to the original image; for example, it is acceptable to insert a tree where appropriate. This differs from super-resolution in that we do not consider reconstruction error: the output image does not need to downsample back to image A.
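This blurring step can be sketched as a box downsample followed by pixel replication; the report does not specify the exact resampling filters, so the function name `blur_via_resample` and the block-mean scheme below are illustrative assumptions:

```python
import numpy as np

def blur_via_resample(img, factor=8):
    """Destroy texture by box-downsampling then upsampling an image.

    img: H x W x 3 float array; H and W are assumed divisible by `factor`
    (a simplification -- boundary handling is not specified in the report).
    """
    h, w, c = img.shape
    # Box filter: average each factor-by-factor block of pixels.
    small = img.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))
    # Upsample back to the original size by pixel replication.
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)

# Example: an 8x8 image with a sparse bright pattern collapses to its block mean.
img = np.zeros((8, 8, 3))
img[::2, ::2] = 1.0
blurry = blur_via_resample(img, factor=8)
```

Note that with `factor=8` on an 8x8 image the whole image becomes a single averaged block, which is the extreme case of the texture removal described above.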

Overview

A key difference from [1] is that we require appropriate textures to be inserted, which means transferring textures from an orange to a potato is not desired (though considered acceptable and interesting in [1]). This requires the algorithm to make decisions about what to insert, given a large pool of possible patches from B for a given location in A. Clearly, this calls for some similarity between A and B: if A is a beach scene, B had better not be an office. To ensure appropriate textures exist in B, it is natural to consider matching scenes for image A. In this project, all source images are the top scene matches for the input image (at low resolution) following [2].

Features and Representations

Texture synthesis algorithms often operate at the patch level and represent each patch by pixel intensities. In this project, image A is deprived of high frequencies, which makes patch-level correspondence ambiguous. As a result, I use pixel intensities (low frequency only) and HOG features computed over increasingly larger neighborhoods around a given patch. HOG features capture contextual information and help identify the correct image content for insertion. Given a patch P from image A, we find the nearest-neighbor patch Q in B based on the L2 distance computed from a weighted sum of the two feature components. To further exploit the fact that A and B often share a similar layout, I constrain the search by the relative y-coordinate of Q so that P and Q differ by less than 20% in relative height.
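A minimal sketch of this constrained nearest-neighbor search, assuming precomputed feature matrices; the function name, the weights `w_pix`/`w_hog`, and the flat feature layout are placeholders, since the report only specifies a weighted L2 distance and the 20% relative-height band:

```python
import numpy as np

def nearest_patch(query_lowfreq, query_hog, query_rely,
                  src_lowfreq, src_hog, src_rely,
                  w_pix=1.0, w_hog=1.0, max_dy=0.2):
    """Pick the best source patch Q in B for a query patch P from A.

    src_lowfreq: (N, D1) low-frequency pixel features for N candidate patches.
    src_hog:     (N, D2) HOG context features for the same candidates.
    src_rely:    (N,) relative y-coordinate in [0, 1] of each candidate.
    Returns the index of the minimum-distance candidate.
    """
    # Weighted sum of the two squared-L2 feature distances.
    d_pix = np.sum((src_lowfreq - query_lowfreq) ** 2, axis=1)
    d_hog = np.sum((src_hog - query_hog) ** 2, axis=1)
    dist = w_pix * d_pix + w_hog * d_hog
    # Disallow candidates whose relative height differs by more than 20%.
    dist[np.abs(src_rely - query_rely) > max_dy] = np.inf
    return int(np.argmin(dist))
```

In this sketch the height constraint is enforced by masking distances to infinity, so a closer-looking patch from the wrong part of the image is never selected.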

Image Quilting via Scanline Order Synthesis

The algorithm proceeds in scanline order, picking the first patch based on low-frequency features alone. Each subsequent patch, half-overlapping with its left or top neighbor, is chosen by considering both the low-frequency constraint from image A and the high-frequency constraint from existing patches in the neighborhood. Due to the scanline ordering, the neighboring patches are a subset of the top, left, and top-left patches. The distance now becomes a weighted sum of a low-frequency error (data cost) and a high-frequency overlap error (smoothness cost). The latter is simply the SSD over overlapping pixels in the patches.
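The combined score for a candidate patch might look like the following sketch; the weights `w_data`/`w_smooth` are illustrative, as the report does not give their values:

```python
import numpy as np

def patch_cost(candidate, target_lowfreq, synthesized, overlap_mask,
               w_data=1.0, w_smooth=1.0):
    """Score one candidate patch during scanline synthesis.

    candidate, target_lowfreq, synthesized: same-shape patch arrays.
    overlap_mask: boolean mask of pixels already synthesized (top/left overlap).
    """
    # Data cost: agreement with the blurry input A (low-frequency constraint).
    data = np.sum((candidate - target_lowfreq) ** 2)
    # Smoothness cost: SSD against already-placed pixels in the overlap region.
    smooth = np.sum(((candidate - synthesized) ** 2)[overlap_mask])
    return w_data * data + w_smooth * smooth
```

The candidate with the lowest combined cost is then quilted into the output, and synthesis moves on to the next half-overlapping location.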

Since patches are half-overlapping, most patches condition on three neighbors. The pixels in such overlapping regions cannot simply be averaged to serve as the smoothness constraint, since averaging produces overly blurry patches and defeats the purpose of conditioning on high-frequency information. Instead, as each patch is chosen, graph cut and Poisson blending are applied to produce a sharp intermediate image, which provides useful high-frequency information for computing the smoothness cost.
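As a concrete stand-in for the seam finding, here is the minimum-error boundary cut from [1] for a single vertical overlap, written as a dynamic program; this is a simplification of the graph cut actually used in this project, which handles multiple overlapping neighbors:

```python
import numpy as np

def min_error_seam(err):
    """Minimum-error boundary cut for a vertical overlap strip.

    err: (H, W) per-pixel SSD between the old and new patch in the overlap.
    Returns, for each row, the column where the seam crosses; pixels left of
    the seam keep the old patch, pixels right of it take the new one.
    """
    h, w = err.shape
    cost = err.astype(float).copy()
    # Forward pass: each pixel accumulates the cheapest connected path above.
    for i in range(1, h):
        for j in range(w):
            lo, hi = max(j - 1, 0), min(j + 2, w)
            cost[i, j] += cost[i - 1, lo:hi].min()
    # Backtrack from the cheapest endpoint in the last row.
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        seam[i] = lo + int(np.argmin(cost[i, lo:hi]))
    return seam

# The seam naturally follows the column where the two patches already agree.
err = np.array([[1.0, 0.0, 1.0],
                [1.0, 0.0, 1.0],
                [1.0, 0.0, 1.0]])
seam = min_error_seam(err)
```

Because the seam passes through low-error pixels, the transition between patches stays sharp without the blur that per-pixel averaging would introduce.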

Results

It turns out that natural textures such as trees, grass, and mountains can be textured well with this technique. However, highly structured and man-made scenes are much harder to work with. For all results below, I show the input image at a maximum dimension of 128 pixels; each row then shows a scene match, the output, and a visualization in which all overlapping patches are averaged. The averaged version is always smoother and sometimes more visually pleasing, since it hides some unwanted high-frequency textures.

Natural Scenes

badlands
coast
islet

Manmade Scenes

lighthouse
monastery

badlands

In general, the results here are pretty good; in many cases the algorithm produced rather realistic images, except for the garbled patches in the sky in several cases. It does, however, reliably generate an appearance consistent with the scene match image, for example, inserting snow, pine trees, etc.


some better ones:

failures:


coast

When the scene match is good, the results are quite decent.


some better ones:

failures:

Grass textures were inserted over water. However, this is inevitable because the algorithm has no other choice given the scene match.


islet


some better ones:

failures:

Grass textures were inserted over water. This is hard for the algorithm because the grass is too small to show up as a low-frequency signal, but the result still looks acceptable.


lighthouse

The first result is interesting: the scene match does not offer any dark-colored patches for the top of the lighthouse, and the algorithm does not know that the top of the lighthouse in the scene match is actually the right texture to use. However, it does a good job placing a window (corner-like) patch at the right location. The failure case is also interesting: the algorithm picks cloud patches to construct the lighthouse shape.


some better ones:

failures:


monastery

The first scene match provides great textures and structures to be copied over, but there is no 'roof against sky' patch for the algorithm to use, so the rooftop in the output is messy.


some better ones:

failures:

Even when the scene match does not provide sufficient texture resources, the algorithm does a fine job propagating straight lines and sharp edges when possible.


Discussion

It is rather clear that scene matches often provide the ideal textures to be copied over. However, there are at least two shortcomings we can observe from the results: (1) structured scenes cannot be easily reproduced faithfully with this scanline setup, and (2) a single scene match might be insufficient to provide all the desired textures and edge variations for a given input image.

For the first problem, one could consider larger patches and a more sophisticated setup such as an MRF to reach a more global agreement. The scanline-order dependency is clearly bad for preserving structure.

For the second problem, one easy fix is to simply use many scene match images as patch resources. However, neighboring patches should then be chosen from the same image to avoid an incoherent appearance, since different scene matches may look very different while sharing similar structure. Another fix is to merge the outputs from several scene matches via graph cut into a single output, picking the best regions from each.

References

[1] A. A. Efros and W. T. Freeman. Image Quilting for Texture Synthesis and Transfer. SIGGRAPH 2001.