CS129 Final Project: Image Analogies

Melissa Byun (mbyun)

Premise

Image Analogies is an algorithm that infers a filter from a training pair of images and applies it to a second image, rather than attempting to program each filter by hand. Supplied with an appropriate training pair, image analogies allows the user to approximately filter a second image in the same manner. Image analogies covers a broad range of NPR applications, including super-resolution, texture synthesis, artistic filters, texture-by-numbers, and image colorization.

Algorithm

  1. Starting with a pair of images A and A' and a third image B, we first convert the images from RGB to YIQ colorspace. Luminance is used as the feature because it is the channel most descriptive for human perception. Using luminance also compresses the image data from 3 channels to just 1, making it a bit simpler to process as well.
  2. For every pixel q in image B (in scanline order), find the best matching pixel p in image A. Set the pixel q in image B' to the pixel p in image A'.
Finding the best matching pixel is challenging. The first approach, and the one I implemented, is approximate matching: take the luminance neighborhood around pixel q and compare it against every possible neighborhood in A with a simple L2 norm - an (approximate-)nearest-neighbor search over neighborhoods.
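Steps 1 and 2 above can be sketched in NumPy as follows. This is a minimal brute-force version (no ANN library), the function and variable names are illustrative, and the luminance weights are the standard NTSC YIQ coefficients:

```python
import numpy as np

def luminance(rgb):
    """Y channel of YIQ (NTSC weights); rgb is an (H, W, 3) float array in [0, 1]."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def best_match(A_lum, B_lum, q, half=2):
    """Brute-force L2 search: find the pixel p in A whose luminance
    neighborhood best matches the neighborhood around q in B."""
    h, w = A_lum.shape
    qi, qj = q
    # Pad both images so edge pixels still have full neighborhoods.
    Ap = np.pad(A_lum, half, mode="edge")
    Bp = np.pad(B_lum, half, mode="edge")
    target = Bp[qi:qi + 2 * half + 1, qj:qj + 2 * half + 1]
    best, best_err = (0, 0), np.inf
    for i in range(h):
        for j in range(w):
            cand = Ap[i:i + 2 * half + 1, j:j + 2 * half + 1]
            err = np.sum((cand - target) ** 2)
            if err < best_err:
                best, best_err = (i, j), err
    return best

def synthesize(A, Aprime, B, half=2):
    """For each pixel q of B (scanline order), copy A'[p] into B'[q]."""
    A_lum, B_lum = luminance(A), luminance(B)
    Bprime = np.zeros_like(B)
    for qi in range(B.shape[0]):
        for qj in range(B.shape[1]):
            pi, pj = best_match(A_lum, B_lum, (qi, qj), half)
            Bprime[qi, qj] = Aprime[pi, pj]
    return Bprime
```

The nested search is O(|A| · |B|), which is exactly why the paper substitutes an approximate-nearest-neighbor structure over neighborhood vectors.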

Alternatively, the paper also describes a method of coherence matching, inspired by Ashikhmin's prior work. Coherence matching uses the already-synthesized portion of B' adjacent to q to make a better estimate, based on where those nearby synthesized pixels came from in A. The paper computes candidates from both matching methods and uses a luminance-based error to decide which returned point to use.
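The coherence candidate step can be sketched like this; `s` is a hypothetical map from already-synthesized B' pixels to their source pixels in A, and all names are illustrative:

```python
import numpy as np

def coherence_match(s, q, B_shape, A_shape, half=2):
    """Ashikhmin-style coherence candidates: for each already-synthesized
    neighbor r of q, look up where it came from in A (s[r]) and shift that
    source by the same offset q - r. Returns the list of valid candidates."""
    qi, qj = q
    h, w = A_shape
    candidates = []
    for di in range(-half, 1):
        for dj in range(-half, half + 1):
            if di == 0 and dj >= 0:
                break  # only pixels already visited in scanline order
            ri, rj = qi + di, qj + dj
            if 0 <= ri < B_shape[0] and 0 <= rj < B_shape[1] and (ri, rj) in s:
                pi, pj = s[(ri, rj)]
                ci, cj = pi - di, pj - dj  # shift source by q - r
                if 0 <= ci < h and 0 <= cj < w:
                    candidates.append((ci, cj))
    return candidates
```

In the full algorithm, the best of these candidates is compared against the approximate-match result, and the coherent pixel wins unless its neighborhood error exceeds the approximate match's error by a weighting factor.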

Discussion

Results are enhanced by using a Gaussian pyramid for each image (as in the paper), but I did not use one here because results were already very slow to produce (most of the setup is already in the code, however).
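The pyramid construction the paper relies on can be sketched without any image library; this is an illustrative pure-NumPy version that uses a binomial kernel as a Gaussian approximation:

```python
import numpy as np

def downsample(img):
    """Blur a 2D image with a separable binomial (near-Gaussian) kernel,
    then keep every other pixel in each dimension."""
    k = np.array([1.0, 4.0, 6.0, 4.0, 1.0])
    k /= k.sum()
    p = np.pad(img, 2, mode="edge")
    # Separable filtering: convolve rows, then columns.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, p)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)
    return blurred[::2, ::2]

def gaussian_pyramid(img, levels=3):
    """Pyramid from full resolution (index 0) down to the coarsest level."""
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(downsample(pyr[-1]))
    return pyr
```

Synthesis would then run coarse-to-fine, using the coarser levels' results to extend each pixel's feature vector at the finer levels.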

The ANN search was very slow in my implementation - blkproc was not as successful as I had hoped. Additionally, depending on the size of the neighborhood, results can be noisy (I tried running it pixel by pixel vs. in 5x5 pixel blocks). The search is also very sensitive to the training data given: despite trying to match up the images A' and B colorwise, the RGB colorspace is too vast and the sampling usually too small.

I probably should have used something object-oriented - it would have been much simpler in some respects (C++?). Adding coherence matching and comparing it against the approximate match would, I think, have made the results a lot better, because it improves perceptual coherence even though the matches are less similar in the L2 norm.

Results

Images


Results in these images are very noisy because of the lack of coherence, but it is possible to see the general shape of B in B'. The first set was done with the ANN search, and the second set was done by matching individual pixels. The ANN search is a little more coherent, but still not very good.


Identity test - given an image as its own training pair, the algorithm can recreate it.

March 17, 2011