Image Analogies

Lyn Fong (lfong) - May 17 2010

Description

Analogies allow people to describe things by their relationships. While computers can handle mathematical relationships, many of the relationships humans perceive are vague and beyond a machine's ability to understand. Still, analogy is a very natural and convenient way to describe things. The goal of image analogies is to take a pair of input images, A and A', along with a new image B, and use the relationship between A and A' to synthesize an image B'. In other words, we are given A : A' :: B : _ and we want to find B' such that A : A' :: B : B'. For example, if I wanted to produce watercolor paintings, I would take a photo of an apple, produce a watercolor painting of it, and then supply a photo of something I wanted painted, like a bunny, and I would get a watercolor bunny! Uses also include texture synthesis, texture transfer, texture-by-numbers, and super-resolution.

Algorithm

Given a pair of training images, A and A', and B, the input image we are trying to find an analogy for, we begin by converting the images from RGB to YIQ color space. The reason for this is that humans are most sensitive to luminance (the Y channel of YIQ), and using luminance alone as a descriptor is more efficient than describing each pixel with three numbers. In addition, using all three channels introduces too much variance, which makes it harder to compare windows of pixels from one image to another.
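For reference, the conversion is just a linear transform. A minimal sketch in NumPy, using the standard (rounded) NTSC coefficients; any equivalent YIQ matrix works:

```python
import numpy as np

# Standard NTSC RGB -> YIQ matrix (rounded coefficients)
RGB2YIQ = np.array([[0.299,  0.587,  0.114],
                    [0.596, -0.274, -0.322],
                    [0.211, -0.523,  0.312]])

def rgb_to_yiq(img):
    """img: H x W x 3 float array in [0, 1]; returns Y, I, Q channels."""
    yiq = img @ RGB2YIQ.T
    return yiq[..., 0], yiq[..., 1], yiq[..., 2]
```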

Next, a Gaussian pyramid is constructed. Information from coarser scales in the pyramid is used to weigh in on the synthesis of the next level. The following steps are done for each level of the pyramid, starting from the coarsest.
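Building the pyramid is repeated blur-and-downsample. A minimal sketch, assuming a SciPy-style Gaussian blur; the sigma and level count here are arbitrary tuning choices:

```python
from scipy.ndimage import gaussian_filter

def gaussian_pyramid(channel, levels=3, sigma=1.0):
    """Blur and downsample a 2-D channel; returns levels, coarsest first."""
    pyramid = [channel]
    for _ in range(levels - 1):
        blurred = gaussian_filter(pyramid[-1], sigma)
        pyramid.append(blurred[::2, ::2])  # downsample by a factor of 2
    return pyramid[::-1]                   # coarsest level first
```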

The next step is finding feature vectors for every pixel. These feature vectors are essentially 5x5 windows around a pixel, containing the luminance information. The vectors are computed for images A and B, and used to initialize an approximate nearest neighbors algorithm, which will be needed later.
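A sketch of the feature extraction; edge padding at the borders is an assumption, not necessarily what the original code does:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def feature_vectors(lum, win=5):
    """One 25-D luminance feature per pixel: the flattened 5x5 window."""
    pad = win // 2
    padded = np.pad(lum, pad, mode='edge')             # handle borders
    windows = sliding_window_view(padded, (win, win))  # H x W x 5 x 5
    return windows.reshape(-1, win * win)              # (H*W) x 25
```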

Then, for every pixel in B, we find a best match in A. We then use this best-match position to look up the same index in A' and transfer the relevant information over to B'. The best match is the core of the algorithm and consists of a competition between approximate nearest neighbors (ANN) and coherence.

Approximate Nearest Neighbors

The first method for finding the best match is to simply take the 5x5 window around the pixel being synthesized in image B, and find the 5x5 window from image A that is most similar to it under the L2 norm. For this an ANN algorithm was used; I used a mex-compiled version of an implementation by David Mount and Sunil Arya.
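As a stand-in for that library, here is a sketch using SciPy's k-d tree; its eps parameter gives the same flavor of approximate search (the value 2.0 is an arbitrary choice):

```python
from scipy.spatial import cKDTree

# features_A, features_B: (num_pixels, 25) arrays from the sketch above
tree = cKDTree(features_A)
_, idx = tree.query(features_B, k=1, eps=2.0)  # eps=0 would be exact
# idx[p] is the (approximate) best-matching A pixel for pixel p of B
```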

Coherence

Since the L2 norm is an imperfect measure of perceptual similarity, coherence was also used. In coherence, we look at every already-synthesized pixel in a 5x5 window around the pixel we are synthesizing in B'. For each such pixel, we find its source position in image A, and then the position our pixel would have relative to that source. We then get a measure of how well that location fits the already-synthesized portion of the image, using the distance function explained below. Once we have looked at all synthesized pixels, we pick the location that returned the smallest distance (best match). See the original Image Analogies paper by Hertzmann et al. for the mathy formulation of coherence.
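A sketch of the candidate generation, assuming a dictionary src that records, for each already-synthesized B' pixel, the A pixel it was copied from (hypothetical bookkeeping, not necessarily how the original code stores it):

```python
def coherence_candidates(p, src, shape_A, win=5):
    """A-pixel candidates for B' pixel p implied by synthesized neighbors."""
    rad = win // 2
    out = []
    for dr in range(-rad, rad + 1):
        for dc in range(-rad, rad + 1):
            r = (p[0] + dr, p[1] + dc)
            if r == p or r not in src:
                continue  # skip p itself and unsynthesized pixels
            # neighbor r came from src[r]; shift back by the same offset
            q = (src[r][0] - dr, src[r][1] - dc)
            if 0 <= q[0] < shape_A[0] and 0 <= q[1] < shape_A[1]:
                out.append(q)
    return out  # score each candidate with the distance function below
```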

Distance

To measure how well a position worked for the analogy, the following metric was used: the squared difference between a 5x5 window at the guessed pixel location in A and a 5x5 window around the pixel being synthesized in B, with a Gaussian kernel providing the weights. The same was done for A' and B', although positions in B' where the pixels had not yet been synthesized were ignored. When multiple scales were used, a 3x3 window around the corresponding pixel at the immediately coarser scale was also compared. All these weighted squared differences were then summed, giving a total squared distance across the original images A and B, the analogy images A' and B', and the coarser scales.
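One pairwise term of that sum might look like the sketch below; the sigma for the Gaussian weights is an assumed value, and border clipping is omitted for brevity. The full distance adds the (A, B) term, the (A', B') term (with a mask over unsynthesized pixels), and the coarser-scale terms:

```python
import numpy as np

def gaussian_kernel(win=5, sigma=1.0):
    ax = np.arange(win) - win // 2
    g = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2 * sigma ** 2))
    return g / g.sum()

def window_distance(imgX, x, imgY, y, mask=None, win=5):
    """Gaussian-weighted SSD between win x win windows at x and y.
    mask zeroes out window positions not yet synthesized in B'."""
    rad = win // 2
    wx = imgX[x[0] - rad:x[0] + rad + 1, x[1] - rad:x[1] + rad + 1]
    wy = imgY[y[0] - rad:y[0] + rad + 1, y[1] - rad:y[1] + rad + 1]
    w = gaussian_kernel(win)
    if mask is not None:
        w = w * mask
    return np.sum(w * (wx - wy) ** 2)
```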

Finally, the distances for the positions returned by ANN and coherence were calculated using the above metric. A weight was applied to the ANN distance, because coherence generally returns a result with a greater distance but one that makes more perceptual sense, so we want to bias the choice towards coherence. The final position was the minimum of d_coh and k * d_ann, where d_coh is the error of the position given by coherence, d_ann is the error of the position given by ANN, and k is the weight.
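Tying the pieces together, the per-pixel choice might look like this sketch. Here total_dist stands for the summed distance of the previous paragraph, feat_B is an H x W x 25 array of B's features, and kappa = 2.0 is an assumed value for the weight k:

```python
def best_source(p, src, tree, feat_B, shape_A, total_dist, kappa=2.0):
    """Pick the A pixel to copy for B' pixel p: the ANN result competes
    against the best coherence candidate, with ANN penalized by kappa."""
    _, i = tree.query(feat_B[p], k=1, eps=2.0)
    q_ann = (i // shape_A[1], i % shape_A[1])  # flat index -> (row, col)
    best, d_best = q_ann, kappa * total_dist(q_ann, p, src)
    for q in coherence_candidates(p, src, shape_A):  # sketch from above
        d = total_dist(q, p, src)
        if d < d_best:
            best, d_best = q, d
    return best  # caller records src[p] = best and copies A'[best] to B'[p]
```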

Once a position in A was chosen for synthesizing the new pixel, the luminance information from A' at that position was copied to B'. In cases where texture synthesis was being done, the I and Q values were also copied, since color information was required.
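Converting the finished B' back to RGB is the inverse transform; a minimal sketch (for the filters, I and Q would come from B; for texture synthesis, from A'):

```python
import numpy as np

YIQ2RGB = np.linalg.inv(RGB2YIQ)  # inverse of the matrix in the first sketch

def assemble_rgb(Y_out, I_src, Q_src):
    """Stack the synthesized Y with the chosen I and Q, return RGB."""
    yiq = np.stack([Y_out, I_src, Q_src], axis=-1)
    return np.clip(yiq @ YIQ2RGB.T, 0.0, 1.0)
```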

Results

In general it is not too difficult to get reasonable-looking results, but it takes a bit of "getting the hang of." Image analogies is really just texture synthesis used in a clever way. The limitation, however, is that certain information needs to be present in the original pair, A and A', in order to produce a reasonable B'. Figuring out what makes a good training pair can be tricky.

Another note of interest is that while the paper uses a Gaussian pyramid, I found that my single-scale results were not significantly worse than the multiscale ones. In some cases, it was unclear which was better.

Blur

These were done mostly as tests to make sure the implementation was working. The following uses a Photoshop blur filter on a rose, and attempts to reconstruct it with a dandelion. Also provided are the results using ANN only, and using a single scale with coherence.

[Images: A, A', B, B', ANN-only result, single-level result]

Emboss

These were also done as implementation tests. The following uses a Photoshop emboss filter on a rose, and attempts to reconstruct it with a dandelion. Also provided are the results using ANN only, and using a single scale with coherence.

[Images: A, A', B, B', ANN-only result, single-level result]

Texture Synthesis

For texture synthesis I tried synthesizing from a scaled-down version of a texture as well as from the original size, to see if size affects quality. It does not appear to. The A and B images are just blank images. In my implementation, ANN kept returning the same index, since all the feature vectors had the same value (blank). Instead of using ANN in this case, I just chose an index at random, which produced significantly better results.
[Images: original texture, synthesized result]
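The fallback is a small change at the query site; a sketch (query_feature stands for the 25-D feature of the pixel being synthesized):

```python
import numpy as np

# If A is blank, every feature vector is identical and ANN degenerates
# to one fixed index; a uniformly random source pixel restores variety.
if np.ptp(features_A) == 0:                      # all features identical
    i = np.random.randint(features_A.shape[0])   # random flat index into A
else:
    _, i = tree.query(query_feature, k=1, eps=2.0)
```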

Super-resolution

Super-resolution probably wins the "sketchiest" award for image analogies. It only seems to work well on images with repeated textures. A portion of the image is provided in low resolution and then in high resolution; this is used to super-res the whole image from low res to high res. The small sample needs to contain sufficient information: in images lacking repeated textures, the results are disastrous (see the second result below). Also shown are the ANN-only and single-level results. Super-resolution was one of those cases where I could not say with any assurance that the multiscale result was better than the single-scale one. The use suggested in the paper was as a compression scheme, rather than to super-res arbitrary low-res images; this is probably because of how specific the training data needs to be.

[Images: A, A', B, B', ANN-only result, single-level result]

The following test had a lack of repeated texture elements and the results were incoherent.

[Images: A, A', B, B', ANN-only result, single-level result]

To confirm my suspicions about the necessity of repeating textures, I tried it on a brick texture. Suspicion confirmed.

[Images: A, A', B, B', ANN-only result, single-level result]

Texture Transfer

Texture transfer was another one of the finicky ones. The following is one of the test cases from the original paper.
[Images: A, A', B, B']
Then I tried it on my own test:
[Images: A, A', B, B']
I believe the problem is that there was a lack of range in the luminance of the original texture, providing insufficient information. Adding luminance remapping and artificially creating a gradient in Photoshop produced significantly better results.
[Images: A, A', B, B']
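Luminance remapping, as described in the original paper, linearly shifts and scales A's luminance to match the mean and standard deviation of B's (applied to both A and A' before matching); a minimal sketch:

```python
def remap_luminance(Y_A, Y_B):
    """Remap A's luminance to B's mean and standard deviation."""
    mu_A, sigma_A = Y_A.mean(), Y_A.std()
    mu_B, sigma_B = Y_B.mean(), Y_B.std()
    return (sigma_B / sigma_A) * (Y_A - mu_A) + mu_B
```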
Another problem I found was choosing the weight for the coherence distance. Guessing was a pain, since image analogies is quite slow.

Artistic Filters

Not much to say here; these were definitely the most rewarding. The fishing boats were test cases from the original paper; the bunny was mine. Luminance remapping was used for the artistic filters, just as in the original. Line art was not done because I did not want to restructure my code to work with larger feature vectors.
[Images: A, A', and two B/B' results, for each of two training pairs]

Texture By Numbers

I found texture-by-numbers to be a bit picky about the coherence weight as well. Picking the wrong weight would result in the wrong texture eating its way in where it should not. An improvement would have been to lower the coherence weight near edges in the input image B. I didn't get around to trying that, though.
[Images: A, A', and two B/B' results, for each of two training pairs]

References

Hertzmann, A., Jacobs, C. E., Oliver, N., Curless, B., and Salesin, D. H. Image Analogies. SIGGRAPH 2001.