Final Project: Enhanced Searching for Image Completion
Yun Zhang (yunzhang)
May 18, 2011
Introduction
This project is an extension of Project 3: Image Completion. Instead of focusing on improving the filling order and compositing, we explore several ways of improving the patch matching results. The things we tried here include:
- Better representation
We implemented SIFT and HOG descriptors, which encode information in the gradient domain, and used them as a complement to the color-based representation.
- More visual diversity
We use techniques like mirroring to introduce more visual content into our search space.
- Constrained search locations
We observed that good search candidates do not appear at every location with equal probability. Based on this assumption, we penalize search candidates that are too far from the filling region.
We still constrain our search space to the single input image rather than using an Internet-scale image database. However, we apply some transformations to the original image to improve visual diversity.
Representation enhancement
We implemented a HOG descriptor and used a 3rd-party SIFT descriptor package, trying to add gradient information to our image representation. The difference between the two descriptors is that SIFT is more rotation invariant. This means that with the SIFT descriptor we cannot directly use the search result as input to the compositing step, because the matched patch may not be at the right angle. We would then need to apply another transformation to adjust the pose of the matching patch, and it is not easy to figure out what the right angle is in this case. As a result, we use HOG in our final submission.
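As an illustration, a minimal HOG descriptor for a grayscale patch can be sketched as below. This is not the project's actual implementation; the parameter names (`n_bins`, `cell`) and the choice of unsigned orientations are assumptions for the sketch.

```python
import numpy as np

def hog_descriptor(patch, n_bins=9, cell=8):
    """Minimal HOG sketch: per-cell histograms of gradient orientations.

    `patch` is a 2-D grayscale array whose sides are multiples of `cell`.
    """
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    # Unsigned gradient orientation in [0, pi)
    ang = np.mod(np.arctan2(gy, gx), np.pi)
    h, w = patch.shape
    feats = []
    for i in range(0, h, cell):
        for j in range(0, w, cell):
            m = mag[i:i + cell, j:j + cell].ravel()
            a = ang[i:i + cell, j:j + cell].ravel()
            # Magnitude-weighted orientation histogram for this cell
            hist, _ = np.histogram(a, bins=n_bins, range=(0, np.pi), weights=m)
            feats.append(hist)
    v = np.concatenate(feats)
    return v / (np.linalg.norm(v) + 1e-8)  # L2-normalize the descriptor
```

Because the histogram is over gradient orientations only, two regions with similar edge statistics but different colors (e.g. buildings and bushes) can produce similar descriptors, which motivates combining HOG with a color representation.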
Use HOG only
In our first try, we used HOG alone as our patch representation, which gave very disappointing results. For example, in one of our test cases showing a Cathedral, the HOG descriptor sometimes failed to distinguish between buildings and bushes. This is because they have similar gradient distributions even though the colors are completely different.
Combine HOG with RGB
We finally implemented the following strategy:
- Use a pyramid to do a KNN search over the RGB channels; instead of using K=1 as in Project 3, we use K=5 here.
- For each of the K search candidates, calculate its HOG descriptor, and use the candidate with the shortest HOG distance as the best match.
With this strategy, we put more weight on RGB. This is because the Poisson blending in the compositing step can reduce gradient discontinuities, so matching in color is the most important thing here.
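The two-stage search above can be sketched as follows. This is an illustrative re-implementation, not the project's code: `hog_fn` stands in for whatever descriptor function is used, and the RGB stage is shown as a brute-force sum-of-squared-differences KNN rather than the pyramid search.

```python
import numpy as np

def rerank_by_hog(fill_rgb, candidates_rgb, hog_fn, k=5):
    """Stage 1: K nearest neighbors in RGB; stage 2: HOG re-ranking."""
    # Stage 1: sum of squared RGB differences against every candidate
    ssd = np.array([np.sum((c.astype(float) - fill_rgb) ** 2)
                    for c in candidates_rgb])
    knn = np.argsort(ssd)[:k]

    # Stage 2: among the K candidates, pick the smallest HOG distance
    to_gray = lambda p: p.astype(float).mean(axis=2)
    target = hog_fn(to_gray(fill_rgb))
    hog_d = [np.linalg.norm(hog_fn(to_gray(candidates_rgb[i])) - target)
             for i in knn]
    return knn[int(np.argmin(hog_d))]  # index of the best match
```

Restricting the HOG comparison to the K color-nearest candidates is what keeps the weight on RGB: a patch can only win on gradients if it already matches well in color.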
Visual diversity enhancement
Symmetry is ubiquitous in the real world. In many cases, what we want to do is fill in one side of a symmetric pattern where the other side is available. We implemented two kinds of mirroring, horizontal and vertical. After several experiments, we realized that symmetry in the vertical direction is usually not strict. For instance, many vertical symmetries are caused by reflections on a water surface; in those cases people do not expect to see exactly the same texture on both sides. Furthermore, applying an upside-down patch incorrectly brings in strong artifacts. Thus, we finally discarded up-down flipping and use mirroring only in the horizontal direction. We will see in the results section that this helps recover symmetric patterns in many cases.
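The augmentation amounts to searching over the original image plus its left-right mirror, and mapping any match found in the mirror back to original coordinates. A minimal sketch (function names are ours, not from the original code):

```python
import numpy as np

def augmented_sources(image):
    """Search space: the original image plus its horizontal mirror.

    Up-down flipping is deliberately omitted, since vertical symmetries
    (e.g. water reflections) are rarely strict and misapplied upside-down
    patches produce strong artifacts.
    """
    return [image, np.fliplr(image)]

def unmirror_x(x, width):
    """Map a match column found in the mirrored image back to the original."""
    return width - 1 - x
```

A patch found at column `x` in the mirrored copy corresponds to the (left-right flipped) patch at column `width - 1 - x` of the original.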
Searching constraints
This enhancement is based on the assumption that it is more probable to find a good match at a similar depth and scale in the image. We first tried a depth reconstruction method described in Derek Hoiem's Photo Pop-up paper, intending to constrain our search to locations with similar depth. However, recovering depth from a single still image is a difficult task in itself, and we did not get promising enough results from this approach. We ended up considering only the vertical (Y-direction) distance between matching patches and the filling patch. Our implementation is described below:
- In step 2 of the algorithm in the "Combine HOG with RGB" section above, we calculate another type of error: the Y-direction distance between the filling patch and the matching patch.
- We combine the normalized HOG distance and Y-direction distance using the parametric linear equation:
  dist = w * normalized(hog_dist) + (1-w) * normalized(y_dist)
  where we tried several values of w ranging from 0 to 1, and finally set w = 0.5.
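The combined score can be computed as below. The min-max normalization over the candidate set is an assumption on our part; the report does not specify which normalization was used.

```python
import numpy as np

def combined_distance(hog_dists, y_dists, w=0.5):
    """dist = w * normalized(hog_dist) + (1-w) * normalized(y_dist).

    Each distance vector is min-max normalized to [0, 1] over the
    candidate set before the weighted sum (w = 0.5 in the report).
    """
    def normalize(d):
        d = np.asarray(d, dtype=float)
        span = d.max() - d.min()
        return (d - d.min()) / span if span > 0 else np.zeros_like(d)
    return w * normalize(hog_dists) + (1 - w) * normalize(y_dists)
```

Normalizing both terms first matters: raw HOG distances and pixel offsets live on very different scales, so without it one term would dominate regardless of w.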
Results
The left images are the original results from Project 3; the right images are the results after the searching improvements.
Our gradient-based information helps find matches that align with the edge in this case.
We see that the symmetric pattern is recovered, and the gradient-based encoding helps propagate the structure of the Cathedral.
Again the symmetric pattern is recovered; we can see that the decoration on the corner is correctly added.
Constrained searching helps maintain the gradient of the original image.
The ridge of the mountain is correctly recovered.
The pattern of the beach line is recovered.
Failure cases and discussion
We realize that constraining our search will not always give better results. In some cases the best matches occur far away from the filling region.
In the following case, patches of the sea are expected to be used. However, because the sea patches are too far away from the filling region in the Y direction, bushes are incorrectly introduced. By using only the color cue, we can recover the image more reasonably.