Automated Panorama Stitching
By Yan Li (yanli)


Introduction
The purpose of this project is to automatically stitch two or more images into one panorama. The input images should be related to each other through repeated local features. A compact camera's field of view is about 50x30 degrees, while the human field of view is about 200x135 degrees. That means what we capture with a camera does not cover the same spatial range as what we see. However, we can take multiple images of the same scene from different orientations and then use panorama stitching to combine them into one panorama. The task is not especially hard: we find the repeated local features, use them to estimate a projective mapping matrix, and transform one image on top of the other. If we have more than two images, we stitch the first two together and then stitch the remaining images onto the panorama already generated.
My Work
· Detecting and matching feature points
· Robustly recovering homographies
· Creating at least 2 unique panoramas from my own images
· Adding rotation invariance to the descriptors (Extra Credit)
· Graph cut & Poisson blend (Extra Credit)
· Automatic panorama recognition (Extra Credit)
Brief Overview of the Algorithm
There are six steps in the project. The first and most important one is to detect features in each image. We use the Harris detector to find all the corners and use them as local features. The second step is non-maximal suppression, which keeps only corners that are local maxima of the response, so the surviving features are not densely clustered. This reduces the number of feature points, which speeds up the later processing. The third step is to build a descriptor for each feature point. We cannot simply use the color at a feature point as its descriptor, since single pixels are hard to match reliably. Instead, we take a 40x40 square patch around each feature point and downsample it to an 8x8 patch, which serves as the descriptor. The fourth step is to compute the distance between features in the two images. We find the lowest-SSD match for each feature and use Lowe's ratio test to exclude features that do not appear in both images. In the fifth step, to make the process more robust, we use the RANSAC algorithm to compute the projective matrix. Finally, we composite the images using graph cut or Poisson blending to make the result seamless.
Step 1: Use Harris detector to detect corners
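For reference, here is a minimal MATLAB sketch of the Harris response computation. It assumes a grayscale image im of type double; the Gaussian window size, sigma, and k = 0.04 are common illustrative choices rather than the exact values used in my implementation.

    % Harris corner response for a grayscale double image im.
    Ix = conv2(im, [-1 0 1],  'same');          % horizontal gradient
    Iy = conv2(im, [-1 0 1]', 'same');          % vertical gradient
    g  = exp(-(-4:4).^2 / (2*1.5^2));           % 1-D Gaussian, sigma = 1.5
    g  = g / sum(g);
    w  = g' * g;                                % 9x9 Gaussian window
    Sxx = conv2(Ix.^2,  w, 'same');             % structure tensor entries
    Syy = conv2(Iy.^2,  w, 'same');
    Sxy = conv2(Ix.*Iy, w, 'same');
    k = 0.04;
    R = (Sxx.*Syy - Sxy.^2) - k*(Sxx + Syy).^2; % corner response map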

Step 2: Non-maximal suppression
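Below is a sketch of one common way to do this, adaptive non-maximal suppression: each corner is scored by its distance to the nearest stronger corner, and the points with the largest such radius (i.e., the most spatially spread out) are kept. The point budget of 500 is illustrative; pts is an m-by-2 list of [row col] corner locations and strength holds their Harris responses.

    % Adaptive non-maximal suppression sketch.
    m = size(pts, 1);
    radius = inf(m, 1);
    for i = 1:m
        stronger = strength > strength(i);          % corners that dominate i
        if any(stronger)
            d2 = sum((pts(stronger,:) - pts(i,:)).^2, 2);
            radius(i) = sqrt(min(d2));              % nearest stronger corner
        end
    end
    [~, order] = sort(radius, 'descend');
    n = min(500, m);                                % illustrative point budget
    pts = pts(order(1:n), :);                       % keep the best-spread points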

Step 3: Build feature descriptors
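A minimal sketch of the descriptor construction, assuming each feature point (r, c) is at least 20 pixels from the image border; averaging 5x5 blocks is one simple way to downsample (blurring and subsampling would also work):

    % Build an 8x8 descriptor from a 40x40 patch around feature (r, c).
    patch = im(r-19:r+20, c-19:c+20);             % 40x40 patch
    desc  = zeros(8, 8);
    for i = 1:8
        for j = 1:8
            block = patch(5*i-4:5*i, 5*j-4:5*j);
            desc(i, j) = mean(block(:));          % average-pool a 5x5 block
        end
    end
    desc = (desc - mean(desc(:))) / std(desc(:)); % normalize for bias/gain
    desc = desc(:)';                              % flatten to a 64-d vector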

Step 4: Match features
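A sketch of the matching step, where D1 (m-by-64) and D2 (k-by-64) are the descriptor matrices of the two images; the ratio threshold of 0.6 is an illustrative value:

    % SSD matching with Lowe's ratio test.
    matches = zeros(0, 2);                   % rows of [index in D1, index in D2]
    ratioThresh = 0.6;
    for i = 1:size(D1, 1)
        ssd = sum((D2 - D1(i,:)).^2, 2);     % SSD to every descriptor in D2
        [vals, idx] = sort(ssd);
        if vals(1) / vals(2) < ratioThresh   % best match much closer than 2nd
            matches(end+1, :) = [i, idx(1)]; %#ok<AGROW>
        end
    end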

Step 5: RANSAC (it's hard to visualize the process)
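A sketch of RANSAC homography estimation, where p1 and p2 are n-by-2 matched [x y] coordinates (p1 maps to p2); the iteration count and 3-pixel threshold are illustrative:

    function [bestH, bestInliers] = ransacHomography(p1, p2)
    % RANSAC sketch: repeatedly fit a homography to 4 random matches and
    % keep the one with the most inliers.
    bestH = eye(3); bestInliers = 0; thresh = 3;
    for iter = 1:1000
        s = randperm(size(p1, 1), 4);              % minimal sample
        H = fitHomography(p1(s,:), p2(s,:));
        q = [p1, ones(size(p1,1),1)] * H';         % project all p1 points
        q = q(:,1:2) ./ q(:,3);
        err = sqrt(sum((q - p2).^2, 2));           % reprojection error
        inliers = sum(err < thresh);
        if inliers > bestInliers
            bestInliers = inliers; bestH = H;
        end
    end
    end

    function H = fitHomography(p1, p2)
    % Direct linear transform from 4+ correspondences p1 -> p2.
    n = size(p1, 1); A = zeros(2*n, 9);
    for i = 1:n
        x = p1(i,1); y = p1(i,2); u = p2(i,1); v = p2(i,2);
        A(2*i-1,:) = [-x -y -1  0  0  0  u*x  u*y  u];
        A(2*i,  :) = [ 0  0  0 -x -y -1  v*x  v*y  v];
    end
    [~, ~, V] = svd(A);
    H = reshape(V(:,end), 3, 3)';                  % smallest singular vector
    end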
Step 6: Compositing
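As a baseline, here is a sketch of naïve compositing: warp im1 into im2's frame with the homography H (mapping im1 coordinates to im2 coordinates) by inverse warping with interp2, then paste. The seamless versions replace the final paste with graph cut or Poisson blending. Grayscale double images are assumed.

    % Naive compositing sketch.
    [h1, w1] = size(im1);  [h2, w2] = size(im2);
    corners = H * [1 w1 1 w1; 1 1 h1 h1; 1 1 1 1];    % im1 corners in im2 frame
    corners = corners(1:2,:) ./ corners(3,:);
    xmin = floor(min([corners(1,:), 1]));  xmax = ceil(max([corners(1,:), w2]));
    ymin = floor(min([corners(2,:), 1]));  ymax = ceil(max([corners(2,:), h2]));
    [X, Y] = meshgrid(xmin:xmax, ymin:ymax);          % output canvas grid
    P = H \ [X(:)'; Y(:)'; ones(1, numel(X))];        % inverse map to im1
    xs = reshape(P(1,:) ./ P(3,:), size(X));
    ys = reshape(P(2,:) ./ P(3,:), size(X));
    warped = interp2(im1, xs, ys, 'linear', NaN);     % warped im1
    canvas = nan(size(X));
    canvas(1-ymin+1 : h2-ymin+1, 1-xmin+1 : w2-xmin+1) = im2;
    holes = isnan(canvas);
    canvas(holes) = warped(holes);                    % paste im1 where empty
    canvas(isnan(canvas)) = 0;                        % fill remaining gaps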

Result Images
Here I want to show some of my results. The test cases are either from my own camera or provided with the project.
Rotation Invariance
I tried adding rotation invariance to the feature descriptors. Instead of sampling an axis-aligned 40x40 square region, I use the gradient orientation at each feature point and rotate the square sampling region by that angle. Because the rotated sample locations no longer fall on integer pixel coordinates, I use interp2 to interpolate the pixel values.
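A sketch of the rotated sampling, using the gradients Ix and Iy from the Harris step (in practice heavily smoothed gradients give a more stable orientation); (r, c) is the feature location:

    % Sample a 40x40 patch rotated by the gradient orientation.
    theta = atan2(Iy(r, c), Ix(r, c));        % dominant gradient orientation
    [u, v] = meshgrid(-19.5:19.5, -19.5:19.5);
    xs = c + u*cos(theta) - v*sin(theta);     % rotate the sampling grid
    ys = r + u*sin(theta) + v*cos(theta);
    patch = interp2(im, xs, ys, 'linear', 0); % bilinear interpolation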

Poisson Blend & Graph Cut
I tried two ways to make the result image seamless, and it turns out that Poisson blending works better. Graph cut cannot handle the case where the brightness differs between the original images. Here I compare Poisson blending, graph cut, and the naïve way of compositing. The left column shows the results of naïve compositing, the middle column shows the results of Poisson blending, and the right column shows the results of graph cut.
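For reference, a minimal sketch of Poisson blending on grayscale double images, solving the discrete Poisson equation with the source gradients inside the mask and the target values on the boundary (the mask is assumed not to touch the image border; a real implementation would vectorize the system construction):

    function out = poissonBlend(src, tgt, mask)
    % Poisson blending sketch: for each masked pixel p, solve
    %   4*x(p) - sum of unknown neighbors = 4*src(p) - sum of src neighbors
    %                                       + sum of known tgt neighbors.
    idx = zeros(size(mask));  idx(mask) = 1:nnz(mask);
    n = nnz(mask);
    [rs, cs] = find(mask);
    I = []; J = []; V = []; b = zeros(n, 1);
    for k = 1:n
        r = rs(k); c = cs(k);
        I(end+1) = k; J(end+1) = k; V(end+1) = 4; %#ok<AGROW>
        b(k) = 4*src(r,c) - src(r-1,c) - src(r+1,c) - src(r,c-1) - src(r,c+1);
        nbrs = [r-1 c; r+1 c; r c-1; r c+1];
        for t = 1:4
            nr = nbrs(t,1); nc = nbrs(t,2);
            if mask(nr, nc)                    % unknown neighbor: matrix entry
                I(end+1) = k; J(end+1) = idx(nr,nc); V(end+1) = -1; %#ok<AGROW>
            else                               % known neighbor: move to rhs
                b(k) = b(k) + tgt(nr, nc);
            end
        end
    end
    x = sparse(I, J, V, n, n) \ b;             % sparse linear solve
    out = tgt;  out(mask) = x;
    end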



Panorama Recognition
I implemented two ways of doing panorama recognition.
The strategy of my first method is to load all the images into memory and run every step except compositing on each pair of images. I then set a threshold on the number of RANSAC inliers to decide which images can form a panorama: images that are components of the same panorama should have an inlier count above the threshold.
This approach is robust but very time-consuming, since for each image we must traverse all the other images in the directory, performing feature detection, non-maximal suppression, descriptor building, and RANSAC each time. Suppose we have 50 images available; we must run all the steps above 50*50 = 2500 times.
Even though this method costs a lot of time, the recognition results turn out to be very accurate, because it is based on local feature comparisons between images.
The strategy of my second method is to incorporate a global matching metric. I use a tiny image of size 64x64 or 128x128 as the global descriptor of each image. For each image, I compute the SSD against all the other images and then build n pairs of images, where n is the number of images; each pair consists of an image and the image with the lowest SSD to it. I assume that each such pair can form a panorama.
After I get the pairs, I run the steps above and check whether the number of inliers between the two images in a pair is large enough. If not, the pair is a false pair and I simply discard it.
The second strategy is less robust, since the global feature is unreliable: two images may look totally different overall yet still form a panorama, because they share repeated local features only near the boundary. However, this method is very fast. The main speedup comes from using tiny images, and computing the SSD is also very fast.
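A sketch of the second strategy; verifyPair is a hypothetical helper wrapping the matching and RANSAC steps, and the 64x64 tiny-image size follows the text (the crude subsampling here stands in for a proper resize):

    % Panorama recognition by tiny-image SSD pairing.
    files = dir('images/*.jpg');
    n = numel(files);
    tiny = zeros(n, 64*64);
    for i = 1:n
        im = mean(im2double(imread(fullfile('images', files(i).name))), 3);
        t = im(round(linspace(1, size(im,1), 64)), ...
               round(linspace(1, size(im,2), 64)));   % crude 64x64 tiny image
        tiny(i, :) = t(:)';
    end
    pairs = zeros(n, 2);
    for i = 1:n
        ssd = sum((tiny - tiny(i,:)).^2, 2);          % SSD to all tiny images
        ssd(i) = inf;                                 % exclude self-match
        [~, j] = min(ssd);
        pairs(i, :) = [i, j];                         % candidate panorama pair
    end
    valid = false(n, 1);                              % verify with local features
    for k = 1:n
        valid(k) = verifyPair(pairs(k,1), pairs(k,2)); % hypothetical inlier check
    end
    pairs = pairs(valid, :);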