Project Overview
This project is motivated by the problem of global image geolocation: given an input image, estimate its geographic location by scene matching against a database of millions of geo-tagged photos taken from the Internet. Geolocation estimates are useful for numerous geographic information applications, such as estimating population density, cultural differences (1), and land coverage. This project is based on the IM2GPS (2) paper. In that work, the test input images are a mixture of scenes with landmarks, generic scenes that provide little geographic information, and randomly sampled photos. What the majority of these images have in common is that they represent what the scene should actually look like. That is, "this is what a picture of this scene would look like on your average day" (no parades on the streets of Paris, no sandstorm in the Sahara, and no giant cruise ship blocking the view of a beautiful port city in the Greek isles). This motivated me to consider an approach to IM2GPS for images that contain "noise"; in this context I define noise as people, cars, or any other object that is not a permanent part of the scene.
Given a set of such input images, I investigated the problem of segmenting out the non-permanent portions of a scene (the region is user-specified) and filling in the missing region to try to improve the accuracy of IM2GPS. I use two different image completion techniques to generate a photo of what the scene might look like without the person, car, or object: scene completion (3) and inpainting (4). Scene completion is, like IM2GPS, a data-driven approach that uses other semantically similar images to fill in the input, while inpainting uses existing content from the photo itself to fill in the hole. I analyze and compare the results of running IM2GPS on the original input, the scene-completed input, and the inpainted input images.
This process depends on the user to define the region to be filled in and, after scene completion, to select the composited photo that could best resemble what the true scene looks like without the person or car. Ideally, an enhanced IM2GPS application would automatically segment the image through object detection and a higher-level understanding of which objects or portions of the scene should be eliminated from the photo, and would then complete the scene.
Approach
IM2GPS
IM2GPS estimates the geographic location of an input image using a data-driven scene matching approach. The approach leverages 6+ million geo-tagged photos collected from Flickr.
In order to perform scene matching of images, different feature descriptors are extracted from the photos to determine how semantically similar two images are.
Although the implementation uses a combination of many feature descriptors, the following are the main feature descriptors used:
- Tiny Images: 16 x 16 color images.
- Color histograms: Joint histograms of color in L*a*b* color space.
- Texton histograms: Build texton dictionary using a bank of filters.
- Line Features: Histogram of statistics of line segments in the image.
- Gist Descriptor + Color: A measure of how much edge energy is in the scene, taken at different frequencies and orientations.
- Geometric Context: The probability for any segment in the image to be ground, sky, or vertical.
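To make the simplest of these descriptors concrete, here is a minimal sketch of a tiny-image feature. It is not the provided IM2GPS code: the function name `tiny_image`, the nested-list image format, and the use of plain block averaging for downsampling are all assumptions made for illustration.

```python
def tiny_image(pixels, size=16):
    """Downsample an RGB image to size x size by block averaging.

    pixels: H x W nested list of (r, g, b) tuples. This sketch assumes
    H and W are multiples of `size`; a real pipeline would resize first.
    Returns a flat list of size * size * 3 floats.
    """
    h, w = len(pixels), len(pixels[0])
    bh, bw = h // size, w // size  # block height and width
    feature = []
    for by in range(size):
        for bx in range(size):
            sums = [0.0, 0.0, 0.0]
            for y in range(by * bh, (by + 1) * bh):
                for x in range(bx * bw, (bx + 1) * bw):
                    for c in range(3):
                        sums[c] += pixels[y][x][c]
            # Mean color of this block, one value per channel.
            feature.extend(s / (bh * bw) for s in sums)
    return feature
```

A 16 x 16 color image flattens to 768 values, which keeps a distance computation against every database image cheap.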
Features are precomputed for every image in the database. For a new query image, the same features are computed, and the distance in each feature space is calculated between the input and every image in the database. Each feature is weighted so that each feature has roughly the same influence, and the distances are aggregated to find the nearest neighbor scenes for the query image. When the algorithm completes, it returns a 40-NN set and a 100-NN set of images. To compute accuracy, I use the 1st NN in the 40-NN set, and I also use the entire 40-NN set to determine whether any nearest neighbor image in this set has the correct geolocation of the query image. The Flickr database, the precomputed feature database, and the code to compute features and find nearest neighbors were already provided to me, since implementing each of these steps was outside the time constraints of this project.
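The aggregation step can be sketched roughly as follows. This is an illustration, not the provided code: the Euclidean distance, the choice to divide each feature's distances by their mean so that features contribute on a similar scale, and the names `l2` and `nearest_neighbors` are all assumptions.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbors(query, database, k):
    """Return indices of the k database images closest to the query.

    query: dict mapping feature name -> vector.
    database: list of dicts with the same feature names.
    Each feature's distances are divided by their mean (an assumed
    weighting) so that no single feature dominates the sum.
    """
    total = [0.0] * len(database)
    for name, q_vec in query.items():
        dists = [l2(q_vec, item[name]) for item in database]
        mean = sum(dists) / len(dists)
        scale = mean if mean > 0 else 1.0
        for i, d in enumerate(dists):
            total[i] += d / scale
    ranked = sorted(range(len(database)), key=lambda i: total[i])
    return ranked[:k]
```

Calling it with k=40 and k=100 would give the two neighbor sets described above; the first index of the k=40 result plays the role of the 1st NN.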
Scene Completion
After running IM2GPS on an input image, the 100-NN set of images is used as the nearest neighbor scenes for Scene Completion. I used my Project 4 code to composite photos by computing an alignment within the local mask region, then applying graph cut and Poisson blending to seamlessly combine the images. For this step, the user must define the mask region to be taken out of the input image. Furthermore, the user is presented with 30 composited images. I use only the alignment cost as the scene completion cost, which on its own is a rather weak metric for judging the resulting photos. Thus, instead of automatically selecting the composited photo with the lowest cost, the user must specify which photo to use as the input for the second pass of IM2GPS.
In general, my scene completion results were not as good as they could be, simply because the nearest neighbor scenes are selected using features extracted over the entire input image. For scene completion it would be better to compute feature descriptors with the missing region excluded; however, I believe this would not have been a trivial change. I decided that the results were good enough, because the goal of scene completion here was not to have a seamlessly composited photo, but rather to have a photo filled in with content that could legitimately exist.
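As a rough illustration of an alignment cost of this kind, the sketch below scores a candidate scene by the mean squared difference over known pixels near the hole, searched over small translations. It is not the Project 4 code: the grayscale nested-list images, equal image sizes, the square neighborhood used as "local context", and the names `local_context` and `best_alignment` are assumptions.

```python
def local_context(mask, radius):
    """Known pixels within `radius` of the hole (mask[y][x] is True in the hole)."""
    h, w = len(mask), len(mask[0])
    ctx = []
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                continue
            near = any(
                mask[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            )
            if near:
                ctx.append((y, x))
    return ctx

def best_alignment(query, candidate, mask, radius=2, max_shift=2):
    """Return (cost, dy, dx) with the lowest mean squared difference.

    query and candidate are same-sized grayscale images (nested lists).
    Only context pixels that stay inside the candidate after the shift
    contribute to the cost.
    """
    h, w = len(query), len(query[0])
    ctx = local_context(mask, radius)
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0.0, 0
            for y, x in ctx:
                cy, cx = y + dy, x + dx
                if 0 <= cy < h and 0 <= cx < w:
                    total += (query[y][x] - candidate[cy][cx]) ** 2
                    count += 1
            if count == 0:
                continue
            cost = total / count
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best
```

A cost like this only measures how well the surroundings of the hole agree, which is one reason it is a weak predictor of how plausible the final composite looks.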
Inpainting
I also explored inpainting as an image completion technique. This is based on "Object Removal by Exemplar-based Inpainting", which fills in a missing region by repeating both texture and structure contained in the original image. A description of the algorithm and the source code are available here: http://www.cc.gatech.edu/~sooraj/inpainting/.
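For reference, the cited paper decides the order in which patches on the boundary of the hole are filled using a priority that combines a confidence term and a data term. The summary below follows the notation of that paper as I understand it, and is not taken from the linked source code:

```latex
P(p) = C(p)\, D(p), \qquad
C(p) = \frac{\sum_{q \in \Psi_p \cap (\mathcal{I} - \Omega)} C(q)}{|\Psi_p|}, \qquad
D(p) = \frac{\left| \nabla I_p^{\perp} \cdot n_p \right|}{\alpha}
```

Here \Psi_p is the patch centered at a pixel p on the fill front, \Omega is the region to be filled, \mathcal{I} is the whole image, n_p is the unit normal to the fill front at p, \nabla I_p^{\perp} is the isophote direction at p, and \alpha is a normalization factor. C is initialized to 0 inside \Omega and 1 elsewhere. The confidence term favors patches that are already mostly known, while the data term favors patches where strong image structure runs into the hole, which is how both texture and structure get propagated.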
Evaluation
Test Set
The test set consists of 81 images used to evaluate the performance of each im2gps approach. Many of the images were taken from the Flickr database, and some were images from my own photo collection. I deliberately used photos that had people in them or contained other objects that are not a permanent component of the scene, such as cars or buses. A number of the images contain recognizable landmarks or provide some geographic information, because I was particularly interested in the accuracy on such scenes; a smaller set of the images are "generic" scenes (e.g. a beach, mountain, or desert landscape). Overall, the test set is not evenly distributed over the globe, and ideally I would like to test further with many more input images taken from all over the globe, including a larger set that contains little geographic information.
Quantitative Results
Below are the percentages of test images with a correct geolocation estimate (within 200 km), for both the first nearest neighbor and the set of 40 nearest neighbors returned from im2gps.
IM2GPS:
14.81% 1-NN correct (12 out of 81)
54.32% 40-NN correct (44 out of 81)
Scene Completion:
11.11% 1-NN correct (9 out of 81)
55.56% 40-NN correct (45 out of 81)
Inpainting:
16.05% 1-NN correct (13 out of 81)
62.96% 40-NN correct (51 out of 81)
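The 200 km criterion used above can be checked with a great-circle distance. The sketch below is an illustration only: the haversine formula with a mean Earth radius of 6371 km, the (latitude, longitude) tuple format, and the names `haversine_km` and `is_correct` are assumptions, not the evaluation code actually used.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def is_correct(truth, neighbors, k, threshold_km=200.0):
    """True if any of the first k neighbors lies within the threshold.

    truth: (lat, lon) of the query image.
    neighbors: list of (lat, lon), ordered from nearest to farthest match.
    """
    return any(
        haversine_km(truth[0], truth[1], nb[0], nb[1]) <= threshold_km
        for nb in neighbors[:k]
    )
```

With k=1 this gives the 1-NN criterion and with k=40 the 40-NN criterion; each reported percentage is the count of correct queries divided by 81, for example 12 / 81 = 14.81%.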
Image Results
The first 3 nearest neighbors are displayed for each input image.
[Image grid omitted; only its text labels survive and are listed below in their original order.]
1-NN correct | 40-NN correct
im2gps: India | Scene Completion: India + India | Inpainting: India
im2gps: Italy | Scene Completion: Italy + Minnesota | Inpainting: Italy
Pisa | Mexico City | St Petersburg
Paris | Mexico City | Los Angeles
im2gps: Croatia | Scene Completion: Croatia + Italy | Inpainting: Croatia
North Carolina | Berlin | Greece
im2gps: Turkey | Scene Completion: Turkey + Turkey | Inpainting: Turkey
im2gps: Paris | Scene Completion: Paris + Paris | Inpainting: Paris
im2gps: Utah | Scene Completion: Utah + Utah | Inpainting: Utah
im2gps: Utah | Scene Completion: Utah + Utah | Inpainting: Utah
Netherlands | Colorado | Wyoming
Washington | Africa | Argentina
im2gps: Croatia | Scene Completion: Croatia + Malta | Inpainting: Croatia
Spain | Rome | Czech Republic
im2gps: Fiji | Scene Completion: Fiji + Monaco | Inpainting: Fiji
Fiji | South Africa | Tunisia
im2gps: Peru | Scene Completion: Peru + Peru | Inpainting: Peru
im2gps: Spain | Scene Completion: Spain + NYC | Inpainting: Spain
Germany | Thailand | San Francisco
im2gps: Toronto | Scene Completion: Toronto + Toronto | Inpainting: Toronto
Spain | Albania | San Francisco
im2gps: Malta | Scene Completion: Malta + Spain | Inpainting: Malta
im2gps: Poland | Scene Completion: Poland + Chile | Inpainting: Poland
Beijing | Rio de Janeiro | Guatemala
Rio de Janeiro | Washington | Hong Kong
India | Hong Kong | South Africa
im2gps: Samoa | Scene Completion: Samoa + Samoa | Inpainting: Samoa
im2gps: Vegas | Scene Completion: Vegas + Tokyo | Inpainting: Vegas