Final Project Writeup: Image Completion + IM2GPS
Basia Korel (bkorel)
May 17, 2010


Project Overview

This project is motivated by the problem of global image geolocation: given an input image, estimate its geographic location by scene matching against a database of millions of geo-tagged photos collected from the Internet. Geolocation estimates are useful for numerous geographic information applications, such as estimating population density, cultural differences (1) and land coverage. This project is based on the IM2GPS paper (2); in that effort, many of the test input images contain a mixture of scenes with landmarks, generic scenes that provide little geographic information, and randomly sampled photos. What the majority of these images have in common is that they represent what the scene should actually look like. That is, "this is what a picture of this scene would look like on an average day" (no parades on the streets of Paris, no sand storm in the Sahara, no giant cruise ship blocking the view of a beautiful port city in the Greek isles). This motivated me to consider an approach to IM2GPS for images that contain "noise"; in this context I define noise as people, cars or any other objects that are not a permanent part of the scene.

Given a set of such input images, I investigated the problem of segmenting out the non-permanent portions of a scene (which are user-specified) and filling in the missing region, to try to improve the accuracy of IM2GPS. I use two different image completion techniques to generate a photo of what the scene might look like without the person/car/object: scene completion (3) and inpainting (4). Scene completion is also a data-driven approach, compositing the input from other semantically similar images, while inpainting fills in the hole using content already present in the photo. I analyze and compare the results of running IM2GPS on the original, scene-completed and inpainted input images.

This process depends on the user to define the region to be filled in, and also to select, after scene completion, the composited photo that best resembles what the true scene might look like without the person or car. Ideally, an enhanced IM2GPS application would automatically perform image segmentation through object detection and a higher-level understanding of which objects or portions of the scene should be eliminated from the photo, and would then complete the scene without user intervention.

Approach

IM2GPS

IM2GPS estimates the geographic location of an input image using a data-driven scene matching approach, leveraging a database of over 6 million geo-tagged photos collected from Flickr. To perform scene matching, several feature descriptors are extracted from the photos to measure how semantically similar two images are; the implementation combines many descriptors, such as the gist descriptor and color and texton histograms. Features are precomputed for every image in the database. For a new query image, the same feature vectors are computed, and the distances in each feature space are calculated between the query and every image in the database. Each feature is weighted so that all features have roughly equal influence, and the distances are aggregated to find the nearest neighbor scenes in the database. Running the algorithm yields a set of 40-NN and a set of 100-NN images. To compute accuracy, I use the 1st NN in the set of 40-NN, and I also check whether any nearest neighbor image in the full 40-NN set has the correct geolocation for the query image. The Flickr database, the precomputed feature database, and the code to compute features and find nearest neighbors were provided to me, since implementing each of these steps was outside the time constraints of this project.
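The weighted distance aggregation described above can be sketched as follows. This is a minimal illustration, not the actual course code; the dictionary layout and the `find_nearest_scenes` helper are hypothetical.

```python
import numpy as np

def find_nearest_scenes(query_feats, db_feats, weights, k=100):
    """Aggregate per-feature distances into one score and return the
    indices of the k nearest database scenes.

    query_feats: {feature name: 1-D descriptor for the query}
    db_feats:    {feature name: (n_images, dim) array of descriptors}
    weights:     {feature name: scalar weight, chosen so each feature
                  has roughly equal influence}
    """
    n_images = next(iter(db_feats.values())).shape[0]
    total = np.zeros(n_images)
    for name, q in query_feats.items():
        # L2 distance from the query to every database image in this space
        total += weights[name] * np.linalg.norm(db_feats[name] - q, axis=1)
    return np.argsort(total)[:k]
```

In practice the weights matter: an unnormalized high-dimensional descriptor would otherwise dominate the aggregate distance.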

Scene Completion

After running IM2GPS on an input image, the 100-NN set of images is used as the set of candidate scenes for scene completion. I used my Project 4 code to composite photos: an alignment is computed within the local mask region, and graph cut plus Poisson blending are used to seamlessly combine the images. For this step, the user must define the mask region to be removed from the input image. The user is then presented with 30 composited images. Because I use only the alignment cost as the scene completion cost, which on its own is a rather weak metric for ranking the resulting photos, the composited photo with the lowest cost is not selected automatically; instead, the user must specify which photo to use as the input for the second pass of IM2GPS.
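The alignment step can be sketched roughly as below. This is a toy version assuming grayscale images and a small translational search; the `best_alignment` helper and the wrap-around shift via `np.roll` are simplifications, not the Project 4 implementation.

```python
import numpy as np

def alignment_cost(query, candidate, border, dy, dx):
    """SSD between the query and a shifted candidate, evaluated only on
    the border pixels just outside the mask (the local context region)."""
    shifted = np.roll(np.roll(candidate, dy, axis=0), dx, axis=1)
    return ((query - shifted) ** 2)[border].sum()

def best_alignment(query, candidate, border, search=2):
    """Try small integer translations of the candidate and keep the
    cheapest one; returns (cost, dy, dx)."""
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            c = alignment_cost(query, candidate, border, dy, dx)
            if best is None or c < best[0]:
                best = (c, dy, dx)
    return best
```

The resulting per-candidate cost is what the 30 composites are ranked by before they are shown to the user.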

In general, my scene completion results were not as good as they could be, simply because the nearest neighbor scenes are selected using features extracted over the entire input image, including the region to be removed. For scene completion it would be better to compute the feature descriptors with the missing region excluded; however, I believe this would not have been a trivial change. I decided that the results were good enough, because the goal of scene completion here was not to have a seamlessly composited photo, but rather to have a photo filled in with content that could legitimately exist.

Inpainting

I also explored inpainting as an image completion technique. This is based on "Object Removal by Exemplar-based Inpainting" (4), which fills in a missing region by propagating both the texture and the structure contained in the original image. A description of the algorithm and the source code are available here: http://www.cc.gatech.edu/~sooraj/inpainting/.
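A greatly simplified, greedy sketch of the exemplar-based idea is below (grayscale only). It uses a count of known neighbors as a stand-in for Criminisi's confidence term and omits the structure-propagating data term entirely, so it is an illustration of the fill-front idea rather than the algorithm actually used.

```python
import numpy as np

def inpaint(image, mask, r=1):
    """Toy exemplar-based fill. mask is True on missing pixels.
    Repeatedly takes the missing pixel whose (2r+1)^2 patch has the most
    known pixels, then copies the center of the best-matching source
    patch. Assumes at least one fully known source patch exists."""
    img = image.astype(float).copy()
    known = ~mask
    h, w = img.shape
    # candidate source patches: entirely inside the image and fully known
    sources = [(sy, sx) for sy in range(r, h - r) for sx in range(r, w - r)
               if known[sy - r:sy + r + 1, sx - r:sx + r + 1].all()]
    while not known.all():
        ys, xs = np.where(~known)
        # fill-front priority: number of known pixels in the target patch
        def conf(p):
            y, x = p
            return known[max(0, y - r):y + r + 1,
                         max(0, x - r):x + r + 1].sum()
        y, x = max(zip(ys, xs), key=conf)
        # find the source patch that best matches the known part
        best, best_cost = None, np.inf
        for sy, sx in sources:
            cost, n = 0.0, 0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w and known[yy, xx]:
                        cost += (img[yy, xx] - img[sy + dy, sx + dx]) ** 2
                        n += 1
            if n and cost / n < best_cost:
                best_cost, best = cost / n, (sy, sx)
        img[y, x] = img[best]
        known[y, x] = True
    return img
```

The real algorithm copies whole patches and weighs the isophote direction at the fill front, which is what lets it continue linear structures into the hole.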

Evaluation

Test Set

81 images were used in the test set to evaluate the performance of each IM2GPS approach. Many of the images came from the Flickr database, and some were from my own photo collection. I deliberately used photos containing people or other objects that are not a permanent component of the scene, such as cars or buses. A number of the images contain recognizable landmarks or provide some geographic information, because I was particularly interested in the accuracy on such scenes; a smaller set are "generic" scenes (e.g. a beach, mountain or desert landscape). Overall, the test set is not an even distribution over the entire globe, and I would ideally like to test with many more input images taken from all over the world, including a larger set that contains little geographic information.

Quantitative Results

Below are the accuracies of a correct geolocation estimate (within 200 km of the true location) for both the first nearest neighbor and the full set of 40 nearest neighbors returned from IM2GPS.

IM2GPS:
14.81% 1-NN correct (12 out of 81)
54.32% 40-NN correct (44 out of 81)

Scene Completion:
11.11% 1-NN correct (9 out of 81)
55.56% 40-NN correct (45 out of 81)

Inpainting:
16.05% 1-NN correct (13 out of 81)
62.96% 40-NN correct (51 out of 81)
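For concreteness, the accuracy metric above can be computed as follows. This is a sketch with hypothetical helper names; it assumes each nearest neighbor carries a (lat, lon) geotag and uses the haversine great-circle distance with a 200 km threshold.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlon / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def accuracy(truths, nn_lists, threshold_km=200.0, k=1):
    """Fraction of queries whose true location is within threshold_km of
    any of the first k nearest neighbors' geotags."""
    hits = sum(
        any(haversine_km(tlat, tlon, lat, lon) <= threshold_km
            for lat, lon in nns[:k])
        for (tlat, tlon), nns in zip(truths, nn_lists))
    return hits / len(truths)
```

With k=1 this gives the 1-NN numbers above; with k=40 it gives the 40-NN numbers, since a query counts as correct if any neighbor in the set falls within the threshold.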

Image Results

The first 3 nearest neighbors returned for each method are listed with each input image below (in the original figures these were shown as thumbnails, with results marked as 1-NN correct or 40-NN correct).


im2gps: India [NN: India, London, Oklahoma]
Scene Completion: India + India [NN: Spain, Austria, Barcelona]
Inpainting: India [NN: Spain, Austria, India]

im2gps: Italy [NN: Pisa, Mexico City, St Petersburg]
Scene Completion: Italy + Minnesota [NN: Paris, Mexico City, Los Angeles]
Inpainting: Italy [NN: Pisa, Mexico City, Pisa]


im2gps: Croatia [NN: Greece, Italy, Berlin]
Scene Completion: Croatia + Italy [NN: Greece, Venice, Nepal]
Inpainting: Croatia [NN: North Carolina, Berlin, Greece]





im2gps: Turkey [NN: Turkey, Venice, Barcelona]
Scene Completion: Turkey + Turkey [NN: Turkey, Rome, Rome]
Inpainting: Turkey [NN: Turkey, Turkey, England]

im2gps: Paris [NN: NYC, Italy, Paris]
Scene Completion: Paris + Paris [NN: Paris, DC, UK]
Inpainting: Paris [NN: Paris, London, Romania]



im2gps: Utah [NN: Mendoza, Uruguay, Berlin]
Scene Completion: Utah + Utah [NN: Utah, Utah, USA]
Inpainting: Utah [NN: Utah, Nevada, Scotland]

im2gps: Utah [NN: Netherlands, Colorado, Wyoming]
Scene Completion: Utah + Utah [NN: Utah, USA, Italy]
Inpainting: Utah [NN: Washington, Africa, Argentina]


im2gps: Croatia [NN: Spain, Rome, Czech Republic]
Scene Completion: Croatia + Malta [NN: Hyderabad, Spain, Oman]
Inpainting: Croatia [NN: Thailand, India, Spain]


im2gps: Fiji [NN: Hawaii, Fiji, Panama]
Scene Completion: Fiji + Monaco [NN: Hawaii, Fiji, Tunisia]
Inpainting: Fiji [NN: Fiji, South Africa, Tunisia]



im2gps: Peru [NN: Ireland, Peru, Namibia]
Scene Completion: Peru + Peru [NN: Ireland, Peru, Greece]
Inpainting: Peru [NN: Peru, Ireland, Greece]


im2gps: Spain [NN: Germany, Thailand, San Francisco]
Scene Completion: Spain + NYC [NN: Germany, New York, Paris]
Inpainting: Spain [NN: Spain, Libya, Paris]

im2gps: Toronto [NN: Spain, Albania, San Francisco]
Scene Completion: Toronto + Toronto [NN: London, Toronto, Barcelona]
Inpainting: Toronto [NN: Albania, Paris, Paris]

im2gps: Malta [NN: Colorado, Taiwan, Gambia]
Scene Completion: Malta + Spain [NN: Maldives, Japan, Gambia]
Inpainting: Malta [NN: Venice, Aruba, Vermont]

im2gps: Poland [NN: Beijing, Rio de Janeiro, Guatemala]
Scene Completion: Poland + Chile [NN: Rio de Janeiro, Washington, Hong Kong]
Inpainting: Poland [NN: India, Hong Kong, South Africa]

im2gps: Samoa [NN: Fiji, Fiji, Jamaica]
Scene Completion: Samoa + Samoa [NN: Fiji, Cuba, Taiwan]
Inpainting: Samoa [NN: Fiji, Fiji, Brazil]

im2gps: Vegas [NN: Rome, Beijing, Egypt]
Scene Completion: Vegas + Tokyo [NN: Palestine, Egypt, Cairo]
Inpainting: Vegas [NN: Egypt, Taipei, Paris]


Conclusion

I initially suspected that geolocating an image that contains a person would be less accurate than geo-estimating the same scene without the person. This seems intuitive, since scene matching compares image features computed over the entire query image with features from the database; if there is in fact a similar (or the same) scene in the database without the person, the distance between the two images would be greater than if the person had not been there.

IM2GPS on the original input image performs better than scene completion for 1-NN. This initially surprised me; however, it could make sense because scene completion may often bring in "too much" content from another photo and location that simply does not fit the query image's context. I suspect that if I had found matching scenes in scene completion according to (3) ("first compute its gist descriptor with the missing regions excluded... calculate the SSD between the gist of the query image and every gist in the database, weighted by the mask"), which is something I did not do, I could have had better results with scene completion.
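The mask-weighted SSD from (3) could be sketched like this. The gist computation itself is not shown; `masked_ssd` and the per-dimension `weight` vector (e.g. the fraction of valid pixels in each gist grid cell, zero where the descriptor is dominated by the hole) are hypothetical names for illustration.

```python
import numpy as np

def masked_ssd(query_desc, db_desc, weight):
    """Sum of squared differences between descriptors, down-weighting
    the dimensions that fall inside the missing region.

    query_desc: 1-D descriptor computed with the missing region excluded
    db_desc:    (n_images, dim) array, or a single 1-D descriptor
    weight:     per-dimension weights in [0, 1]
    """
    d = (query_desc - db_desc) ** 2
    return (weight * d).sum(axis=-1)
```

Ranking database scenes by this distance would keep a hole over, say, a person from penalizing otherwise well-matching scenes.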

Inpainting performs fairly well, which gives hope that filling in the "noise" of a query photo is a plausible thing to do. Overall, I feel I need to run this analysis on a much larger number and wider variety of photos to have truly conclusive results.


References
1. "Detecting cultural differences using consumer-generated geotagged photos", Yanai et al.
2. "IM2GPS: estimating geographic information from a single image", J. Hays and A. Efros.
3. "Scene Completion Using Millions of Photographs", J. Hays and A. Efros.
4. "Object Removal by Exemplar-based Inpainting", Criminisi et al.