Lab: Stereo

Note: This session has a complementary soundtrack, because we know you all like Francophilic early 90s avant-pop.

Learning Objectives

  1. Understand the principles of stereo vision.
  2. Use two images taken from slightly different perspectives to find the depth of objects in an image.

Background

Feel free to read and follow the slides at the end of the corresponding stereo deck for more detail on the epipolar geometry described below. PPTX | PDF

A photograph flattens the three-dimensional world into a two-dimensional image plane. Information about the 3D structure of the world, like the size and distance of different objects from the camera, is distorted or lost.

How can we recover this lost information? We might take inspiration from our own visual system. Humans perceive depth using a number of mechanisms, including the fact that our vision is binocular: we have two eyes that see the world from slightly different places.

We can emulate our binocular vision by using two photos taken from slightly different perspectives. How far corresponding points move from one photo to the other tells us how far away the objects are: objects closer to the camera move farther between the two photos.

When we take two images of the same scene from different perspectives, a point in the first photo is constrained to lie along a particular line in the second photo.

Fig 1. By stereo geometry, points in one image are constrained to lie on a line in a second image. This line is called an epipolar line.

These "epipolar lines" are very helpful; they tell us where we should look for corresponding points in our two images. These lines are computed using the parameters of the two cameras, but in this lab we will ignore this process and provide you with images that have already been rectified, meaning that the images have been transformed such that corresponding points lie along horizontal lines across both photos.

Fig 2. Rectification reprojects stereo images onto a common plane when the camera sensors were not originally coplanar.

Corresponding Points

To find the depth of a point in the scene, we measure how far its projection in the sensor plane moves between our two stereo images. This shift is called the disparity. Disparity is related to depth by: $$z = \frac{ft}{x_l-x_r},$$ where \(z\) is the depth, \(t\) is the baseline distance between the two optical centers of the cameras, \(f\) is the (shared) focal length of both cameras, and \(x_l-x_r\) is the disparity.
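Once a disparity map is in hand, inverting this relation per pixel yields depth. A minimal MATLAB sketch (the variable names are ours; \(f\) and \(t\) take the values given in the Data section below):

% Hedged sketch: convert a disparity map (in pixels) to a depth map.
f = 3740;                  % focal length [pixels] (see Data section)
t = 160;                   % baseline [mm]
d = double(disparity_map); % disparity_map is a hypothetical variable
d(d == 0) = NaN;           % zero disparity = point at infinity; avoid /0
depth = f * t ./ d;        % z = f*t / (x_l - x_r), elementwise, in mm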

To do this, we need to find the corresponding point in the second image. Since we are working with rectified images, we can restrict our search to the same horizontal line in both images. Our approach will be to slide a small window along that line in the second image and record the patch most similar to the patch around our point in the first image. The most similar patch contains the corresponding point, and the disparity between the locations of these points in the rectified images gives us the depth of the point.
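To make the search concrete, here is a minimal sketch (ours, not the assignment solution) of scoring candidate matches for a single pixel \((y, x)\) with a sum-of-squared-differences cost, where hw denotes half the window size:

% Hedged sketch: find the best horizontal match for one pixel (y, x)
% by sliding a window along the same scanline in the right image.
best_cost = inf;
best_d = 0;
patch_left = double(im_left(y-hw : y+hw, x-hw : x+hw));
for d = 0 : min(search_size, x-hw-1)  % matches shift LEFT in the right image
    patch_right = double(im_right(y-hw : y+hw, x-d-hw : x-d+hw));
    cost = sum((patch_left(:) - patch_right(:)).^2);  % SSD
    if cost < best_cost
        best_cost = cost;
        best_d = d;  % best disparity so far
    end
end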

Data

Our stereo pairs have been taken from the Middlebury Stereo datasets and evaluation benchmark; specifically, four scenes from the 2006 collection.

In this dataset, the focal length of the camera is 3740 pixels, and the baseline is 160 mm.
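As a sanity check with the disparity formula above: a disparity of 100 pixels corresponds to a depth of \(z = \frac{3740 \times 160\text{ mm}}{100} = 5984\text{ mm} \approx 6\text{ m}\).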


Fig 3. Bowling test stereo image pair (left/right).
Fig 4. Flowerpots test stereo image pair (left/right).
Fig 5. Lampshade test stereo image pair (left/right).
Fig 6. Midd2 test stereo image pair (left/right).

Task

We will implement a block matching algorithm for finding corresponding points in two rectified images. The stencil code creates a matrix called disparity, the same size as the input images. For each pixel (y, x) in the first image, we wish to set disparity(y, x) to be the horizontal distance between the pixel in the first image and its corresponding point in the second.

We've seen a few block matching approaches; let's parameterize them by the pixels \((x,y)\) in the window and the disparity \(d\), and try our old favorites plus some new approaches:
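For reference, here are representative forms of three common measures (these particular picks are ours, not necessarily the exact lecture list): the familiar sum of squared differences and sum of absolute differences, plus normalized cross-correlation, comparing a window \(W\) around \((x,y)\) in the left image against the window shifted by \(d\) in the right image:

$$\mathrm{SSD}(x,y,d) = \sum_{(i,j)\in W}\left(I_L(x+i, y+j) - I_R(x+i-d, y+j)\right)^2$$

$$\mathrm{SAD}(x,y,d) = \sum_{(i,j)\in W}\left|I_L(x+i, y+j) - I_R(x+i-d, y+j)\right|$$

$$\mathrm{NCC}(x,y,d) = \frac{\sum_{(i,j)\in W}\left(I_L(x+i,y+j)-\bar{I_L}\right)\left(I_R(x+i-d,y+j)-\bar{I_R}\right)}{\sqrt{\sum_{(i,j)\in W}\left(I_L(x+i,y+j)-\bar{I_L}\right)^2\sum_{(i,j)\in W}\left(I_R(x+i-d,y+j)-\bar{I_R}\right)^2}}$$

where \(\bar{I_L}\) and \(\bar{I_R}\) are the mean intensities over the respective windows.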

What are the advantages and disadvantages of these new measures compared to the more familiar distances that we have seen?

Stencil code

function [ disparity ] = stereo_match(left, right, window_size, search_size)

% Using the stereo image pair (left, right), calculate the disparity map
% (related to depth map) by using a block matching algorithm.
% window_size : the size of the kernel (e.g., 7)
% search_size : the maximum search offset (e.g., 30)

% Hint: We resized our input images to 244 x 300 for speed.
im_left = rgb2gray(imread(left));
im_right = rgb2gray(imread(right));
[h,w] = size(im_left);

disparity = zeros(h, w, 'uint8');

% Hint: We could write loops over window offsets; OR, we could offset whole images in advance into a 3D array...

% For every scanline in the left image (ignoring border)
half_window_size = floor(window_size/2);
for y = half_window_size+1 : h-half_window_size
    % For every pixel in the scanline (ignoring border)
    for x = half_window_size+1 : w-half_window_size
        
        %% Your code for calculating disparity for each pixel (y, x) goes here %%
        %% Remember: window_size, search_size

    end
end

% Scale the distances to get an image where white=close, black=far
scale = 255 / search_size;
disparity = uint8(disparity .* scale);
imshow(disparity);
end    
Listing 1. Stencil code for stereo disparity matching.
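To run it on a pair, a hypothetical invocation (the filenames here are placeholders for whichever scene you downloaded) might look like:

disparity = stereo_match('bowling_left.png', 'bowling_right.png', 7, 30);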
Fig 7. Simple block matching disparity with SSD metric.

More experiments (inspiration)

Stretch Goal: Dynamic Programming scanline stereo

We saw in class how we can add ordering and uniqueness constraints to our solution by setting up stereo disparity estimation as a dynamic programming problem. This is closely related to the boundary seam-finding methods from the textures and seams coursework project; see the sketch after Fig 9.

Fig 8. Scanline stereo as a directed graph, which can be solved via dynamic programming. Figure © Boykov.
Fig 9. Directed graph cost matrices in scanline/scanline space (left) and in disparity/scanline space (right). Figure © Brown, Burschka, Hager, 2003.
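As a starting point, here is a minimal sketch of one scanline-DP variant (all names are ours, and it trades the strict ordering/uniqueness constraints for a simple smoothness penalty \(\lambda\) on disparity changes). It assumes you have already built a per-scanline matching cost matrix, e.g. from the SSD costs above:

function d = dp_scanline(cost, lambda)
% cost   : w x (max_d+1) matrix; cost(x, k) is the matching cost of
%          assigning disparity k-1 to column x of this scanline.
% lambda : penalty per unit of disparity change between neighbors.
[w, nd] = size(cost);
total = zeros(w, nd);   % cumulative path cost
back  = zeros(w, nd);   % backpointers for path recovery
total(1, :) = cost(1, :);
for x = 2 : w
    for k = 1 : nd
        % transition from every previous disparity, penalizing jumps
        [best, idx] = min(total(x-1, :) + lambda * abs((1:nd) - k));
        total(x, k) = cost(x, k) + best;
        back(x, k) = idx;
    end
end
% backtrack the minimum-cost disparity path
d = zeros(1, w);
[~, d(w)] = min(total(w, :));
for x = w-1 : -1 : 1
    d(x) = back(x+1, d(x+1));
end
d = d - 1;              % indices are 1-based; disparities start at 0
end

Running this once per scanline tends to suppress isolated mismatches compared to independent per-pixel matching, at the cost of horizontal streaking artifacts.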

Submission

Please upload your MATLAB code, input/result images, and any notes of interest as a PDF to Gradescope. Please use writeup.tex for your submission.


Acknowledgements

This lab was developed by the 1290 course staff. Thanks to Andrea Fusiello.