Lab: Light Fields and Camera Arrays

Learning Objectives

  1. Understand spatial and angular sampling in light fields and its connection to blur in images.
  2. Refocus a light field in practice using angular integration and shear transforms.
  3. Understand how depth relates to EPI structure, and estimate depth from light fields by optimizing variance over shear transforms.

Background

A (discrete) light field is a sampling of the plenoptic function, as we discussed in class. We can capture a light field with a camera array, a lens array, or, for static scenes, by moving a single camera on a gantry or linear stage.

Fig 1. Left: Stanford light field camera; Right: Adobe (large) lens array.
Fig 2. 3D light field capture with a linear stage (synthetic scene using computer graphics).

Today we're going to see how to use images from a 3D ray sampling of a light field, captured by a gantry, to provide control over the focal plane and focus blur. For this, we will consider slices of the light field called epipolar-plane images (EPIs). We can create an EPI by slicing every image in our linear-stage-captured light field along the same scanline. This operation is like stacking all of our 2D images into a rectangular prism and then slicing that prism horizontally:

Fig 3. Constructing an EPI by taking the same scanline from each image (top; middle) and stacking those scanlines as the rows of a new image (bottom; ordered left/top to right/bottom). The resolution here has been reduced to make it easier to see.
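As a concrete indexing sketch (assuming the views have been stacked into a NumPy array of shape (num_views, height, width, 3), ordered along the stage), the EPI for a given image row is just a slice of that stack:

    import numpy as np

    # lf: stacked light field, shape (num_views, height, width, 3).
    # Fixing one image row across all views gives an EPI of shape
    # (num_views, width, 3): each EPI row is that scanline as seen from one
    # camera position along the stage.
    def epi_at_row(lf: np.ndarray, row: int) -> np.ndarray:
        return lf[:, row, :, :]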

Relate the structure of the EPI to the horizontal camera position along the array or stage (Fig 4, full resolution):

Fig 4. Full-resolution EPI from the example light field scene in Fig 2.
Tasks:
  1. Load an image set from below into a multidimensional array. For our 3D light field, this will be a 4D array (views × height × width × color channels). From the data below, start with 'Train' with 100 images; if your computer doesn't have much memory, reduce the number of images that you load.
  2. Remember to sort your list of filenames so that they remain in order when loaded.
  3. Write a function which creates an EPI image. Remember: it is a slice of the data.
  4. Visualize your EPI to confirm that it looks correct.
  5. Create an animation that sweeps through each EPI in the light field (from top to bottom); see the sketch after this list. Hint: in Python OpenCV, repeatedly calling cv2.imshow("window", data) works fine; just remember to call cv2.waitKey(30) after each frame.
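The sketch referenced above, covering tasks 1-5. The folder name train/ and the .png extension are assumptions; adjust them to wherever you unzipped the Train data, and note that string sorting keeps the views in order only if the filenames are zero-padded.

    import glob

    import cv2
    import numpy as np

    # Tasks 1-2: load a sorted list of files into one 4D array of shape
    # (num_views, height, width, 3).
    paths = sorted(glob.glob("train/*.png"))
    lf = np.stack([cv2.imread(p) for p in paths])      # uint8, BGR

    # Tasks 3-4: an EPI is a slice along one image row; show the middle one.
    cv2.imshow("EPI", lf[:, lf.shape[1] // 2, :, :])
    cv2.waitKey(0)

    # Task 5: sweep through every EPI from the top row to the bottom row.
    for row in range(lf.shape[1]):
        cv2.imshow("EPI", lf[:, row, :, :])
        cv2.waitKey(30)
    cv2.destroyAllWindows()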

Data

Train: 2D images x 1D linear stage with 500 views. ZIP with 100 images [start with this]. ZIP with all images [be careful; ~400MB; you might not be able to fit them all in memory]

Fig 5. Four views from the MERL Train light field.

Chess: 2D images x 2D gantry with 289 views (17 × 17). ZIP with one 1D row of 17 images [start with this]. ZIP with all images [be careful; ~400MB; you might not be able to fit them all in memory]

Fig 6. Four views (1D) from the Stanford Chess light field.

Lego: 2D images x 2D gantry with 289 views (17 × 17). ZIP with one 1D row of 17 images [start with this]. ZIP with all images [be careful; ~400MB; you might not be able to fit them all in memory]

Fig 7. Four views (1D) from the Stanford Lego light field.

Refocusing by Reparameterization

From our study of projection, we know that a perfect pinhole camera does not blur the image of the world: everything is in focus. However, as we increase the aperture, objects start to blur. We can introduce a lens to focus some of these rays, but rays from objects that are not at the focal plane do not converge on the sensor; they spread into a depth-dependent point spread function.

Each of the light field datasets above is (almost) entirely in focus; the aperture is small and the focal plane is set to infinity. Let's assume they were captured with pinhole cameras. Now, by shifting the position of the camera, we see rays that come from directions that were not visible from a single view ('parallax'). This is very similar to what happens when we open up the aperture, except that these rays are not averaged together: we have sampled them individually (or, at least, more individually!).

This individual ray sampling is powerful. Remember how, in the night lab, we integrated rays over time by averaging multiple short exposures to simulate a long exposure with less noise? Now we're going to integrate rays over space to simulate a larger aperture than we ever physically had. Let's average across the vertical dimension of our EPI image: what will we see?

Tasks:
  1. Average your EPI image down each of its columns (i.e., across the views): what happens to the EPI?
  2. Reconstruct one full image of the scene by averaging every EPI this way (see the sketch after this list): which part of the image is in focus?
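The sketch referenced above, assuming the (num_views, height, width, 3) float stack from the earlier snippet: averaging each EPI down its columns and reassembling the rows is the same thing as a single mean over the view axis.

    import numpy as np

    def integrate_views(lf: np.ndarray) -> np.ndarray:
        """Average a (num_views, height, width, 3) light field over its views.

        Averaging one EPI down its columns produces a single scanline; doing
        this for every image row and stacking the results is equivalent to a
        mean over axis 0.
        """
        return lf.mean(axis=0)          # shape (height, width, 3)

    # Example: refocused = integrate_views(lf.astype(np.float32) / 255.0)
    # Only scene points at the focal plane (infinity here, i.e. vertical EPI
    # lines with zero disparity) stay sharp; everything else blurs horizontally.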

Next, we're going to virtually change the focal plane. The slope of a line in the EPI relates to how far the corresponding scene point is from the focal plane. In our case, the scenes are focused at infinity, so the slope is inversely related to depth: a larger slope means the point is closer to the camera, because nearer points show greater disparity from camera view to camera view.

So, what happens if we change the slope of these lines and then integrate? We know how to do this: shear (skew) matrices!
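A quick sanity check on why this works (assuming the EPI has camera position \(v\) down its rows and image column \(x\) across its columns; swap signs or transpose if your convention differs): a scene point traces the line \(x(v) = x_0 + d\,v\) in the EPI, where the per-view disparity \(d\) is larger for nearer points. Shearing remaps columns as \(x' = x + s\,v\), turning that line into \(x'(v) = x_0 + (d + s)\,v\). Choosing \(s = -d\) makes the line vertical, so averaging down the columns keeps points at that disparity sharp and blurs everything else.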

Fig 8. Skewing our EPI more and more.
Tasks:
  1. Learn how to skew an image using skimage.transform.warp and skimage.transform.AffineTransform. Use any test image to begin, and make sure you can skew correctly.
  2. Next, skew (or shear) each EPI by some amount.
    1. Your shear matrix should look something like \(\begin{bmatrix} 1 & s & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\), where \(s\) is the shear amount.
    2. Shearing between [0,-2] is a good range. If you loaded your images in descending order, flip the sign on the shear amount.
    3. You can also use the skimage.transform.AffineTransform(shear=s) form.
  3. Average each sheared EPI in the vertical dimension, then reconstruct your image. What happened?
  4. Test different skew values.
  5. Make an animation as you cycle through different skew values (see the sketch after this list). What do we see?
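The sketch referenced above: one possible shear-and-integrate refocusing loop using skimage. The (num_views, height, width, 3) float layout of lf and the sign and range of s are assumptions carried over from the earlier sketches; flip the sign of s if the focal plane moves the wrong way.

    import numpy as np
    from skimage.transform import AffineTransform, warp

    def refocus(lf: np.ndarray, s: float) -> np.ndarray:
        """Shear every EPI by s and integrate over the views.

        lf: float light field of shape (num_views, height, width, 3) in [0, 1].
        Returns a refocused image of shape (height, width, 3).
        """
        num_views, height, width, _ = lf.shape
        # skimage transforms use (x, y) = (column, row) coordinates; the EPI
        # row index is the view index v, so this matrix resamples row v
        # shifted horizontally by s * v pixels. Note that warp() treats the
        # transform as the map from output coordinates back to input
        # coordinates.
        tform = AffineTransform(matrix=np.array([[1.0, s, 0.0],
                                                 [0.0, 1.0, 0.0],
                                                 [0.0, 0.0, 1.0]]))
        out = np.zeros((height, width, 3))
        for row in range(height):
            epi = lf[:, row, :, :]                    # (num_views, width, 3)
            sheared = warp(epi, tform, mode='edge')   # shear the EPI
            out[row] = sheared.mean(axis=0)           # integrate over views
        return out

    # Tasks 4-5: sweep the shear and watch the focal plane move, e.g.
    # for s in np.linspace(0.0, -2.0, 9):
    #     cv2.imshow("refocused", refocus(lf, s))
    #     cv2.waitKey(100)

Warping each EPI separately is slow but easy to reason about; shifting each whole view by an integer number of pixels before averaging is a common faster approximation.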

Definitely a stretch goal: Depth Estimation

To estimate depth, we can use the knowledge we just gained: a point that appears in focus lies on the focal plane, and the shear amount determines the EPI slope (and hence the depth!) of that focal plane. Let's look at how the image content varies across shear values, and optimize for the best shear value per pixel.

Fig 9. Tao et al. formulation for recovering depth from a light field.
Tasks:
  1. Find the best shear value per pixel by testing a range of shears and considering the 'cost' at each pixel once we integrate across the angular views (the vertical axis of the EPI).
  2. Estimate the 'defocus' cost from Figure 9 (purple), which is the gradient of the integrated image. Think about why this works.
  3. Estimate the 'correspondence' cost from Figure 9 (cyan), which is the variance across the EPI. Think about why this works.
  4. Visualize the two outputs (see the sketch after this list): where does each succeed?
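The sketch referenced above: one way to turn the two cues into per-pixel cost volumes. The grayscale stack lf_gray of shape (num_views, height, width) and the shear range are assumptions, and this is only a rough reading of the Fig. 9 idea, not Tao et al.'s exact formulation.

    import numpy as np
    from skimage.transform import AffineTransform, warp

    def depth_cues(lf_gray: np.ndarray, shears) -> tuple:
        """Per-pixel defocus and correspondence cues over a range of shears.

        lf_gray: grayscale light field, shape (num_views, height, width), in [0, 1].
        Returns two arrays, each of shape (len(shears), height, width).
        """
        num_views, height, width = lf_gray.shape
        defocus = np.zeros((len(shears), height, width))
        corresp = np.zeros((len(shears), height, width))
        for i, s in enumerate(shears):
            tform = AffineTransform(matrix=np.array([[1.0, s, 0.0],
                                                     [0.0, 1.0, 0.0],
                                                     [0.0, 0.0, 1.0]]))
            refocused = np.zeros((height, width))
            variance = np.zeros((height, width))
            for row in range(height):
                sheared = warp(lf_gray[:, row, :], tform, mode='edge')
                refocused[row] = sheared.mean(axis=0)   # integrate over views
                variance[row] = sheared.var(axis=0)     # disagreement across views
            # Defocus cue: in-focus pixels keep strong local gradients after
            # integration. Correspondence cue: in-focus pixels agree across
            # views, so their angular variance is low.
            gy, gx = np.gradient(refocused)
            defocus[i] = np.hypot(gx, gy)
            corresp[i] = variance
        return defocus, corresp

    # Usage (with lf from the loading sketch):
    #   lf_gray = (lf.astype(np.float32) / 255.0).mean(axis=-1)
    #   shears = np.linspace(0.0, -2.0, 16)
    #   defocus, corresp = depth_cues(lf_gray, shears)
    #   depth_from_defocus = shears[np.argmax(defocus, axis=0)]
    #   depth_from_corresp = shears[np.argmin(corresp, axis=0)]

Both cue maps are noisy per pixel; averaging them over a small spatial window before taking the per-pixel argmax/argmin cleans them up considerably, as in Tao et al.'s formulation.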

Conclusion

Light field cameras and camera arrays are powerful computational tools because they let us sample rays individually and integrate them in new ways to simulate optical and lens effects in a principled fashion. Lytro is the most famous light field camera company (now defunct; bought by Google). Why do you think they went under? Imagine I had to build a light field camera: what is a practical way to build a portable one, given that I only have access to the same image sensor technologies as every other camera manufacturer? What do I gain and what do I lose in which rays I sample, and how does this affect the final image quality? As a photographer, or as a consumer, what is most useful to me?

Talk to James to know more : )

Further reading


Submission

Please upload your Python code, input/result images, and any notes of interest as a PDF to Gradescope. Please use writeup.tex for your submission.


Acknowledgements

This lab was developed by the 1290 course staff. Thanks to The (New) Stanford Light Field Archive, and the MERL Light Field Repository.