Lab: Compositing and Morphology

Learning Objectives

Implementing a Laplacian pyramid to composite two image regions.
Using simple techniques to separate foreground from background from pairs of images.
Understanding morphology as a technique for editing binary images, and how to use it in Python.

Prerequisites

Basic knowledge of programming in Python.
Implementation of Gaussian pyramids in Python (from Project 1).

Introduction

Compositing is the process of copying or inserting a part of one image into another image. Good compositing is hard for several reasons: the image content must match in perspective, in lighting, and in what makes sense for the scene; pixels at the edge of an image part integrate light from both the wanted and unwanted parts (e.g., the foreground and background) and must be handled carefully; and some objects are translucent or transparent and show the background through them.

**Fig 1. Good compositing, sort of: Helicopter Shark**

**Fig 2. Bad compositing, maybe: Adobe Photoscot**

Matching the perspective and lighting is a bit too complex for this lab, and we'll get to it at the end of the course. Today, we're going to implement scale-aware blending with Laplacian pyramids, and look at some simple background/foreground separation techniques.

Key Papers

Burt and Adelson, A Multiresolution Spline with Application to Image Mosaics, 1983. PDF

Test Images

Please download these yourself from this webpage.

**Fig 3. Apple (with scanned paper texture), from Burt and Adelson 1983.**

**Fig 4. Orange (with scanned paper texture), from Burt and Adelson 1983.**

Feathered alpha blending

As we saw in lecture, we will blend these two images together along a central vertical line. First, we'll use the 'simple' method of alpha blending with a variable overlap width.

Tasks:

Write a function which, given an image size, outputs an alpha matte image with values in $[0,1]$ to blend the apple and orange images along a vertical line. Give the function two input parameters which define the width and location of the region to feather, i.e., the transition from one image to the other.
Write a function which, given two images and an alpha matte, produces a blended image by the formula: $$I_O = \alpha I_1 + (1-\alpha) I_2$$
Experiment with the matte feathering parameters to find a good width for these two images.
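
As a starting point, the two functions in the tasks above could be sketched as follows. This is only one possible layout, not the reference solution: the names `make_alpha_matte`, `alpha_blend`, `center`, and `feather_width` are our own, the ramp is assumed to be linear, and images are assumed to be float arrays in [0,1].

```python
import numpy as np

def make_alpha_matte(height, width, center, feather_width):
    """Return a (height, width) matte in [0, 1]: 1 on the left, 0 on the right.

    `center` is the column where alpha crosses 0.5 and `feather_width` is the
    width in pixels of the linear ramp (both names are illustrative).
    """
    cols = np.arange(width, dtype=np.float64)
    if feather_width <= 0:
        ramp = (cols < center).astype(np.float64)  # hard step, no feathering
    else:
        ramp = np.clip(0.5 - (cols - center) / feather_width, 0.0, 1.0)
    return np.tile(ramp, (height, 1))

def alpha_blend(img1, img2, alpha):
    """Blend two float images as alpha * img1 + (1 - alpha) * img2."""
    if img1.ndim == 3 and alpha.ndim == 2:
        alpha = alpha[..., np.newaxis]  # broadcast the matte over color channels
    return alpha * img1 + (1.0 - alpha) * img2
```

Calling `make_alpha_matte(h, w, w // 2, 0)` gives a hard-edged mask, while larger values of `feather_width` widen the transition.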

Your alpha matte should look something like this:

**Fig 5. Alpha mattes with narrow and wide feathering.**

Your blended images should look something like this:

**Fig 6. Blended images with narrow and wide feathering.**

Laplacian pyramid blending

Picking the right feathering width for each individual image takes time, and we'd like to develop a method which is able to consider image frequencies across scales within our blending. Laplacian pyramid blending lets us accomplish this.

**Fig 7a. Laplacian pyramid construction. Image © Stanford Exploration Project, 2002.**

**Fig 7b. Laplacian pyramid reconstruction. Image © Stanford Exploration Project, 2002.**

Note: For easier visualization, the edges in $h_0$ and $h_1$ are displayed by adding mid gray (e.g., 0.5 if we're working in [0,1], or 128 if [0,255]) and scaling up their magnitude.
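
The display mapping described in this note might look like the sketch below. It assumes the detail layer is a signed float array for images in [0,1]; the function name and the `gain` value are our own choices for display only.

```python
import numpy as np

def visualize_detail(detail, gain=4.0):
    """Map a signed detail layer to [0, 1] for display.

    Adds mid gray (0.5) and scales the magnitude by `gain`, a hypothetical
    display-only factor; values outside [0, 1] are clipped.
    """
    return np.clip(0.5 + gain * detail, 0.0, 1.0)
```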

Tasks:

Factor your implementation of Gaussian pyramid construction from Project 1 into a function, and use/modify it to implement a function which constructs a Laplacian pyramid. At the smallest pyramid layer ($f_2$ in Figure 7), we keep the intensity image and not the detail image (what would be $h_2$).
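
If it helps to see the overall shape, here is a minimal numpy-only sketch of Laplacian pyramid construction and reconstruction. It makes several assumptions that may differ from your Project 1 code: a 5-tap binomial blur, downsampling by dropping every other pixel, upsampling by pixel repetition followed by a blur, and pyramids stored as Python lists with the finest level first. All function names are our own.

```python
import numpy as np

KERNEL = np.array([1, 4, 6, 4, 1], dtype=np.float64) / 16.0  # assumed blur kernel

def blur(img):
    """Separable 5-tap blur along rows and columns with edge-replicated borders."""
    out = img.astype(np.float64)
    for axis in (0, 1):
        pad = [(0, 0)] * out.ndim
        pad[axis] = (2, 2)
        padded = np.pad(out, pad, mode="edge")
        n = out.shape[axis]
        out = sum(w * np.take(padded, np.arange(k, k + n), axis=axis)
                  for k, w in enumerate(KERNEL))
    return out

def downsample(img):
    """Blur, then keep every other row and column."""
    return blur(img)[::2, ::2]

def upsample(img, shape):
    """Repeat pixels 2x, crop to the target height and width, then blur."""
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    return blur(up[:shape[0], :shape[1]])

def gaussian_pyramid(img, levels):
    pyramid = [img.astype(np.float64)]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid

def laplacian_pyramid(img, levels):
    gauss = gaussian_pyramid(img, levels)
    lap = [g - upsample(g_next, g.shape) for g, g_next in zip(gauss[:-1], gauss[1:])]
    lap.append(gauss[-1])  # smallest level keeps the intensity image
    return lap

def collapse(lap):
    """Reconstruct an image by expanding and summing from the smallest level."""
    img = lap[-1]
    for detail in reversed(lap[:-1]):
        img = upsample(img, detail.shape) + detail
    return img
```

A useful sanity check is that `collapse(laplacian_pyramid(img, levels))` returns the original image up to floating-point error, whatever blur you choose, as long as construction and reconstruction use the same upsampling.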

Then, implement Laplacian pyramid blending:

Build Laplacian pyramids $L_1$ and $L_2$ for images $I_1$ and $I_2$ respectively (Figure 7a).
Build a Gaussian pyramid $G_M$ for the image mask $M$—note that $M$ can be a mask and not a matte. It does not need to be feathered, because we're constructing the pyramid!
Form a combined pyramid $L_O$ from $L_1$ and $L_2$ using nodes of $G_M$ as weights. That is, for each layer $l$ and pixel $(i,j)$: $$L_{Ol}(i, j) = G_{Ml}(i, j)L_{1l}(i, j) + (1 - G_{Ml}(i, j) )L_{2l}(i, j)$$
Obtain the blended image $I_O$ by expanding and summing the levels of $L_O$ (Figure 7b).
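
The per-level weighting in the third step can be written compactly. The sketch below assumes each pyramid is a Python list of numpy arrays with matching shapes per level (finest first); the function name is our own, and building and collapsing the pyramids is left to your own code.

```python
import numpy as np

def combine_pyramids(lap1, lap2, mask_gauss):
    """Weight each level of two Laplacian pyramids by the mask's Gaussian pyramid.

    Computes G_M * L_1 + (1 - G_M) * L_2 independently at every level.
    """
    combined = []
    for l1, l2, g in zip(lap1, lap2, mask_gauss):
        if l1.ndim == 3 and g.ndim == 2:
            g = g[..., np.newaxis]  # broadcast a gray mask over color channels
        combined.append(g * l1 + (1.0 - g) * l2)
    return combined
```

Because the mask pyramid is progressively blurred, coarse levels are mixed over a wide region and fine levels over a narrow one, which is what makes the blend scale-aware.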

Useful notes: We can create a non-feathered mask using our parameterized blend width function from earlier. You can also create any $M$ in your favourite image editing software.

Visualizing the pyramid can help us understand what is happening, or debug. We provide the create_pyramid_image() function here, which assumes that your pyramid is stored in a multidimensional numpy array, which helps us cope with the different image sizes across the pyramid. The create_pyramid_image() function will produce an image which is similar to Figure 10 in Burt and Adelson.

Result:

Here's Burt and Adelson's result (with some color and texture artifacts from scanning). How does your result compare?

**Fig 8. Laplacian blended images from Burt and Adelson 1983.**

Question: What construction variables might affect your result? Can we make these pyramid function parameters, and vary their values to see a difference?

Tasks:

Time to capture your own images and blend them!

Try to capture images that do not look good with a feathered blend, but look good with Laplacian blending. Think about what kinds of image content we would need across the blend.
What masks can we create for an interesting effect? (e.g., recall the eye in hand example from lecture.)

Simple Green Screen Segmentation

In class, we saw how the matting problem of extracting a foreground object from a background of known color can be solved in closed form under some assumptions via a system of linear equations. However, for this lab, we're going to take the less principled approach ("a hack", you say?) of difference imaging and then clean up the mask with a new kind of image processing: morphology.

Task:

If you have access to a green screen 'studio' and a tripod, take a pair of photographs: one of the green screen background with no foreground object in it, and then one with a foreground object against the green screen (e.g., yourself, your colleague, or Percival the Elephant). Otherwise, feel free to use our image pairs (below).
In Python, compute the difference image by subtracting one image from the other, and create a mask by thresholding this difference. Try different thresholds to find an acceptable mask.
- Remember to work in a data type that can cope with numbers as large as the maximum difference we will see.
- In principle, difference imaging doesn't need a known color background—any pair should work. Can you exploit this extra information? What happens if we operate on only one channel, or in a transformed color space like HSV?
- One option might be to consider modeling the entire green screen as a Gaussian distribution in a color space, then measuring how far away each new pixel sample is in the subject image from this distribution.
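
A minimal sketch of the difference-and-threshold step is shown below. It assumes both images have the same shape and dtype, converts to float before subtracting (uint8 subtraction wraps around, so 10 - 20 becomes 246), and takes the largest per-channel difference; that last choice and the function name are ours, and other reductions are worth trying.

```python
import numpy as np

def difference_mask(background, subject, threshold):
    """Return a boolean foreground mask from two same-sized images.

    uint8 inputs are scaled to [0, 1] so `threshold` is always in [0, 1].
    """
    bg = background.astype(np.float64)
    fg = subject.astype(np.float64)
    if background.dtype == np.uint8:
        bg, fg = bg / 255.0, fg / 255.0
    diff = np.abs(fg - bg)
    if diff.ndim == 3:
        diff = diff.max(axis=2)  # largest difference over the color channels
    return diff > threshold
```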

Our image pairs:

**Fig 9. Image pairs for segmentation.**

**Fig 10. Antagonistic image pairs for segmentation.**

Masking difficulties:

Typically this technique will only work if the images are taken close together in time, or in controlled lighting conditions. Even then, the subject occludes and scatters light, which causes minor variations in scene lighting between the two shots and artifacts in the result. Here, we just can't quite find a threshold that segments only the object: either we remove all of the background along with some of the foreground, or we keep all of the foreground along with some of the background.

**Fig 11. Even with a non-antagonistic image pair, it is hard to pick an appropriate threshold.** Threshold values (left to right): 0.30, 0.35, 0.40.

Interlude: Alternative Simple Background Modeling Approaches

A brief interlude: if you are interested in seeing other approaches then we've included a few simple background segmentation references here. These work by the basic process of difference imaging, too, but with more sophisticated pixel appearance models.

Gaussian per pixel: The scene content (and noise!) is represented by a Gaussian distribution in RGB per pixel. The foreground and background are separated by thresholding, per pixel, the likelihood that a newly observed pixel comes from the background model. See Wren et al., Pfinder: Real-time Tracking of the Human Body, 1997.
Gaussian mixture modeling: Using a per-pixel model where the scene content (and noise!) is represented by $k$ Gaussian color distributions. See Stauffer and Grimson, Adaptive Background Mixture Models for Real-time Tracking, 1999.
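
To make the per-pixel Gaussian idea concrete, here is a simplified sketch. It is not the method from either paper: it assumes we have several background-only frames stacked as an (N, H, W, C) array, treats the color channels as independent (a diagonal covariance), and thresholds a normalized distance rather than a likelihood. The function names, `min_std`, and `k` are our own.

```python
import numpy as np

def fit_background(frames, min_std=1e-3):
    """Per-pixel mean and standard deviation from background frames (N, H, W, C)."""
    stack = np.asarray(frames, dtype=np.float64)
    mean = stack.mean(axis=0)
    std = np.maximum(stack.std(axis=0), min_std)  # floor avoids division by zero
    return mean, std

def foreground_mask(image, mean, std, k=3.0):
    """Flag pixels whose normalized distance from the background model exceeds k."""
    z = (image.astype(np.float64) - mean) / std
    dist = np.sqrt((z ** 2).sum(axis=-1))  # distance in units of std, over channels
    return dist > k
```

Compared with a single global threshold, each pixel here gets its own tolerance, so noisy regions of the background need a larger change before they are marked as foreground.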

Stretch Goal: Cleaning up the segmentation with morphology

The segmentation mask computed from the simple difference image has many problems. Some large regions are either not selected or incomplete, but many small single pixels along the foreground/background edge remain, too. Morphology lets us clean up binary masks with the mathematics of sets and structuring elements. Think of morphology as a kind of filtering but, instead of using continuous addition and multiplication for the real domain, we use set operations for the binary domain.
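
To see the filtering analogy in code, here is a numpy-only sketch of erosion and dilation with a square structuring element. Where a convolution would multiply and add over the neighborhood, these use logical AND and OR. The function names, the square element, and the border handling (outside pixels count as set for erosion and unset for dilation) are our own assumptions.

```python
import numpy as np

def _shifted_views(mask, radius, fill):
    """Return one shifted copy of the mask per position in the square element."""
    padded = np.pad(mask.astype(bool), radius, mode="constant", constant_values=fill)
    h, w = mask.shape
    size = 2 * radius + 1
    return [padded[dy:dy + h, dx:dx + w] for dy in range(size) for dx in range(size)]

def erode(mask, radius=1):
    """Keep a pixel only if every pixel under the square element is set."""
    return np.logical_and.reduce(_shifted_views(mask, radius, fill=True))

def dilate(mask, radius=1):
    """Set a pixel if any pixel under the square element is set."""
    return np.logical_or.reduce(_shifted_views(mask, radius, fill=False))
```

Eroding and then dilating (an opening) removes isolated pixels while roughly preserving larger regions.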

**Fig 12. Binary mask cleaned by the morphological operation *erosion*.**

Reading: An introduction to morphology, by Danny Alexander @ UCL. Please read this slide deck with your colleague and see what operations and effects morphology provides. Slide 44 and beyond are less of a concern for today's lab.

In Python:

OpenCV supports many kinds of morphological operation (documentation here), including:

cv2.erode() erodes an image. See page 13 (slide number 20).
cv2.dilate() dilates an image. See page 15 (slide number 22).
cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel) erodes and then dilates an image. See pages 38–45 (slide numbers 45–52).
cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel) dilates and then erodes an image. See pages 38–45 (slide numbers 45–52).
cv2.getStructuringElement() gets a structuring element. See page 17 (slide number 24).

Task:

Investigate how morphological operators (and other simple pixel manipulations) can be used to improve your segmentation mask! Include your script in your writeup, along with an explanation of your approach.

Final task

Composite your foreground onto an interesting background...

**Fig 14. Blueno is my best friend.** Background photo © Tony Pacitti, Providence Monthly

Submission

Please upload your Python code, input/result images, and any notes of interest as a PDF to Gradescope.

Acknowledgements

This lab was developed by the 1290 course staff.