Automatic color aligning and compositing of the Prokudin-Gorskii photo collection
Project 1: Image Alignment with Pyramids
- Files: Github Classroom
- Extra disk space: /course/cs1290_students
- Part 1: Questions
  - Questions + template in the repository: questions/
  - Hand-in process: Gradescope as PDF. Please submit anonymous materials!
  - Due: Weds. 30th Sept. 2020, 9pm.
- Part 2: Code
  - Writeup template in the repository: writeup/
  - Hand-in process: Gradescope. Upload your repo directory; there is an option to import a repo on submission.
    - If this fails: the required files are code/ and writeup/writeup.pdf.
    - Please submit anonymous materials!
  - Due: Weds. 30th Sept. 2020, 9pm.
Sergei Mikhailovich Prokudin-Gorskii (1863-1944) was a photographer ahead of his time. He saw color photography as the wave of the future and came up with a simple idea to produce color photos: record three exposures of every scene onto a glass plate using a red, a green, and a blue filter, and then project the monochrome pictures with correctly colored light to reproduce the color image (color printing of photos was very difficult at the time). Due to the fame he received from his color photos, including the only color portrait of Leo Tolstoy (the famous Russian author), he won the Tsar's permission and funding to travel across the Russian Empire and document it in 'color' photographs. His RGB glass plate negatives were purchased in 1948 by the Library of Congress. They are now digitized and available on-line.
Take the digitized Prokudin-Gorskii glass plate images and automatically produce a color image with as few visual artifacts as possible. Your program should:
- Divide the image into three equal parts and align the three color channels (blue is top, green is middle, red is bottom).
- Implement a single-scale sliding window alignment search with an appropriate metric, and record the displacement vector used to align the parts.
- The high-resolution images are quite large; for efficiency, implement a multi-scale image pyramid alignment algorithm. Report any difference between the single- and multi-scale alignment vectors, plus the speedup factor of the multi-scale approach.
- Try your algorithm on other images from the Prokudin-Gorskii collection.
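The plate split and single-scale search described above can be sketched as follows. This is a minimal illustration, not the required implementation: the function name, the SSD metric, the 20-pixel border margin, and the use of np.roll (which wraps pixels around) are all illustrative choices.

```python
import numpy as np

def align_single_scale(channel, reference, max_shift=15, border=20):
    """Exhaustively search shifts in [-max_shift, max_shift] and return
    the (dy, dx) that best aligns channel to reference, scoring each
    candidate with SSD on interior pixels (borders excluded)."""
    best_score, best_shift = np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            diff = (shifted[border:-border, border:-border]
                    - reference[border:-border, border:-border])
            score = np.sum(diff ** 2)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

# Splitting the plate (blue on top, green middle, red bottom):
# h = plate.shape[0] // 3
# b, g, r = plate[:h], plate[h:2*h], plate[2*h:3*h]
```

A quick sanity check is to shift a channel against itself by a known amount and confirm the search recovers the inverse displacement.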
Potentially useful functions:
skimage.transform.resize() or equivalent.
Forbidden functions: anything that builds an image pyramid for you. Write your own function that blurs an image and then subsamples the pixels.
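One way the blur-then-subsample step could look, sketched with a hand-rolled separable Gaussian so no pyramid-building library is involved. The kernel truncation radius, sigma, and minimum size are illustrative defaults, not requirements:

```python
import numpy as np

def gaussian_kernel(sigma=1.0):
    """1D Gaussian kernel truncated at 3 sigma, normalized to sum to 1."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def downsample(img, sigma=1.0):
    """One pyramid level: separable Gaussian blur (anti-aliasing),
    then keep every other pixel in each dimension."""
    k = gaussian_kernel(sigma)
    blurred = np.apply_along_axis(np.convolve, 1, img, k, mode='same')
    blurred = np.apply_along_axis(np.convolve, 0, blurred, k, mode='same')
    return blurred[::2, ::2]

def build_pyramid(img, min_size=32):
    """Image pyramid, finest level first; stop before the smallest
    dimension drops below min_size."""
    levels = [img]
    while min(levels[-1].shape) >= 2 * min_size:
        levels.append(downsample(levels[-1]))
    return levels
```

The blur must come before the subsampling; dropping pixels from an unfiltered image aliases high frequencies into the coarse levels.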
- Assume the images are in BGR order from top to bottom.
- Assume the negatives are evenly divided into 3 plates (i.e., each plate is in exactly 1/3 of the negative).
- Assume that a simple x,y translation model is sufficient for proper alignment.
Python stencil code is available in
code/. You're free to complete this project in any language, but the TAs will only offer support in Python.
Describe your process and algorithm, show your results as images and output values (e.g., alignment vectors, speed-up factors), describe any extra credit and show its effect, tell us any other information you feel is relevant, and cite your sources. We provide you with a LaTeX template in
writeup/writeup.tex. Please compile it into a PDF and submit it along with your code. In class, we will present the project results for interesting cases.
We conduct anonymous TA grading, so please don't include your name or ID in your writeup or code.
Although the color images resulting from this automatic procedure will often look strikingly real, they are still not nearly as good as the manually restored versions available on the LoC website and from other professional photographers. However, each photograph takes hours of manual Photoshop work, such as adjusting the color levels, removing blemishes, or adding contrast. Can you come up with ways to address these problems automatically? Feel free to devise your own approaches or talk to the Professor or TAs about your ideas. There is no right answer here, just try out things and see what works.
Here are some ideas, but we will give credit for other clever ideas:
- Up to 4 pts: Automatic cropping. Remove white, black, or other color borders. Don't just crop a predefined margin from each side—actually try to detect the borders or the edge between the border and the image.
- Up to 3 pts: Automatic contrasting. We could rescale image intensities such that the darkest pixel is zero (on its darkest color channel) and the brightest pixel is 1 (on its brightest color channel). More drastic or non-linear mappings may improve perceived image quality.
- Up to 5 pts: Automatic white balance. This involves two problems: 1) estimating the illuminant and 2) manipulating the colors to counteract the illuminant and simulate a neutral one. Step 1 is difficult in general, while step 2 is simple (see the Wikipedia page on Color Balance and section 2.3.2 in the Szeliski book). There exist simple algorithms for step 1, which don't necessarily work well: for example, assume the average color is the illuminant and shift it to gray, or assume the brightest color is the illuminant and shift it to white.
- Up to 3 pts: Better color mapping. There is no reason to assume (as we have) that the red, green, and blue lenses used by Prokudin-Gorskii correspond directly to the R, G, and B channels in RGB color space. Try to find a mapping that produces more realistic colors (and perhaps makes automatic white balancing less necessary).
- Up to 3 pts: Better features. Instead of aligning based on RGB similarity, try using gradients or edges.
- Up to 5 pts: Better alignment. Instead of searching only for the best x and y translation, additionally search over small scale changes and rotations. Adding two more dimensions to your search will slow things down, but the same coarse-to-fine progression should help alleviate this. Alternatively, try to find sub-pixel alignments.
- Up to 4 pts: Aligning and processing data from other sources. In many domains, such as astronomy, image data is still captured one channel at a time. Often the channels don't correspond to visible light, but NASA artists stack these channels together to create false color images. For example, here is a tutorial on how to process Hubble Space Telescope imagery yourself. Also, consider images like this one of a coronal mass ejection built by combining ultraviolet images from the Solar Dynamics Observatory. To get full credit for this, you need to demonstrate that your algorithm found a non-trivial alignment and color correction.
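As one concrete example of the automatic contrasting idea above, a linear stretch could look like the sketch below. The percentile clipping is an optional tweak beyond the literal darkest-to-0 / brightest-to-1 mapping; it makes the stretch more robust to a few outlier pixels:

```python
import numpy as np

def auto_contrast(img, low_pct=1, high_pct=99):
    """Linear contrast stretch: map the low/high percentiles to 0 and 1,
    clipping anything outside that range. With low_pct=0 and high_pct=100
    this reduces to the plain min/max rescaling."""
    lo, hi = np.percentile(img, [low_pct, high_pct])
    return np.clip((img - lo) / (hi - lo), 0.0, 1.0)
```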
Use Gradescope to submit your repo directly. When creating a submission, you will be asked to upload your repo. Please do not commit the image files back to the repo! They are too big and will cause problems. Instead, put your results in the writeup pdf, compile that, then submit that too.
As such, the repo you hand in must contain the following:
- code/ - directory containing all your code for this assignment
- writeup/writeup.pdf - your report as a PDF file generated with Latex.
You will lose points if you do not follow instructions: 5 points for every instance after the first.
- +55 pts: Single-scale implementation
- +35 pts: Multi-scale implementation
- +10 pts: Write up
- +10 pts: Extra credit
- -5*n pts: Lose 5 points for every time (after the first) you do not follow the instructions for the hand in format
The easiest way to align the parts is to exhaustively search over a window of possible displacements (e.g., [-15,15] pixels), score each one using some image matching metric, and take the displacement with the best score. There are several possible metrics to measure how well images match:
- Sum of squared differences:
sum((image1 - image2)**2)
- Normalized cross correlation:
dot( image1 / ||image1||, image2 / ||image2|| ) —or consider the zero-normalized cross-correlation variant with the mean subtracted as seen in class. Note that numpy.dot() will do matrix multiplication for 2D arrays; we want the sum of the element-wise product.
Note that in this particular case, the images to be matched do not actually have the same brightness values (they are different color channels), so other metrics might work better.
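The zero-normalized variant mentioned above could be sketched like this (the function name is ours). Because it subtracts each patch's mean and divides by its norm, the score is invariant to per-channel gain and offset, which matters here since the three plates have different brightness values:

```python
import numpy as np

def zncc(a, b):
    """Zero-normalized cross-correlation of two equal-shape arrays.
    Subtract means, normalize, sum the element-wise product (not np.dot,
    which would do matrix multiplication on 2D arrays).
    Higher is better; range [-1, 1]."""
    a = a - a.mean()
    b = b - b.mean()
    return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b))
```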
Exhaustive search will become prohibitively expensive if the displacement search range or image resolution are too large. This will be the case for high-resolution glass plate scans. To avoid this, you will need to implement a coarse-to-fine search strategy using an image pyramid. An image pyramid represents the image at multiple scales (usually scaled by a factor of 2). Start from the coarsest scale (smallest image) and update your displacement estimate as you go down the pyramid.
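The coarse-to-fine idea can be sketched as a short recursion. Everything here is illustrative: the SSD search, the refinement window of +/-2 pixels, and especially `halve`, which is a stand-in for your own blur-then-subsample step (a real version must blur before dropping pixels to avoid aliasing):

```python
import numpy as np

def search(channel, reference, max_shift):
    """Brute-force SSD search over shifts in [-max_shift, max_shift]."""
    best, best_shift = np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            s = np.roll(channel, (dy, dx), axis=(0, 1))
            score = np.sum((s - reference) ** 2)
            if score < best:
                best, best_shift = score, (dy, dx)
    return best_shift

def halve(img):
    # placeholder: your own version should blur before subsampling
    return img[::2, ::2]

def align_pyramid(channel, reference, min_size=32):
    """Coarse-to-fine alignment: recurse down to the coarsest level,
    search there with a wide window, then at each finer level double
    the estimate and refine it with a small +/-2 pixel search."""
    if min(channel.shape) <= min_size:
        return search(channel, reference, max_shift=15)
    dy, dx = align_pyramid(halve(channel), halve(reference), min_size)
    dy, dx = 2 * dy, 2 * dx  # scale the coarse estimate up one level
    shifted = np.roll(channel, (dy, dx), axis=(0, 1))
    rdy, rdx = search(shifted, reference, max_shift=2)
    return dy + rdy, dx + rdx
```

The expensive wide search runs only on the tiny coarsest image; every finer level pays for just a 5x5 refinement, which is where the speedup over single-scale search comes from.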
data/ holds the digitized glass plate images in low- and high-resolution versions, so consider trying your alignment algorithm on the low-resolution version first to test its performance more quickly.
- Try not to become bogged down tweaking input parameters. Most images will line up using the same parameters, but not all. Your final results should be the product of a fixed set of parameters (if you have free parameters). Don't worry if one or two of the handout images don't align properly using the simpler metrics suggested here.
- The input images can be in jpg (uint8) or tiff (uint16) format. Remember to convert them all to the same scale (e.g., floating point in [0, 1]) before comparing them.
- When building your Gaussian pyramid, filter and downsample iteratively—you cannot downsample without aliasing from the largest image size directly to the smallest image size.
- You can create the coordinates of the window you are shifting over with np.meshgrid, and turn that into a list of (x, y) pairs.
- The borders of the images will probably hurt your results; try computing your metric on the internal pixels only.
- Output all of your images as high-quality jpg; it will save you a lot of disk space.
- Finally, remember Numpy's coordinate order—Numpy arrays are accessed (y, x).
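For instance, the meshgrid tip above might look like this for a small +/-2 pixel window (the window size is just an example):

```python
import numpy as np

# Candidate displacements in [-2, 2] as a list of (x, y) pairs
xs, ys = np.meshgrid(np.arange(-2, 3), np.arange(-2, 3))
shifts = [(int(x), int(y)) for x, y in zip(xs.ravel(), ys.ravel())]
# 25 pairs: (-2, -2), (-1, -2), ..., (2, 2)
```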
Project derived by James Hays from Alexei A. Efros' Computational Photography course, with permission. Converted to Python by Trevor Houchens.