This project programmatically applies image processing techniques to photos from the Prokudin-Gorskii collection, first with a naive approach and then with a pyramid approach, to produce a color image.
First, we consider two optimization objectives: the sum of squared differences (SSD) and normalized cross-correlation (NCC).
In the Python notebook main.ipynb, we can see that the displacement pairs
for aligning both the red and green channels with the blue one are the
same across all 3 JPGs. As expected, then, we see no difference in
quality between the three image pairs. We will proceed with SSD as the
metric, since it requires fewer computations per pair of images than NCC
does (and so we can follow Occam's razor).
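The two metrics can be sketched as follows; the function names `ssd` and `ncc` are illustrative, not necessarily the ones used in main.ipynb:

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two channels: lower is better."""
    return np.sum((a - b) ** 2)

def ncc(a, b):
    """Normalized cross-correlation: higher is better (1.0 is a perfect match)."""
    a = a - a.mean()
    b = b - b.mean()
    return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b))
```

SSD needs only a subtraction, a square, and a sum per candidate shift, while NCC additionally requires mean-centering and two norm computations, which is the cost difference mentioned above.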
We can approach the 3 JPGs naively because they are small images, so the
search space isn't too large for this brute-force method and the
algorithm won't take long.
First, we extract an image for each of the red, green, and blue channels
from the JPG. Then, we crop each with a 10% margin to keep edge
artifacts from affecting the brute-force search. The borders don't
contain useful information, and we don't want any candidate in our
search box of possible shifts to achieve its best metric score, and thus
its alignment, by matching against a border pixel.
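A minimal sketch of this pre-processing step, assuming the scanned plate stacks the three channels vertically (blue, green, red, top to bottom); the helper names are mine:

```python
import numpy as np

def split_channels(plate):
    """Split a vertically stacked glass-plate scan into its three
    channel images (assumed order top-to-bottom: blue, green, red)."""
    h = plate.shape[0] // 3
    return plate[:h], plate[h:2 * h], plate[2 * h:3 * h]

def crop_margin(im, margin=0.10):
    """Drop a margin (default 10%) from every side so border
    artifacts cannot win the brute-force search."""
    dy = int(im.shape[0] * margin)
    dx = int(im.shape[1] * margin)
    return im[dy:im.shape[0] - dy, dx:im.shape[1] - dx]
```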
Then, we search over the aforementioned box of
(-max_shift, max_shift)
, computing the score on each possible
shifted image, and ultimately choose the shift (x, y) with
the best metric score. Afterwards, we apply np.roll()
with this best
shift to efficiently translate the original image.
Let's denote the search strategy of naive align on two images as
naive_align
.
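A sketch of naive_align under the assumptions above (SSD as the metric, exhaustive search over the shift box):

```python
import numpy as np

def naive_align(moving, ref, max_shift=15):
    """Brute-force search over every shift (dy, dx) in
    [-max_shift, max_shift]^2, scoring each candidate with SSD
    (lower is better) and returning the best shift."""
    best, best_score = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = np.sum((shifted - ref) ** 2)
            if score < best_score:
                best_score, best = score, (dy, dx)
    return best
```

Note that np.roll wraps pixels around the image edges, which is another reason the channels are cropped before the search.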
Displacement: Red - (12, 3), Green - (5, 2)
Displacement: Red - (3, 2), Green - (-3, 2)
Displacement: Red - (6, 3), Green - (3, 3)
Since our TIF images are much larger than our JPGs, it's not efficient to run naive align on them. So we need a faster algorithm, a multiscale pyramid: downscale the image to smaller dimensions, then refine the alignment step by step as we scale back up to the original size. This approach significantly reduces the running time while still achieving good results.
Initially, just like in naive align, we extract the red, green, and blue
images from the image file and crop 10% from each side. This
pre-processing is applied to both calls to align pairs of (green, blue)
and (red, blue) in this pyramid approach.
Next, I find how far we can downscale, halving the dimensions each time,
which was recommended by one of the TAs during a project party. I found
that restricting the maximum downscale to a minimum size of 100 (in both
the width and height dimensions) worked best.
Then, I directly downscaled, without a loop, my pair of images by
1/2^(number of layers)
and applied naive align (the same method
as the one used in my naive approach) with a search size of 15 on the
pair. Next, we need to upscale back to the original dimensions, which I
do with the same factor of 2. At each layer, I resize the images, double
both components of my best shift so far, and then call naive align with
a search size (max_shift
) of 20 to readjust the best shift.
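The coarse-to-fine procedure above can be sketched as follows. This is a simplified reconstruction, not the notebook's exact code: it subsamples with strided slicing rather than a proper resize, and it reuses a brute-force SSD helper for the per-level search:

```python
import numpy as np

def naive_align(moving, ref, max_shift):
    """Exhaustive SSD search over shifts in [-max_shift, max_shift]^2."""
    best, best_score = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = np.sum((np.roll(moving, (dy, dx), axis=(0, 1)) - ref) ** 2)
            if score < best_score:
                best_score, best = score, (dy, dx)
    return best

def pyramid_align(moving, ref, min_size=100):
    """Coarse-to-fine alignment: halve the dimensions while both stay
    >= min_size, align at the coarsest level with search size 15, then
    at each finer level double the shift and refine with search size 20."""
    levels = 0
    while min(ref.shape) // 2 ** (levels + 1) >= min_size:
        levels += 1
    dy = dx = 0
    for lvl in range(levels, -1, -1):
        f = 2 ** lvl
        m, r = moving[::f, ::f], ref[::f, ::f]  # crude 2x subsampling
        if lvl == levels:
            dy, dx = naive_align(m, r, max_shift=15)
        else:
            dy, dx = 2 * dy, 2 * dx              # scale shift up a level
            rolled = np.roll(m, (dy, dx), axis=(0, 1))
            ddy, ddx = naive_align(rolled, r, max_shift=20)
            dy, dx = dy + ddy, dx + ddx          # readjust the best shift
    return dy, dx
```

Because each level only refines the doubled shift from the level below, the search window stays small at full resolution, which is where the brute-force cost would otherwise explode.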
Note: The algorithm above produced all the images with the
displacements shown below, except for emir.tif
, for which
I had to change the minimum downscale size to 200
. When
I displayed all the raw TIFs, I noticed that the contrast between the
red, green, and blue channels of emir.tif was much more significant than
it was for all the other images. This is because its channels do not
have the same brightness values; the other TIFs show much more
similarity among their red, green, and blue channels.
You can see all the TIF images, both the example ones from the project
spec and the ones I picked myself from the
Library of Congress collection
, aligned with the pyramid algorithm in the
Appendix.
This strategy enhances pixel contrast by redistributing intensity values across the entire image.
It improves on global histogram equalization by going one step further:
it limits the height of the histogram and redistributes the clipped
pixels uniformly to the rest of the bins, which better restricts the
amplification of noise.
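The clip-and-redistribute idea can be sketched as below. This is a simplified global version of the technique (production code would typically use a local, tiled variant such as skimage's equalize_adapthist); the function name and parameters are illustrative:

```python
import numpy as np

def clipped_equalize(im, clip_limit=0.01, nbins=256):
    """Contrast-limited histogram equalization sketch for a float
    image in [0, 1]: clip the histogram at a height limit,
    redistribute the excess uniformly across all bins, then map
    intensities through the resulting CDF."""
    hist, bin_edges = np.histogram(im, bins=nbins, range=(0.0, 1.0))
    limit = max(1, int(clip_limit * im.size))
    excess = np.sum(np.maximum(hist - limit, 0))
    # clip tall bins, spread the excess uniformly (remainder discarded)
    hist = np.minimum(hist, limit) + excess // nbins
    cdf = np.cumsum(hist).astype(np.float64)
    cdf /= cdf[-1]
    return np.interp(im, bin_edges[:-1], cdf)
```

Capping the histogram height bounds the slope of the CDF, which is exactly what limits how much near-uniform (noisy) regions can be stretched.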
As we can see in the comparison below, the color of his clothes is
darker compared to the wall behind him; part of the door on the left and
of the ground also contrasts more heavily with the lighter parts of the
image. Since both methods were applied after the image was aligned, the
displacement is the same for all 3 images.
Displacement: Red - (103, 55), Green - (49, 24)
Displacement: Red - (103, 55), Green - (49, 24)
Displacement: Red - (103, 55), Green - (49, 24)
This method gave me a slightly different red channel displacement: from (103, 55), produced by the pyramid approach without any bells or whistles, to (107, 40). Looking closely at the emir's face, we can see that this method makes the image noticeably sharper.
Displacement: Red - (103, 55), Green - (49, 24)
Displacement: Red - (107, 40), Green - (49, 24)
Displacement: Red - (58, -4), Green - (25, 4)
Displacement: Red - (103, 55), Green - (49, 24)
Displacement: Red - (124, 13), Green - (59, 16)
Displacement: Red - (89, 23), Green - (41, 17)
Displacement: Red - (112, 11), Green - (51, 9)
Displacement: Red - (178, 13), Green - (81, 10)
Displacement: Red - (108, 36), Green - (51, 26)
Displacement: Red - (140, -27), Green - (33, -11)
Displacement: Red - (176, 37), Green - (78, 29)
Displacement: Red - (112, 11), Green - (53, 14)
Displacement: Red - (87, 32), Green - (42, 5)
Displacement: Red - (67, 11), Green - (21, 18)
Displacement: Red - (75, -8), Green - (-2, 2)
Displacement: Red - (55, 46), Green - (31, 30)