This project programmatically applies image processing techniques to photos from the Prokudin-Gorskii collection, first with a naive approach and then with a pyramid approach, to produce a color image.
First, we consider two optimization objectives: the sum of squared differences (SSD) and normalized cross-correlation (NCC).
In the Python notebook main.ipynb, we can see that the displacement pairs
for aligning both the red and green channels with the blue one are the
same across all 3 JPGs. As expected, then, we see no difference in
quality between the three image pairs. We will proceed with SSD as the
metric, since it requires fewer computations per pair of images than NCC
does (and so we can follow Occam's razor).
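The two metrics can be sketched as follows; the function names `ssd` and `ncc` are illustrative, not necessarily the ones used in main.ipynb:

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two channels: lower is better."""
    return np.sum((a - b) ** 2)

def ncc(a, b):
    """Normalized cross-correlation: higher is better (1.0 is a perfect match)."""
    a = a - a.mean()
    b = b - b.mean()
    return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b))
```

SSD needs only a subtraction, a square, and a sum per candidate shift, while NCC additionally requires mean-centering and two norm computations, which is the cost difference mentioned above.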
We can approach the 3 JPGs naively because they are small images, so the
search space isn't too large for this brute-force method and the
algorithm won't take long.
First, we extract an image for each of the red, green, and blue channels
from the JPG. Then, we crop each with a 10% margin to keep edge
artifacts from affecting the brute-force search. The borders don't
contain useful information, and we don't want any candidate in our
search box of possible shifts to achieve its best metric score, and thus
its alignment, by matching against a border pixel.
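A minimal sketch of this pre-processing step, assuming the scanned plate stacks the three channels vertically (blue, green, red, top to bottom); the helper names are mine:

```python
import numpy as np

def split_channels(plate):
    """Split a vertically stacked glass-plate scan into its three
    channel images (assumed order top-to-bottom: blue, green, red)."""
    h = plate.shape[0] // 3
    return plate[:h], plate[h:2 * h], plate[2 * h:3 * h]

def crop_margin(im, margin=0.10):
    """Drop a margin (default 10%) from every side so border
    artifacts cannot win the brute-force search."""
    dy = int(im.shape[0] * margin)
    dx = int(im.shape[1] * margin)
    return im[dy:im.shape[0] - dy, dx:im.shape[1] - dx]
```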
Then, we search over the aforementioned box of
(-max_shift, max_shift)
, computing the score on each possible
shifted image, and ultimately choose the shift (x, y) with
the best metric score. Afterwards, we apply np.roll()
with this best
shift to efficiently translate the original image.
Let's denote the search strategy of naive align on two images as
naive_align
.
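A sketch of naive_align under the assumptions above (SSD as the metric, exhaustive search over the shift box):

```python
import numpy as np

def naive_align(moving, ref, max_shift=15):
    """Brute-force search over every shift (dy, dx) in
    [-max_shift, max_shift]^2, scoring each candidate with SSD
    (lower is better) and returning the best shift."""
    best, best_score = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = np.sum((shifted - ref) ** 2)
            if score < best_score:
                best_score, best = score, (dy, dx)
    return best
```

Note that np.roll wraps pixels around the image edges, which is another reason the channels are cropped before the search.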
Displacement: Red - (12, 3), Green - (5, 2)
Displacement: Red - (3, 2), Green - (-3, 2)
Displacement: Red - (6, 3), Green - (3, 3)
Since our TIF images are much larger than our JPGs, it's not efficient to run naive align on them. So we need a faster algorithm, a multiscale pyramid: downscale the image to smaller dimensions, then refine the alignment step by step as we scale back up to the original size. This approach significantly reduces the running time while still achieving good results.
Initially, just like in naive align, we extract the red, green, and blue
images from the image file and crop 10% from each side. This
pre-processing is applied to both calls to align pairs of (green, blue)
and (red, blue) in this pyramid approach.
Next, I find how far we can downscale, halving the dimensions each time,
which was recommended by one of the TAs during a project party. I found
that restricting the maximum downscale to a minimum size of 100 (in both
the width and height dimensions) worked best.
Then, I directly downscaled, without a loop, my pair of images by
1/2^(number of layers)
and applied naive align (the same method
as the one used in my naive approach) with a search size of 15 on the
pair. Next, we need to upscale back to the original dimensions, which I
do with the same factor of 2. At each layer, I resize the images, double
both components of my best shift so far, and then call naive align with
a search size (max_shift
) of 20 to readjust the best shift.
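The coarse-to-fine procedure above can be sketched as follows. This is a simplified reconstruction, not the notebook's exact code: it subsamples with strided slicing rather than a proper resize, and it reuses a brute-force SSD helper for the per-level search:

```python
import numpy as np

def naive_align(moving, ref, max_shift):
    """Exhaustive SSD search over shifts in [-max_shift, max_shift]^2."""
    best, best_score = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = np.sum((np.roll(moving, (dy, dx), axis=(0, 1)) - ref) ** 2)
            if score < best_score:
                best_score, best = score, (dy, dx)
    return best

def pyramid_align(moving, ref, min_size=100):
    """Coarse-to-fine alignment: halve the dimensions while both stay
    >= min_size, align at the coarsest level with search size 15, then
    at each finer level double the shift and refine with search size 20."""
    levels = 0
    while min(ref.shape) // 2 ** (levels + 1) >= min_size:
        levels += 1
    dy = dx = 0
    for lvl in range(levels, -1, -1):
        f = 2 ** lvl
        m, r = moving[::f, ::f], ref[::f, ::f]  # crude 2x subsampling
        if lvl == levels:
            dy, dx = naive_align(m, r, max_shift=15)
        else:
            dy, dx = 2 * dy, 2 * dx              # scale shift up a level
            rolled = np.roll(m, (dy, dx), axis=(0, 1))
            ddy, ddx = naive_align(rolled, r, max_shift=20)
            dy, dx = dy + ddy, dx + ddx          # readjust the best shift
    return dy, dx
```

Because each level only refines the doubled shift from the level below, the search window stays small at full resolution, which is where the brute-force cost would otherwise explode.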
Note: The algorithm above produced all the images with the
displacements shown below, except for emir.tif
, for which
I had to change the minimum downscale size to 200
. When
I displayed all the raw TIFs, I noticed that the contrast between the
red, green, and blue channels of emir.tif was much more significant than
it was for all the other images. This is because its channels do not
have the same brightness values; the other TIFs show much more
similarity among their red, green, and blue channels.
You can see all the TIF images, both the example ones from the project
spec and the ones I picked myself from the
Library of Congress collection
, aligned with the pyramid algorithm in the
Appendix.
This strategy enhances pixel contrast by redistributing intensity values across the entire image.
It improves on global histogram equalization by going one step further:
it limits the height of the histogram and redistributes the clipped
pixels uniformly to the rest of the bins, which better restricts the
amplification of noise.
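The clip-and-redistribute idea can be sketched as below. This is a simplified global version of the technique (production code would typically use a local, tiled variant such as skimage's equalize_adapthist); the function name and parameters are illustrative:

```python
import numpy as np

def clipped_equalize(im, clip_limit=0.01, nbins=256):
    """Contrast-limited histogram equalization sketch for a float
    image in [0, 1]: clip the histogram at a height limit,
    redistribute the excess uniformly across all bins, then map
    intensities through the resulting CDF."""
    hist, bin_edges = np.histogram(im, bins=nbins, range=(0.0, 1.0))
    limit = max(1, int(clip_limit * im.size))
    excess = np.sum(np.maximum(hist - limit, 0))
    # clip tall bins, spread the excess uniformly (remainder discarded)
    hist = np.minimum(hist, limit) + excess // nbins
    cdf = np.cumsum(hist).astype(np.float64)
    cdf /= cdf[-1]
    return np.interp(im, bin_edges[:-1], cdf)
```

Capping the histogram height bounds the slope of the CDF, which is exactly what limits how much near-uniform (noisy) regions can be stretched.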
As we can see in the comparison below, the color of his clothes is
darker compared to the wall behind him; part of the door on the left and
of the ground also contrasts more heavily with the lighter parts of the
image. Since both methods were applied after the image was aligned, the
displacement is the same for all 3 images.
Displacement: Red - (103, 55), Green - (49, 24)
Displacement: Red - (103, 55), Green - (49, 24)
Displacement: Red - (103, 55), Green - (49, 24)
This method gave me a slightly different red channel displacement: from (103, 55), produced by the pyramid approach without any bells or whistles, to (107, 40). Looking closely at the emir's face, we can see that this method makes the image noticeably sharper.
Displacement: Red - (103, 55), Green - (49, 24)
Displacement: Red - (107, 40), Green - (49, 24)
Displacement: Red - (58, -4), Green - (25, 4)
Displacement: Red - (103, 55), Green - (49, 24)
Displacement: Red - (124, 13), Green - (59, 16)
Displacement: Red - (89, 23), Green - (41, 17)
Displacement: Red - (112, 11), Green - (51, 9)
Displacement: Red - (178, 13), Green - (81, 10)
Displacement: Red - (108, 36), Green - (51, 26)
Displacement: Red - (140, -27), Green - (33, -11)
Displacement: Red - (176, 37), Green - (78, 29)
Displacement: Red - (112, 11), Green - (53, 14)
Displacement: Red - (87, 32), Green - (42, 5)
Displacement: Red - (67, 11), Green - (21, 18)
Displacement: Red - (75, -8), Green - (-2, 2)
Displacement: Red - (55, 46), Green - (31, 30)