Project 4 - [Auto] Stitching Photo Mosaics

Part A - Image Warping and Mosaicing

By Ethan Chen

Introduction

This project aims to use homographies to warp images into another image space and then seamlessly blend them into a mosaic.

Part 1: Shoot the Pictures

I took the images on my phone with the auto-light adjustment feature turned off, holding my hand still and rotating the camera so that its center of projection (COP) stayed in the same location. I made sure the images in each set overlapped and that they included noticeably different elements, like a lamp post or a new building.

I have included the raw images with points in a later section so they can be closer to the warped images and blended mosaics.

Here are the six sets of my original images with the correspondence points drawn on.

BWW 1211

Left

bww_1211_left_points.jpg

Middle (to warp the left image into)

bww_1211_middle_with_left_points.jpg

Middle (to warp the right image into)

bww_1211_middle_with_right_points.jpg

Right

bww_1211_right_points.jpg

Cory Hallway

Left

cory_hallway_left_points.jpg

Middle (to warp the left image into)

cory_hallway_middle_with_left_points.jpg

Middle (to warp the right image into)

cory_hallway_middle_with_right_points.jpg

Right

cory_hallway_right_points.jpg

Cory Elevators

Left

cory_elevators_left_points.jpg

Middle (to warp the left image into)

cory_elevators_middle_with_left_points.jpg

Middle (to warp the right image into)

cory_elevators_middle_with_right_points.jpg

Right

cory_elevators_right_points.jpg

BWW Outside Window of 2nd Floor

Left

bww_floor2_left_points.jpg

Middle (to warp the left image into)

bww_floor2_middle_with_left_points.jpg

Middle (to warp the right image into)

bww_floor2_middle_with_right_points.jpg

Right

bww_floor2_right_points.jpg

Campanile During the Day

Left

campanile_day_left_points.jpg

Middle (to warp the left image into)

campanile_day_middle_with_left_points.jpg

Middle (to warp the right image into)

campanile_day_middle_with_right_points.jpg

Right

campanile_day_right_points.jpg

Campanile During the Night

Left

campanile_night_left_points.jpg

Middle (to warp the left image into)

campanile_night_middle_with_left_points.jpg

Middle (to warp the right image into)

campanile_night_middle_with_right_points.jpg

Right

campanile_night_right_points.jpg

Part 2: Recover Homographies

To recover a homography - the transformation that warps one image into another's projective plane - we solve a system of equations, which amounts to solving for the vector $h$ of values in $Ah = b$. In $A$ and $b$ below, $x_i, y_i$ are the coordinates in the source image, $x_i^\prime, y_i^\prime$ are the coordinates in the destination image, and $i$ indexes the pairs of correspondence points.

$ \begin{bmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1^\prime x_1 & -x_1^\prime y_1 \\ 0 & 0 & 0 & x_1 & y_1 & 1 & -y_1^\prime x_1 & -y_1^\prime y_1 \\ x_2 & y_2 & 1 & 0 & 0 & 0 & -x_2^\prime x_2 & -x_2^\prime y_2 \\ 0 & 0 & 0 & x_2 & y_2 & 1 & -y_2^\prime x_2 & -y_2^\prime y_2 \\ x_3 & y_3 & 1 & 0 & 0 & 0 & -x_3^\prime x_3 & -x_3^\prime y_3 \\ 0 & 0 & 0 & x_3 & y_3 & 1 & -y_3^\prime x_3 & -y_3^\prime y_3 \\ & & & & \vdots & & & \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \\ h_3 \\ h_4 \\ h_5 \\ h_6 \\ h_7 \\ h_8 \end{bmatrix} = \begin{bmatrix} x_1^\prime \\ y_1^\prime \\ x_2^\prime \\ y_2^\prime \\ x_3^\prime \\ y_3^\prime \\ \vdots \end{bmatrix} $

Then, we can append $h_9 = 1$ and reshape the result into the 3 by 3 homography matrix.

$ H = \begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & 1 \\ \end{bmatrix} $
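With more than 4 pairs of correspondence points the system is overdetermined, so one standard way to solve it is in the least-squares sense. A minimal sketch (the helper name compute_homography is illustrative):

```python
import numpy as np

def compute_homography(src, dst):
    """Solve Ah = b for the 8 homography parameters via least squares.

    src, dst: (n, 2) arrays of corresponding (x, y) points, with n >= 4.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        # Each pair of correspondence points contributes two rows of A.
        A.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y])
        A.append([0, 0, 0, x, y, 1, -yp * x, -yp * y])
        b.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    # Append h_9 = 1 and reshape into the 3x3 homography matrix H.
    return np.append(h, 1).reshape(3, 3)
```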

Part 3: Warp the Images

Here are some examples of images warped with their corresponding homography matrices, preceded by a sketch of how the warp itself can be implemented.
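One common way to implement the warp is inverse mapping: send every pixel of the output canvas back through $H^{-1}$ and sample the source image with bilinear interpolation. A sketch using scipy.ndimage.map_coordinates (illustrative, not necessarily the exact project code):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_image(im, H, out_shape):
    """Inverse-warp color image `im` into a canvas of shape `out_shape` via H."""
    h_out, w_out = out_shape
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    # Homogeneous coordinates of every output pixel, mapped back to the source.
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(H) @ pts
    src /= src[2]                      # divide out w to get (x, y)
    coords = [src[1].reshape(out_shape), src[0].reshape(out_shape)]  # (row, col)
    out = np.zeros((h_out, w_out, im.shape[2]))
    for c in range(im.shape[2]):       # bilinearly sample each color channel
        out[..., c] = map_coordinates(im[..., c], coords, order=1, cval=0)
    return out
```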

BWW Outside Window of 2nd Floor

Left Warped to Middle

bww_warped_left_to_middle.jpg

Right Warped to Middle

bww_warped_right_to_middle.jpg

BWW 1211

Left Warped to Middle

bww_1211_warped_left_to_middle.jpg

Right Warped to Middle

bww_1211_warped_right_to_middle.jpg

Part 4: Image Rectification

For this part, we choose two sets of 4 points: the first set is the original 4 corners of the object in the image we want to rectify, and the second set is the 4 corners we want that object to occupy, which we expect to form a front-facing square or rectangle.
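Concretely, rectification reuses the same machinery as Parts 2 and 3. A hypothetical example using the compute_homography and warp_image sketches from above (the filename and corner coordinates here are made up for illustration):

```python
import numpy as np
import skimage.io

crest_im = skimage.io.imread("crest.jpg") / 255.0   # assumed filename

# First set: the four clicked corners of the crest in the photo, as (x, y).
src = np.array([[312, 120], [545, 158], [530, 410], [300, 385]], dtype=float)
# Second set: a front-facing square for the crest to occupy.
side = 400
dst = np.array([[0, 0], [side, 0], [side, side], [0, side]], dtype=float)

H = compute_homography(src, dst)
rectified = warp_image(crest_im, H, out_shape=(side, side))
```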

Here are the two sets of points on two images, followed by the rectified image.

Crest

First Set of Points

crest_with_points_src.jpg

Second Set of Points

crest_with_points_dst.jpg

Rectified

crest_rectified.jpg

Light Switches

First Set of Points

light_switches_with_points_src.jpg

Second Set of Points

light_switches_with_points_dst.jpg

Rectified

light_switches_rectified.jpg

Part 5: Blend the Images into a Mosaic

I tried two approaches - a naive one and a more complex one - to blend each set of 3 images into one mosaic. For both, I first warped the left image into the middle image's plane and the right image into the middle image's plane, and found my final canvas by taking the min and max of the transformed corners.

To naively blend the images, I directly added the pixel values of the two warped images, then used a mask to do the same for the middle image, which didn't need to be warped since I chose it as the "reference" image. Using a mask here was essentially the same as directly adding the pixel values, since I didn't use a distance transform in this approach.

My more complex approach used Gaussian and Laplacian stacks to blend, as in Project 2, along with distance transforms. Following Aayush Gupta, a student in a previous semester of CS 180, I used a mutually exclusive mask, computed by comparing the results of scipy.ndimage.distance_transform_edt applied to each image's mask. I ran two rounds of this process: the first blended the warped left and warped right images, and the second blended that result with the middle image. I used 5 levels, sigma = 4, and kernel_size = 6 * 4 = 24 for convolving the images, just as in Project 2.
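A sketch of both pieces, assuming boolean masks that mark where each warped image has valid pixels (5 levels and sigma = 4 as above; scipy's gaussian_filter sizes its kernel internally, so the kernel_size = 24 detail is only approximated):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt, gaussian_filter

def mutually_exclusive_masks(mask1, mask2):
    """Split the overlap: each pixel goes to whichever image it is deeper inside.

    mask1, mask2: boolean arrays marking each warped image's valid pixels.
    distance_transform_edt gives each nonzero pixel its distance to the
    nearest zero pixel, i.e. how far it is from that image's border.
    """
    mask1, mask2 = mask1.astype(bool), mask2.astype(bool)
    d1, d2 = distance_transform_edt(mask1), distance_transform_edt(mask2)
    take1 = (d1 >= d2) & mask1
    take2 = mask2 & ~take1
    return take1, take2

def stack_blend(im1, im2, mask, levels=5, sigma=4):
    """Gaussian/Laplacian-stack blend; `mask` is True where im1 should win."""
    sig3 = (sigma, sigma, 0)                 # blur spatially, not across channels
    m = gaussian_filter(mask.astype(float), sigma)[..., None]
    g1, g2 = im1.astype(float), im2.astype(float)
    out = np.zeros_like(g1)
    for _ in range(levels):
        low1, low2 = gaussian_filter(g1, sig3), gaussian_filter(g2, sig3)
        out += m * (g1 - low1) + (1 - m) * (g2 - low2)  # blend this frequency band
        g1, g2 = low1, low2
        m = gaussian_filter(m, sig3)         # soften the mask at each level
    return np.clip(out + m * g1 + (1 - m) * g2, 0, 1)   # add the final low-pass
```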

BWW 1211

Naive

bww_1211_naive_blending.jpg

Gaussian and Laplacian Stacks

bww_1211_blended_mosaic.jpg

Cory Hallway

Naive

cory_hallway_naive_blending.jpg

Gaussian and Laplacian Stacks

cory_hallway_blended_mosaic.jpg

Cory Elevators

Naive

cory_elevators_naive_blending.jpg

Gaussian and Laplacian Stacks

cory_elevators_blended_mosaic.jpg

BWW Outside Window of 2nd Floor

Naive

bww_floor2_naive_blending.jpg

Gaussian and Laplacian Stacks

bww_floor2_blended_mosaic.jpg

Campanile During the Night

Naive

campanile_night_naive_blending.jpg

Gaussian and Laplacian Stacks

campanile_night_blended_mosaic.jpg

Campanile During the Day

Naive

campanile_day_naive_blending.jpg

Gaussian and Laplacian Stacks

campanile_day_blended_mosaic.jpg

The first three blended mosaics in the right column are my best results; they were taken indoors, where there is little fluctuation in the lighting. The last three mosaics include the sky, and the camera kept introducing exposure differences across the images despite the auto-light adjustment feature being turned off. We can see that the more complex approach makes the mosaics noticeably more seamless by reducing the abrupt exposure changes visible to the human eye.

Part B - Feature Matching for Autostitching

In this part, we follow some algorithms and methods from the paper "Multi-Image Matching using Multi-Scale Oriented Patches" by Brown et al.

Step 1: Detecting corner features in an image

We can use the function get_harris_corners from the starter code to find the Harris corners in each image we want to stitch together. I chose min_distance = 5, the minimum distance we allow between peaks, to pass into skimage.feature.peak_local_max. Here are two example results.
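For reference, here is roughly what that detection step looks like when rebuilt from skimage primitives (the starter code's get_harris_corners may differ in its details):

```python
import numpy as np
from skimage.feature import corner_harris, peak_local_max

def detect_corners(gray, min_distance=5):
    """Harris corner detection on a grayscale image, sketched with skimage."""
    h = corner_harris(gray)                                 # Harris response map
    coords = peak_local_max(h, min_distance=min_distance)   # (row, col) peaks
    strengths = h[coords[:, 0], coords[:, 1]]               # response at each peak
    return coords, strengths
```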

Cory Elevators

Right

cory_elevators_right_detected_harris_corners.png

Middle

cory_elevators_middle_detected_harris_corners.jpg

Step 2: Extracting a feature descriptor for each feature point

From Brown et al., we can use the Adaptive Non-Maximal Suppression (ANMS) equation below.

$r_i = \min_{j} \lvert \mathbf{x}_i - \mathbf{x}_j \rvert, \quad \text{s.t. } f(\mathbf{x}_i) < c_{\text{robust}} f(\mathbf{x}_j), \ \mathbf{x}_j \in \mathcal{I}$

ANMS filters the corners to get a spatially uniform distribution, selecting the strongest ones while ensuring spatial diversity. The suppression condition uses a robustness factor, c_robust, and compares Euclidean distances between corners so that we don't select ones that are too near one another. Keeping the corners with the largest suppression radii $r_i$ gives us spatial uniformity.
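A direct O(n^2) sketch of ANMS, where c_robust = 0.9 follows Brown et al. and n_keep is a tunable choice:

```python
import numpy as np

def anms(coords, strengths, n_keep=500, c_robust=0.9):
    """Adaptive Non-Maximal Suppression over Harris corners (a sketch).

    coords: (n, 2) corner coordinates; strengths: (n,) Harris responses.
    For each corner i, r_i is the distance to the nearest corner j that is
    sufficiently stronger (f(x_i) < c_robust * f(x_j)); we keep the corners
    with the largest radii.
    """
    n = len(coords)
    radii = np.full(n, np.inf)
    for i in range(n):
        stronger = strengths[i] < c_robust * strengths  # corners that suppress i
        if stronger.any():
            d = np.linalg.norm(coords[stronger] - coords[i], axis=1)
            radii[i] = d.min()
        # If no corner suppresses i, r_i stays infinite (the global maximum).
    keep = np.argsort(-radii)[:n_keep]
    return coords[keep]
```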

Cory Elevators

Right

cory_elevators_right_anms_harris_corners.png

Middle

cory_elevators_middle_anms_harris_corners.jpg

Step 3: Matching these feature descriptors between two images

Now, using the corner coordinates filtered by ANMS, we can extract the feature descriptors. For each coordinate, we pad the image by half the window size, take the 40x40 window around the corner, downsample it to 8x8, perform bias/gain normalization, and flatten it. This captures the image structure local to the region around each corner.
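A sketch of this descriptor extraction (axis-aligned patches, with the window and patch sizes described above):

```python
import numpy as np
from skimage.transform import resize

def extract_descriptors(gray, coords, window=40, patch=8):
    """40x40 windows around (row, col) corners -> normalized 8x8 descriptors."""
    half = window // 2
    padded = np.pad(gray, half, mode='edge')  # so windows near borders still fit
    descriptors = []
    for r, c in coords:
        win = padded[r:r + window, c:c + window]       # centered after padding
        small = resize(win, (patch, patch), anti_aliasing=True)
        small = (small - small.mean()) / (small.std() + 1e-8)  # bias/gain normalize
        descriptors.append(small.ravel())
    return np.array(descriptors)
```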

Then, we can loop over each descriptor and apply Lowe's ratio test to the ratio of the distances to its first and second nearest neighbors. If the ratio is below a threshold, we add the feature descriptor and its nearest neighbor to our matched set. Finally, we pass the two resulting lists of matched points to RANSAC in the next step.
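A sketch of the matching step; the ratio threshold of 0.6 here is illustrative, since the exact value is a tunable choice:

```python
import numpy as np
from scipy.spatial.distance import cdist

def match_features(desc1, desc2, ratio=0.6):
    """Lowe's ratio test: keep a match only when the nearest neighbor is much
    closer than the second nearest. Returns (i, j) index pairs into desc1, desc2."""
    d = cdist(desc1, desc2)                 # pairwise Euclidean distances
    matches = []
    for i in range(len(desc1)):
        order = np.argsort(d[i])
        nn1, nn2 = order[0], order[1]       # first and second nearest neighbors
        if d[i, nn1] / d[i, nn2] < ratio:
            matches.append((i, nn1))
    return matches
```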

Here are the feature descriptors.

Cory Elevators (Before Feature Matching)

Right

cory_elevators_right_feature_descriptors_before_feature_matching.png

Middle

cory_elevators_middle_feature_descriptors_before_feature_matching.jpg

Cory Elevators (After Feature Matching)

Right

cory_elevators_right_feature_descriptors_after_feature_matching.png

Middle

cory_elevators_middle_feature_descriptors_after_feature_matching.jpg

Step 4: Use a robust method (RANSAC) to compute a homography

Now, we can proceed with our 4-point RANSAC algorithm, which will help us find the best homography to warp one image into another. Then, we will proceed with the same distance transforms for mutually exclusive masks and Gaussian and Laplacian stacks (5 levels) to blend as we did in Part A. The RANSAC procedure is as follows (for each iteration):
  1. Randomly sample without replacement 4 pairs of points from the feature matched ANMS coordinates
  2. Compute the homography $H$ from using those 4 pairs
  3. Calculate the resulting set of points by warping the source points with $H$
  4. Calculate the number of inliers, which are points that have a Euclidean distance from our destination points $<$ a threshold that we choose
  5. Update best_H and the largest-so-far inlier set best_inliers if the number of inliers from the current iteration exceeds that of best_inliers

We can then proceed to the rest of the algorithm to create our mosaic by using the best_H we got from RANSAC.
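A sketch of the loop, reusing the compute_homography helper sketched in Part A (n_iters is a tunable assumption; threshold = 4 pixels matches the results below):

```python
import numpy as np

def ransac_homography(src, dst, n_iters=1000, threshold=4):
    """4-point RANSAC over matched point arrays src, dst of shape (n, 2)."""
    best_H, best_inliers = None, np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        idx = np.random.choice(len(src), 4, replace=False)  # sample 4 pairs
        H = compute_homography(src[idx], dst[idx])
        # Warp all source points with the candidate H (homogeneous coordinates).
        pts = np.column_stack([src, np.ones(len(src))]) @ H.T
        warped = pts[:, :2] / pts[:, 2:3]
        err = np.linalg.norm(warped - dst, axis=1)          # Euclidean distances
        inliers = err < threshold
        if inliers.sum() > best_inliers.sum():
            best_H, best_inliers = H, inliers
    return best_H, best_inliers
```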

Step 5: Produce a mosaic

Here are the manually and automatically stitched results. In Part A, I created mosaics from 3 images, but here in Part B I used two images, specifically the middle and the right, since the exposure difference between them was much smaller.

Cory Elevators (RANSAC threshold = 4)

Manually Stitched Mosaic

cory_elevators_right_to_middle_mosaic_manual.jpg

Automatically Stitched Mosaic

cory_elevators_right_to_middle_mosaic_auto.jpg

Cory Hallway (RANSAC threshold = 4)

Manually Stitched Mosaic

cory_hallway_right_to_middle_mosaic_manual.jpg

Automatically Stitched Mosaic

cory_hallway_right_to_middle_mosaic_auto.jpg

Blake Street Facing North (RANSAC threshold = 4)

Manually Stitched Mosaic

blake_facing_north_right_to_middle_mosaic_manual.jpg

Automatically Stitched Mosaic

blake_facing_north_right_to_middle_mosaic_auto.jpg

What I learned

It was nice to progress from manually selecting points in Part A - where something as fundamental as a system of equations powers such cool computer vision techniques - to seeing the math behind the automatic feature detection in Part B, which lets the entire process be automated.