Project 4 - [Auto] Stitching Photo Mosaics

Part A - Image Warping and Mosaicing

By Ethan Chen

Introduction

This project aims to use homographies to warp images into another image space and then seamlessly blend them into a mosaic.

Part 1: Shoot the Pictures

I took the images on my phone with the auto-light adjustment feature turned off, holding my hand still and rotating the camera so that its center of projection (COP) stayed in the same location. I made sure the images in each set overlapped and that they included noticeably different elements, like a lamp post or a new building.

I have included the raw images with points in a later section so they can be closer to the warped images and blended mosaics.

Here are the six sets of my original images with the correspondence points drawn on.

BWW 1211

Left

bww_1211_left_points.jpg

Middle (to warp the left image into)

bww_1211_middle_with_left_points.jpg

Middle (to warp the right image into)

bww_1211_middle_with_right_points.jpg

Right

bww_1211_right_points.jpg

Cory Hallway

Left

cory_hallway_left_points.jpg

Middle (to warp the left image into)

cory_hallway_middle_with_left_points.jpg

Middle (to warp the right image into)

cory_hallway_middle_with_right_points.jpg

Right

cory_hallway_right_points.jpg

Cory Elevators

Left

cory_elevators_left_points.jpg

Middle (to warp the left image into)

cory_elevators_middle_with_left_points.jpg

Middle (to warp the right image into)

cory_elevators_middle_with_right_points.jpg

Right

cory_elevators_right_points.jpg

BWW Outside Window of 2nd Floor

Left

bww_floor2_left_points.jpg

Middle (to warp the left image into)

bww_floor2_middle_with_left_points.jpg

Middle (to warp the right image into)

bww_floor2_middle_with_right_points.jpg

Right

bww_floor2_right_points.jpg

Campanile During the Day

Left

campanile_day_left_points.jpg

Middle (to warp the left image into)

campanile_day_middle_with_left_points.jpg

Middle (to warp the right image into)

campanile_day_middle_with_right_points.jpg

Right

campanile_day_right_points.jpg

Campanile During the Night

Left

campanile_night_left_points.jpg

Middle (to warp the left image into)

campanile_night_middle_with_left_points.jpg

Middle (to warp the right image into)

campanile_night_middle_with_right_points.jpg

Right

campanile_night_right_points.jpg

Part 2: Recover Homographies

To recover a homography - the transformation that warps one image into another's projective plane - we solve a system of equations, which amounts to solving for the vector $h$ of values in $Ah = b$. In $A$ and $b$ below, $x_i, y_i$ are the coordinates in the source image, $x_i^\prime, y_i^\prime$ are the coordinates in the destination image, and $i$ indexes the pairs of correspondence points.

$ \begin{bmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1^\prime x_1 & -x_1^\prime y_1 \\ 0 & 0 & 0 & x_1 & y_1 & 1 & -y_1^\prime x_1 & -y_1^\prime y_1 \\ x_2 & y_2 & 1 & 0 & 0 & 0 & -x_2^\prime x_2 & -x_2^\prime y_2 \\ 0 & 0 & 0 & x_2 & y_2 & 1 & -y_2^\prime x_2 & -y_2^\prime y_2 \\ x_3 & y_3 & 1 & 0 & 0 & 0 & -x_3^\prime x_3 & -x_3^\prime y_3 \\ 0 & 0 & 0 & x_3 & y_3 & 1 & -y_3^\prime x_3 & -y_3^\prime y_3 \\ & & & & \vdots & & & \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \\ h_3 \\ h_4 \\ h_5 \\ h_6 \\ h_7 \\ h_8 \end{bmatrix} = \begin{bmatrix} x_1^\prime \\ y_1^\prime \\ x_2^\prime \\ y_2^\prime \\ x_3^\prime \\ y_3^\prime \\ \vdots \end{bmatrix} $

Then, we can append $h_9 = 1$ and reshape the result into the 3 by 3 homography matrix.

$ H = \begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & 1 \\ \end{bmatrix} $
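With more than 4 pairs of correspondence points the system is overdetermined, so one standard way to solve it is in the least-squares sense. A minimal sketch (the helper name compute_homography is illustrative):

```python
import numpy as np

def compute_homography(src, dst):
    """Solve Ah = b for the 8 homography parameters via least squares.

    src, dst: (n, 2) arrays of corresponding (x, y) points, with n >= 4.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        # Each pair of correspondence points contributes two rows of A.
        A.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y])
        A.append([0, 0, 0, x, y, 1, -yp * x, -yp * y])
        b.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    # Append h_9 = 1 and reshape into the 3x3 homography matrix H.
    return np.append(h, 1).reshape(3, 3)
```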

Part 3: Warp the Images

Here are some examples of images warped with their corresponding homography matrices, preceded by a sketch of how the warp itself can be implemented.
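One common way to implement the warp is inverse mapping: send every pixel of the output canvas back through $H^{-1}$ and sample the source image with bilinear interpolation. A sketch using scipy.ndimage.map_coordinates (illustrative, not necessarily the exact project code):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_image(im, H, out_shape):
    """Inverse-warp color image `im` into a canvas of shape `out_shape` via H."""
    h_out, w_out = out_shape
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    # Homogeneous coordinates of every output pixel, mapped back to the source.
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(H) @ pts
    src /= src[2]                      # divide out w to get (x, y)
    coords = [src[1].reshape(out_shape), src[0].reshape(out_shape)]  # (row, col)
    out = np.zeros((h_out, w_out, im.shape[2]))
    for c in range(im.shape[2]):       # bilinearly sample each color channel
        out[..., c] = map_coordinates(im[..., c], coords, order=1, cval=0)
    return out
```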

BWW Outside Window of 2nd Floor

Left Warped to Middle

bww_warped_left_to_middle.jpg

Right Warped to Middle

bww_warped_right_to_middle.jpg

BWW 1211

Left Warped to Middle

bww_1211_warped_left_to_middle.jpg

Right Warped to Middle

bww_1211_warped_right_to_middle.jpg

Part 4: Image Rectification

For this part, we choose two sets of 4 points: the first set is the original 4 corners of the object in the image we want to rectify, and the second set is the 4 corners we want that object to occupy, which we expect to form a front-facing square or rectangle.
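Concretely, rectification reuses the same machinery as Parts 2 and 3. A hypothetical example using the compute_homography and warp_image sketches from above (the filename and corner coordinates here are made up for illustration):

```python
import numpy as np
import skimage.io

crest_im = skimage.io.imread("crest.jpg") / 255.0   # assumed filename

# First set: the four clicked corners of the crest in the photo, as (x, y).
src = np.array([[312, 120], [545, 158], [530, 410], [300, 385]], dtype=float)
# Second set: a front-facing square for the crest to occupy.
side = 400
dst = np.array([[0, 0], [side, 0], [side, side], [0, side]], dtype=float)

H = compute_homography(src, dst)
rectified = warp_image(crest_im, H, out_shape=(side, side))
```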

Here are the two sets of points on two images, followed by the rectified image.

Crest

First Set of Points

crest_with_points_src.jpg

Second Set of Points

crest_with_points_dst.jpg

Rectified

crest_rectified.jpg

Light Switches

First Set of Points

light_switches_with_points_src.jpg

Second Set of Points

light_switches_with_points_dst.jpg

Rectified

light_switches_rectified.jpg

Part 5: Blend the Images into a Mosaic

I tried two approaches - a naive one and a more complex one - to blend each set of 3 images into one mosaic. For both, I first warped the left image into the middle image's plane and the right image into the middle image's plane, and found my final canvas by taking the min and max of the transformed corners.

To naively blend the images, I directly added the pixel values of the two warped images, then used a mask to do the same for the middle image, which didn't need to be warped since I chose it as the "reference" image. Using a mask here was essentially the same as directly adding the pixel values, since I didn't use a distance transform in this approach.

My more complex approach used Gaussian and Laplacian stacks to blend, as in Project 2, along with distance transforms. Following Aayush Gupta, a student in a previous semester of CS 180, I used a mutually exclusive mask, computed by comparing the results of scipy.ndimage.distance_transform_edt applied to each image's mask. I ran two rounds of this process: the first blended the warped left and warped right images, and the second blended that result with the middle image. I used 5 levels, sigma = 4, and kernel_size = 6 * 4 = 24 for convolving the images, just as in Project 2.
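A sketch of both pieces, assuming boolean masks that mark where each warped image has valid pixels (5 levels and sigma = 4 as above; scipy's gaussian_filter sizes its kernel internally, so the kernel_size = 24 detail is only approximated):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt, gaussian_filter

def mutually_exclusive_masks(mask1, mask2):
    """Split the overlap: each pixel goes to whichever image it is deeper inside.

    mask1, mask2: boolean arrays marking each warped image's valid pixels.
    distance_transform_edt gives each nonzero pixel its distance to the
    nearest zero pixel, i.e. how far it is from that image's border.
    """
    mask1, mask2 = mask1.astype(bool), mask2.astype(bool)
    d1, d2 = distance_transform_edt(mask1), distance_transform_edt(mask2)
    take1 = (d1 >= d2) & mask1
    take2 = mask2 & ~take1
    return take1, take2

def stack_blend(im1, im2, mask, levels=5, sigma=4):
    """Gaussian/Laplacian-stack blend; `mask` is True where im1 should win."""
    sig3 = (sigma, sigma, 0)                 # blur spatially, not across channels
    m = gaussian_filter(mask.astype(float), sigma)[..., None]
    g1, g2 = im1.astype(float), im2.astype(float)
    out = np.zeros_like(g1)
    for _ in range(levels):
        low1, low2 = gaussian_filter(g1, sig3), gaussian_filter(g2, sig3)
        out += m * (g1 - low1) + (1 - m) * (g2 - low2)  # blend this frequency band
        g1, g2 = low1, low2
        m = gaussian_filter(m, sig3)         # soften the mask at each level
    return np.clip(out + m * g1 + (1 - m) * g2, 0, 1)   # add the final low-pass
```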

BWW 1211

Naive

bww_1211_naive_blending.jpg

Gaussian and Laplacian Stacks

bww_1211_blended_mosaic.jpg

Cory Hallway

Naive

cory_hallway_naive_blending.jpg

Gaussian and Laplacian Stacks

cory_hallway_blended_mosaic.jpg

Cory Elevators

Naive

cory_elevators_naive_blending.jpg

Gaussian and Laplacian Stacks

cory_elevators_blended_mosaic.jpg

BWW Outside Window of 2nd Floor

Naive

bww_floor2_naive_blending.jpg

Gaussian and Laplacian Stacks

bww_floor2_blended_mosaic.jpg

Campanile During the Night

Naive

campanile_night_naive_blending.jpg

Gaussian and Laplacian Stacks

campanile_night_blended_mosaic.jpg

Campanile During the Day

Naive

campanile_day_naive_blending.jpg

Gaussian and Laplacian Stacks

campanile_day_blended_mosaic.jpg

The first three blended mosaics in the right column are my best results; they were taken indoors, where there is little fluctuation in the lighting. The last three mosaics include the sky, and the camera kept introducing exposure differences across the images despite the auto-light adjustment feature being turned off. We can see that the more complex approach makes the mosaics noticeably more seamless by reducing the abrupt exposure changes visible to the human eye.

Part B - Feature Matching for Autostitching

In this part, we follow some algorithms and methods from the paper "Multi-Image Matching using Multi-Scale Oriented Patches" by Brown et al.

Step 1: Detecting corner features in an image

We can use the function get_harris_corners from the starter code to find the Harris corners in each image we want to stitch together. I chose min_distance = 5, the minimum distance we allow between peaks, to pass into skimage.feature.peak_local_max. Here are two example results.
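For reference, here is roughly what that detection step looks like when rebuilt from skimage primitives (the starter code's get_harris_corners may differ in its details):

```python
import numpy as np
from skimage.feature import corner_harris, peak_local_max

def detect_corners(gray, min_distance=5):
    """Harris corner detection on a grayscale image, sketched with skimage."""
    h = corner_harris(gray)                                 # Harris response map
    coords = peak_local_max(h, min_distance=min_distance)   # (row, col) peaks
    strengths = h[coords[:, 0], coords[:, 1]]               # response at each peak
    return coords, strengths
```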

Cory Elevators

Right

cory_elevators_right_detected_harris_corners.png

Middle

cory_elevators_middle_detected_harris_corners.jpg

Step 2: Extracting a feature descriptor for each feature point

From Brown et al., we can use the Adaptive Non-Maximal Suppression (ANMS) equation below.

$r_i = \min_{j} \lvert \mathbf{x}_i - \mathbf{x}_j \rvert, \quad \text{s.t. } f(\mathbf{x}_i) < c_{\text{robust}} f(\mathbf{x}_j), \ \mathbf{x}_j \in \mathcal{I}$

ANMS filters the corners to get a spatially uniform distribution, selecting the strongest ones while ensuring spatial diversity. The suppression condition uses a robustness factor, c_robust, and compares Euclidean distances between corners so that we don't select ones that are too near one another. Keeping the corners with the largest suppression radii $r_i$ gives us spatial uniformity.
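A direct O(n^2) sketch of ANMS, where c_robust = 0.9 follows Brown et al. and n_keep is a tunable choice:

```python
import numpy as np

def anms(coords, strengths, n_keep=500, c_robust=0.9):
    """Adaptive Non-Maximal Suppression over Harris corners (a sketch).

    coords: (n, 2) corner coordinates; strengths: (n,) Harris responses.
    For each corner i, r_i is the distance to the nearest corner j that is
    sufficiently stronger (f(x_i) < c_robust * f(x_j)); we keep the corners
    with the largest radii.
    """
    n = len(coords)
    radii = np.full(n, np.inf)
    for i in range(n):
        stronger = strengths[i] < c_robust * strengths  # corners that suppress i
        if stronger.any():
            d = np.linalg.norm(coords[stronger] - coords[i], axis=1)
            radii[i] = d.min()
        # If no corner suppresses i, r_i stays infinite (the global maximum).
    keep = np.argsort(-radii)[:n_keep]
    return coords[keep]
```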

Cory Elevators

Right

cory_elevators_right_anms_harris_corners.png

Middle

cory_elevators_middle_anms_harris_corners.jpg

Step 3: Matching these feature descriptors between two images

Now, using the corner coordinates filtered by ANMS, we can extract the feature descriptors. For each coordinate, we pad the image by half the window size, take the 40x40 window around the corner, downsample it to 8x8, perform bias/gain normalization, and flatten it. This captures the image structure local to the region around each corner.
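A sketch of this descriptor extraction (axis-aligned patches, with the window and patch sizes described above):

```python
import numpy as np
from skimage.transform import resize

def extract_descriptors(gray, coords, window=40, patch=8):
    """40x40 windows around (row, col) corners -> normalized 8x8 descriptors."""
    half = window // 2
    padded = np.pad(gray, half, mode='edge')  # so windows near borders still fit
    descriptors = []
    for r, c in coords:
        win = padded[r:r + window, c:c + window]       # centered after padding
        small = resize(win, (patch, patch), anti_aliasing=True)
        small = (small - small.mean()) / (small.std() + 1e-8)  # bias/gain normalize
        descriptors.append(small.ravel())
    return np.array(descriptors)
```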

Then, we can loop over each descriptor and apply Lowe's ratio test to the ratio of the distances to its first and second nearest neighbors. If the ratio is below a threshold, we add the feature descriptor and its nearest neighbor to our matched set. Finally, we pass the two resulting lists of matched points to RANSAC in the next step.
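A sketch of the matching step; the ratio threshold of 0.6 here is illustrative, since the exact value is a tunable choice:

```python
import numpy as np
from scipy.spatial.distance import cdist

def match_features(desc1, desc2, ratio=0.6):
    """Lowe's ratio test: keep a match only when the nearest neighbor is much
    closer than the second nearest. Returns (i, j) index pairs into desc1, desc2."""
    d = cdist(desc1, desc2)                 # pairwise Euclidean distances
    matches = []
    for i in range(len(desc1)):
        order = np.argsort(d[i])
        nn1, nn2 = order[0], order[1]       # first and second nearest neighbors
        if d[i, nn1] / d[i, nn2] < ratio:
            matches.append((i, nn1))
    return matches
```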

Here are the feature descriptors.

Cory Elevators (Before Feature Matching)

Right

cory_elevators_right_feature_descriptors_before_feature_matching.png

Middle

cory_elevators_middle_feature_descriptors_before_feature_matching.jpg

Cory Elevators (After Feature Matching)

Right

cory_elevators_right_feature_descriptors_after_feature_matching.png

Middle

cory_elevators_middle_feature_descriptors_after_feature_matching.jpg

Step 4: Use a robust method (RANSAC) to compute a homography

Now, we can proceed with our 4-point RANSAC algorithm, which will help us find the best homography to warp one image into another. Then, we will proceed with the same distance transforms for mutually exclusive masks and Gaussian and Laplacian stacks (5 levels) to blend as we did in Part A. The RANSAC procedure is as follows (for each iteration):
  1. Randomly sample without replacement 4 pairs of points from the feature matched ANMS coordinates
  2. Compute the homography $H$ from using those 4 pairs
  3. Calculate the resulting set of points by warping the source points with $H$
  4. Calculate the number of inliers, which are points that have a Euclidean distance from our destination points $<$ a threshold that we choose
  5. Update best_H and the largest-so-far inlier set best_inliers if the number of inliers from the current iteration exceeds that of best_inliers

We can then proceed to the rest of the algorithm to create our mosaic by using the best_H we got from RANSAC.
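A sketch of the loop, reusing the compute_homography helper sketched in Part A (n_iters is a tunable assumption; threshold = 4 pixels matches the results below):

```python
import numpy as np

def ransac_homography(src, dst, n_iters=1000, threshold=4):
    """4-point RANSAC over matched point arrays src, dst of shape (n, 2)."""
    best_H, best_inliers = None, np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        idx = np.random.choice(len(src), 4, replace=False)  # sample 4 pairs
        H = compute_homography(src[idx], dst[idx])
        # Warp all source points with the candidate H (homogeneous coordinates).
        pts = np.column_stack([src, np.ones(len(src))]) @ H.T
        warped = pts[:, :2] / pts[:, 2:3]
        err = np.linalg.norm(warped - dst, axis=1)          # Euclidean distances
        inliers = err < threshold
        if inliers.sum() > best_inliers.sum():
            best_H, best_inliers = H, inliers
    return best_H, best_inliers
```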

Step 5: Produce a mosaic

Here are the manually and automatically stitched results. In Part A, I created mosaics from 3 images, but here in Part B I used two images, specifically the middle and the right, since the exposure difference between them was much smaller.

Cory Elevators (RANSAC threshold = 4)

Manually Stitched Mosaic

cory_elevators_right_to_middle_mosaic_manual.jpg

Automatically Stitched Mosaic

cory_elevators_right_to_middle_mosaic_auto.jpg

Cory Hallway (RANSAC threshold = 4)

Manually Stitched Mosaic

cory_hallway_right_to_middle_mosaic_manual.jpg

Automatically Stitched Mosaic

cory_hallway_right_to_middle_mosaic_auto.jpg

Blake Street Facing North (RANSAC threshold = 4)

Manually Stitched Mosaic

blake_facing_north_right_to_middle_mosaic_manual.jpg

Automatically Stitched Mosaic

blake_facing_north_right_to_middle_mosaic_auto.jpg

What I learned

It was nice to progress from manually selecting points in Part A - where something as fundamental as a system of equations powers such cool computer vision techniques - to seeing the math behind the automatic feature detection in Part B, which lets the entire process be automated.