This project aims to morph one face into another and to create a caricature from a population of faces. Disclaimer: some images, especially the triangulations, are not the same dimensions as the raw images because the former were screenshotted rather than saved through code.
First, I took a picture of my own face in a way that mimics the structure of the photo of
George Clooney (taken by
Martin Schoeller). Then, I aligned the faces using code from Project 2 to ensure the images are the same
shape. Then, with some cv2
code I wrote myself, I selected the facial keypoints
on the images of both George and my face to be the correspondence points, including the 4
corner ones (I wrote some code to adjust the selection of corner points).
After aligning the images and computing the facial keypoints, I call scipy.spatial.Delaunay on the points to get the Delaunay triangulation for the mesh, which is how we set up the task of converting source triangles to destination ones for the image warping.
Now, we can take the average of the correspondence points and display that triangulation.
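As a rough sketch of this step (the array names pts_george and pts_me are hypothetical, standing in for the two sets of correspondence points):

```python
import numpy as np
from scipy.spatial import Delaunay

# pts_george, pts_me: (N, 2) arrays of correspondence points (keypoints + 4 corners)
# for the two aligned images; the names are placeholders, not the actual variables.
avg_pts = (pts_george + pts_me) / 2.0   # average shape used for the triangulation
tri = Delaunay(avg_pts)                 # tri.simplices: (M, 3) indices into the point list

# The same simplices index both point sets, so every triangle in one image has a
# well-defined counterpart in the other for the warp.
tris_george = pts_george[tri.simplices]
tris_me = pts_me[tri.simplices]
```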
To find the "mid-way face" between images A and B, we can use the average shape by warping A
and B into that and then average their colors together.
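As a sketch, assuming a helper warp(img, src_pts, dst_pts, tri) that performs the per-triangle inverse warp (the signature is hypothetical):

```python
warped_george = warp(img_george, pts_george, avg_pts, tri)  # warp A into the average shape
warped_me = warp(img_me, pts_me, avg_pts, tri)              # warp B into the average shape
midway = 0.5 * warped_george + 0.5 * warped_me              # cross-dissolve the colors
```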
This is the "mid-way face" of George's face with mine.
Computing the affine transformation between corresponding triangles takes a few steps. We need to pad the last row of matrix A with [0 0 1] and append a 1 to the B vector in order to accommodate translations. Without this padding, we would only be able to represent rotation, scaling, and shearing transformations.
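As a minimal sketch of one way to recover the affine transform for a single pair of triangles using that padding idea (variable names are mine):

```python
import numpy as np

def compute_affine(tri_src, tri_dst):
    """Return the 3x3 matrix T with T @ [x, y, 1]^T = [x', y', 1]^T for the
    three vertex correspondences of a triangle pair."""
    A = np.hstack([tri_src, np.ones((3, 1))])  # source vertices as rows [x,  y,  1]
    B = np.hstack([tri_dst, np.ones((3, 1))])  # destination vertices as rows [x', y', 1]
    # Solving A @ T.T = B gives a T whose last row is [0, 0, 1]; the padded ones
    # are what allow the solution to include a translation component.
    return np.linalg.solve(A, B).T
```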
I sourced my images from the
FEI database. I
chose to use the dataset of spatially normalized images (not cropped) – 2 subsets of 100 non-smiling images and 100 smiling images – because we are also given 46 annotated points for each of the 200 images. None of the 46 points were corners, so I needed to manually add the corner points. This was simple to do through code: each face takes up the entire image, so we can just programmatically append the corners to the list of 46. Without adding corner points, we wouldn't account for the Delaunay triangles of everything outside the region of the face, which would be black in the resulting image.
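As a sketch of that step, assuming the 46 annotated points of an image live in a (46, 2) array pts in (x, y) order (the ordering convention is my assumption):

```python
import numpy as np

h, w = 300, 250   # height and width of the spatially normalized FEI images
corners = np.array([[0, 0], [w - 1, 0], [0, h - 1], [w - 1, h - 1]])
pts_with_corners = np.vstack([pts, corners])   # 46 annotated points + 4 image corners
```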
Then, I computed the average face shape separately for the non-smiling and smiling subsets.
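A sketch of this step, reusing the hypothetical warp helper from above; all_pts holds the 50 points (46 annotations + 4 corners) for every face in a subset and images holds the corresponding pictures:

```python
import numpy as np
from scipy.spatial import Delaunay

mean_shape = all_pts.mean(axis=0)    # (50, 2): average (x, y) position of each keypoint
tri_mean = Delaunay(mean_shape)      # triangulation defined on the average shape

# Warp every face in the subset into the average geometry, then average the
# pixel values to obtain that subset's mean face.
warped = [warp(img, pts, mean_shape, tri_mean) for img, pts in zip(images, all_pts)]
mean_face = np.mean(warped, axis=0)
```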
Next, I morphed each of the faces in the dataset to the corresponding subset's average shape. Below, the raw images are shown in the first and third rows and the warped images in the second and fourth rows.
In these images above, we can note a few key changes: both faces in 25a and 25b are more aligned so that the person looks more towards the center, whereas their faces in the raw images are slightly angled toward the left of the audience. The faces in the raw 50a and 50b are also slightly angled upwards, and warping them to the average shape adjusts them so that the person's face is more front-facing. Some changes are more subtle and stand out less, like the faces in 75a and 75b taking up more space in the warped images.
This is the mean image for each subset.
Now, we can warp my face into the average geometry - taking the non-smiling population as an example - and warp the average face of the non-smiling subset into my geometry.
To get a caricature of my face by extrapolating from the population mean in part 4, I first
cropped the image of myself to better suit the style of the images in the FEI datasets I used,
specifically, removing a big chunk of the top part of the image so less of my hair shows. Most
of the FEI database images don't fully show the person's forehead, which makes sense since our
dataset doesn't have annotated keypoints close to that area of the image.
This is the image cropped and downscaled to get the same shape of
(300, 250, 3)
as the FEI ones.
To get the caricature points, we first annotate the points on my face by going in the same order as the annotated points from each image in the FEI dataset - the positions of these points are chosen independently of those in the dataset. Then, we take a scaled (by alpha) version of the difference between the points on my face and the mean points of the subset. Finally, we can warp my face using the same method as before to get the caricature image.
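A sketch of the extrapolation, again with hypothetical names (my_pts are my annotated points, mean_shape is the subset mean from Part 4):

```python
from scipy.spatial import Delaunay

alpha = 1.5   # alpha > 1 exaggerates my features; alpha < 1 pulls me toward the mean
cari_pts = mean_shape + alpha * (my_pts - mean_shape)    # scaled difference from the mean
caricature = warp(my_img, my_pts, cari_pts, Delaunay(cari_pts))
```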
We can see that lower values of $\alpha$ make my face vertically shorter and wider (compared to the downscaled and cropped input image). This makes sense since my face looks longer than most of the faces in the FEI dataset. As expected, increasing $\alpha$ past 1 exaggerates the length of my face, stretching it out even more. Larger magnitudes of $\alpha$ have a greater distortion effect on the image.
I chose to warp and morph my face to and with an average Southeast Asian male face (found on Pinterest here) that looks older than I do.
Here are the input images. I got the middle image by manually cropping and downscaling like I did with my face in Part 5.
This morphed result above isn't too different from my own face. Let's observe a larger change in the morph by using an average Chinese female face vs. a male one. I found this face at this link on Pinterest, too.
Using the same code, here are the input images and the generated warps and morph.
Here are the triangulations of both average Chinese male and female faces.
Using sklearn.decomposition.PCA
with n_components=0.95
, I computed a
PCA basis for the face space of the FEI non-smiling dataset. Here are the eigenfaces corresponding to the 10 largest singular values.
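A sketch of this step, assuming X is a (100, 300*250*3) matrix with one flattened non-smiling face per row (the name and exact preprocessing are my assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA

pca = PCA(n_components=0.95)     # keep enough components to explain 95% of the variance
coeffs = pca.fit_transform(X)    # each face expressed as coefficients in the PCA basis

# pca.components_ has shape (k, 300*250*3); reshape each row to view it as an eigenface.
eigenfaces = pca.components_.reshape(-1, 300, 250, 3)
top10 = eigenfaces[:10]          # components are sorted by singular value, largest first
```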
Now, using the top 32 PCA components, we can compare the caricatures of raw images projected onto the PCA basis vs. those in the original basis. Here, we can see that PCA performs well by capturing the unique facial features, specifically the eyes, and the changes in color intensity among features, while including less noise. The caricature in the original basis lacks eye and hair color, as both look the same as the person's facial skin color. Using the original basis also means we amplify each pixel without any filtering, so the exaggeration doesn't produce images as appealing or with contrast as noticeable as those produced using the PCA basis.
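A sketch of both caricature variants, assuming face is one flattened image from X and alpha is the exaggeration factor:

```python
alpha = 2.0   # exaggeration factor, chosen here just for illustration

# Original (pixel) basis: amplify every pixel's deviation from the mean face.
mean_img = X.mean(axis=0)
cari_pixel = mean_img + alpha * (face - mean_img)

# PCA basis: transform() subtracts the mean internally, so scaling the top 32
# coefficients and reconstructing exaggerates only the directions PCA kept.
w = pca.transform(face.reshape(1, -1))[0, :32]
cari_pca = pca.mean_ + (alpha * w) @ pca.components_[:32]
```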
Lastly, we can generate random faces with the following process. First, we randomly sample weights from $[-1.0, 1.0]$ and use them to scale the top 32 principal components. Then, we take the sum of these weighted components to reconstruct an image. Before displaying, we reshape the result back into an image, since working in the PCA basis gives us flattened 1D vector representations of our images. Some random faces can be seen below.
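A sketch of the generation step; adding the mean face back before display is my assumption about the reconstruction:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.uniform(-1.0, 1.0, size=32)                  # one random weight per component
random_face = pca.mean_ + weights @ pca.components_[:32]   # weighted sum of top 32 eigenfaces
random_img = np.clip(random_face, 0, 1).reshape(300, 250, 3)   # assumes pixels in [0, 1]
```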