Image Data Augmentation for Facial Recognition

Manmeet Singh
7 min read · May 16, 2020


This article is a brief outline of various state-of-the-art techniques used for face data augmentation.

Introduction

The quality of the training set has a great impact on the results of deep learning based face-related tasks. Collecting and labeling adequate samples with high quality and balanced distributions is a laborious task. Various augmentation techniques are mentioned below, with special emphasis on deep learning based approaches. Although these techniques are related to facial recognition, they can be generalized to other areas. The human face is one of the greatest subjects of interest, so a lot of research has gone into the area.

Among the various augmentation techniques, GANs have been recognized in recent years as one of the most powerful and effective tools.

In the latter part of the article, various evaluation metrics are discussed, along with potential future directions.

Transformation Types

Transforming the base image dataset is an important task that increases the generalized performance of a model. For facial data, the transformations account for the many ways an individual may look on a given day. These techniques are outlined below, along with a brief explanation of the methods deployed for each transformation type.

Generic

The generic data augmentation techniques can be divided into two categories: geometric and photometric. These methods have been applied to various learning-based computer vision (CV) tasks.

Geometric

Geometric transformation examples created by imgaug using CelebA dataset

Geometric transformation alters the geometry by transferring image pixels to new positions. This includes rotation, reflection, translation, flipping, etc.
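These operations can be sketched directly with NumPy array manipulation. This is a minimal illustration, not a production pipeline; libraries such as imgaug (used for the figure above) add scaling, shearing, and interpolation-aware rotation on top of these basics.

```python
import numpy as np

def augment_geometric(img, rng):
    """Apply one randomly chosen geometric transform to an H x W x C image."""
    choice = rng.integers(0, 3)
    if choice == 0:
        return img[:, ::-1, :]                        # horizontal flip (reflection)
    if choice == 1:
        return np.rot90(img, k=1, axes=(0, 1))        # 90-degree rotation
    return np.roll(img, shift=(4, -4), axis=(0, 1))   # wrap-around translation

rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
aug = augment_geometric(face, rng)
```

Note that pixels are only moved, never recolored: applying a flip twice recovers the original image exactly.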

Photometric

Transformation using RGB channel manipulation and filter application

Photometric transformation alters the RGB channels by shifting pixel colors to new values. The main approaches include:

  • Gray scaling — flattening the color channels to black and white
  • Color jittering — many different channel manipulations, such as inverting, adding, decreasing, and multiplying
  • Filtering — edge enhancement, blurring, sharpening, etc.
  • Lighting perturbation — changing environment lighting
  • Noise adding — changing the granularity of pixels
  • Vignetting — edge softening or shading
  • Contrast adjustment — applying color fixations as layers
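Three of the approaches above can be sketched with a few lines of NumPy each; the luminance weights and jitter parameters below are common illustrative choices, not values from the survey.

```python
import numpy as np

def to_grayscale(img):
    """Gray scaling: collapse the RGB channels to a single luminance channel."""
    weights = np.array([0.299, 0.587, 0.114])
    return np.clip(img.astype(np.float64) @ weights, 0, 255).astype(np.uint8)

def color_jitter(img, scale=1.2, shift=-10.0):
    """Color jittering: multiply and shift channel values."""
    return np.clip(img.astype(np.float64) * scale + shift, 0, 255).astype(np.uint8)

def add_noise(img, rng, sigma=8.0):
    """Noise adding: perturb pixel granularity with Gaussian noise."""
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
gray = to_grayscale(face)
```

Unlike geometric transforms, these leave every pixel in place and only change its value.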

Component

Hairstyle

Hairstyle affects face detection and recognition due to the occlusion and facial appearance variation it causes.

Various methods have been used for hairstyle transfer, from composition methods that overlay candidate hairstyles on a subject to GANs. DiscoGAN and StarGAN are the two most prominent methods, performing this task using cross-domain and multi-domain translation respectively.

Makeup

Left column: original data. Right: various transfers using BeautyGAN

An individual’s facial color composition can change day to day. To augment using makeup or facial color contrast data, facial makeup transfer aims to shift the style from a given reference to a target.

BeautyGAN (results shown in the image) applies a dual GAN and incorporates a global domain-level loss and a local instance-level loss, which essentially means the transfer is applied in a context-aware fashion.

Accessory

Glasses transfer on-to (three left columns) and removal (three right columns)

Accessory augmentation covers the removal and addition of accessories including glasses, earrings, nose rings, etc. Among these, glasses are the most commonly seen.

The above image shows a method based on a residual image, defined as the difference between the desired output and the input image (residual = desired output − input).
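The residual formulation is simple enough to sketch in NumPy. The point of predicting a residual rather than a full image is that the edit stays localized (e.g. around the glasses) while the rest of the face passes through unchanged; this toy example just shows the arithmetic.

```python
import numpy as np

def apply_residual(input_img, residual):
    """Add a predicted residual to the input and clamp back to valid pixels."""
    out = input_img.astype(np.int16) + residual
    return np.clip(out, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)      # input image
target = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)    # desired output
residual = target.astype(np.int16) - face.astype(np.int16)    # what the model learns
recovered = apply_residual(face, residual)
```

With a perfectly predicted residual, adding it back to the input reproduces the desired output exactly.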

GANs have also been used in this area: InfoGAN learned disentangled representations of faces in an unsupervised manner and was able to modify the presence of glasses.

Attribute

Pose

Orientation translation using a combination of FT-GAN with CycleGAN for pixel transformation

The large variation in head positions in real-life scenarios poses a major challenge for face detection and recognition tasks, because self-occlusion and texture variation take place when the head pose changes.

Generative models are widely used by recent works to synthesize faces with arbitrary poses. PixelCNN architecture has been applied to generate new portraits with different poses conditioned on pose embedding. Simply put, a set of poses are broken down into their latent variables and used for arbitrary pose generation.
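The conditioning idea can be sketched in a few lines. In this toy stand-in (random linear weights instead of a trained PixelCNN, a one-hot vector instead of a learned pose embedding), the identity code is concatenated with a pose vector before decoding, so the same identity can be rendered at an arbitrary target pose.

```python
import numpy as np

rng = np.random.default_rng(0)
N_POSES, Z = 5, 8                              # hypothetical pose count / latent size
W_dec = rng.normal(size=(16, Z + N_POSES))     # stand-in "decoder" weights

def generate_with_pose(z, pose_id):
    """Decode an identity latent code conditioned on a one-hot pose embedding."""
    pose = np.zeros(N_POSES)
    pose[pose_id] = 1.0
    return W_dec @ np.concatenate([z, pose])

z = rng.normal(size=Z)                         # one identity
frontal = generate_with_pose(z, 0)
profile = generate_with_pose(z, 4)
```

Changing only the pose input while keeping `z` fixed yields different outputs for the same identity.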

Another model, X2Face, can perform the equivalent of a face transfer: the identity of the source face is preserved while a driving face dictates its orientation and expression.

There are various other GANs that have been utilized for pose transformation, such as TP-GAN and FF-GAN.

Expression

Some facial expression synthesis examples using 2D-based, 3D-based, and learning-based approaches. The leftmost images illustrate mesh deformation, modified 3D face model, and input heatmap respectively.

Facial expression synthesis and transfer techniques are used to enrich expressions. The 2D- and 3D-based algorithms predate learning-based methods; their advantage is that they do not need a large number of training samples.

Besides CNNs, generative models such as autoencoders and GANs have been widely applied.

ExprGAN has been used for photo-realistic facial expression editing with controllable expression intensity.

Age

These methods aim to synthesize faces of various ages and preserve personalized features at the same time.

Age transfer to various groups

The traditional methods of age transfer include the prototype-based method and model-based method.

  • The prototype based method creates average faces for different age groups, learns the shape and texture transformation between these groups, and applies them to images for age transfer.
  • The model-based method learns the mappings between a younger and older face and applies them to new instances.

More recent works applied GANs with encoders for age transfer. The input images are encoded into latent vectors, transformed in the latent space, and reconstructed back into images with a different age.
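The encode-transform-decode pipeline above can be sketched with toy linear maps (random weights and a random "aging direction" stand in for the trained encoder, decoder, and learned direction):

```python
import numpy as np

rng = np.random.default_rng(1)
D, Z = 64, 8                          # flattened image size, latent size
W_enc = rng.normal(size=(Z, D))       # stand-in linear "encoder"
W_dec = np.linalg.pinv(W_enc)         # stand-in linear "decoder"
age_dir = rng.normal(size=Z)          # learned aging direction in latent space

def age_transfer(x, strength):
    """Encode the image, shift along the aging direction, decode."""
    z = W_enc @ x
    return W_dec @ (z + strength * age_dir)

x = rng.normal(size=D)
older = age_transfer(x, 2.0)
```

Because the transformation happens in latent space, the rest of the latent code, and hence the personalized features, is left untouched; only the component along the aging direction changes.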

Transformation Methods

Basic Image Processing

This category encompasses traditional 2D- and 3D-based image processing methods.

Model-based Transformation

(A) The 2D AAM shape variation. (B) The 2D AAM texture variation.

Here, a deformable face model such as the Active Appearance Model (AAM) is fitted to an actual person’s face, and the fitted model is used to generate new orientations and expressions for the subject in question.

Realism Enhancement

Although modern computer graphics techniques provide powerful tools to generate virtual faces, it remains difficult to generate a large number of photorealistic samples. This is due to the lack of accurate illumination and complicated surface modeling, which are all factors essential to identity.

Single path and dual path GANs have been utilized for this purpose.

The global and local pathways of a facial feature enhancement generator

The above image shows the application of a dual-path GAN, which contains separate global and local generators for global structure and local detail generation.

Recognition of real faces can be improved by feeding defective generated faces into a generator for realism enhancement. The associated discriminator acts to minimize the gap between the real and virtual domains, which helps preserve identity.

Generative Based

Generative models provide a powerful tool to generate new data from a modeled distribution by learning the data distribution of the training set.

The generative model maps a unit Gaussian distribution to a generated data distribution, and the loss measures the distance between the generated data distribution and the real data distribution.

There are various sub-categories under Generative Models such as: Autoregressive Generative Models, VAEs, Flow-based, but we will mostly focus on GANs here.

The image below shows the Data Augmentation Generative Adversarial Network (DAGAN), a basic framework based on the conditional GAN (cGAN). Researchers tested its effectiveness on vanilla classifiers and one-shot learning. Many face data augmentation researchers have followed this architecture and extended it into more powerful networks.

DAGAN Architecture

The DAGAN contains a generator network and a discriminator network. During the generation process, an encoder maps the input image into a lower-dimensional latent space. Subsequently, a random vector is transformed and concatenated with the latent vector (white block on the left side of the image). The long vector is passed to the decoder to generate an augmentation image. On the right, an adversarial discriminator network is used to tell the generated images apart from the real images, in order to ensure their realism. This helps optimize the generator output towards greater realism.
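The generator's forward pass described above can be sketched as follows, with random linear weights standing in for the trained encoder and decoder networks:

```python
import numpy as np

rng = np.random.default_rng(0)
D, Z, NOISE = 64, 8, 4                           # image, latent, noise sizes
W_enc = rng.normal(size=(Z, D)) * 0.1            # stand-in encoder
W_dec = rng.normal(size=(D, Z + NOISE)) * 0.1    # stand-in decoder

def dagan_generate(x):
    """Encode the input, concatenate a random vector, decode an augmentation."""
    z = W_enc @ x
    r = rng.normal(size=NOISE)                   # fresh noise on every call
    return W_dec @ np.concatenate([z, r])

x = rng.normal(size=D)
aug = dagan_generate(x)
```

Because fresh noise is concatenated on every call, the same input yields a different augmentation each time, which is exactly what makes the scheme useful for enlarging a dataset.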

CycleGAN is a general-purpose solution for image-to-image translation. It learns a dual mapping between two domains simultaneously, with no need for paired training examples, by combining a cycle consistency loss with an adversarial loss. The cycle consistency loss optimizes the conversion of the generated image back to the original image, ensuring that contextual features are preserved.

In CycleGAN's notation, the reference and target image domains are represented by R and T respectively. G and F are two generators that transfer R → T and T → R. The discriminators are represented by D(R) and D(T), where D(R) aims to distinguish between real images in R and generated fake images F(T), and D(T) aims to distinguish between real images in T and generated fake images G(R).

CycleGAN is able to complete and complement an imbalanced dataset more efficiently.
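The cycle consistency term is easy to state in code: translate to the other domain and back, then penalize any difference from the original. The toy generator pair below is a hand-picked perfect inverse, standing in for trained networks.

```python
import numpy as np

def cycle_consistency_loss(x, G, F):
    """L1 cycle term: mean |F(G(x)) - x|, i.e. going there and back
    should recover the original image."""
    return np.abs(F(G(x)) - x).mean()

# Toy "generators" that happen to invert each other exactly.
G = lambda img: img * 2.0 + 1.0
F = lambda img: (img - 1.0) / 2.0

x = np.linspace(0.0, 1.0, 16).reshape(4, 4)
loss = cycle_consistency_loss(x, G, F)
```

For a perfect inverse pair the loss is (numerically) zero; during training it is minimized jointly with the adversarial losses of D(R) and D(T).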

Evaluation Metrics

The two main methods of evaluation are qualitative evaluation and quantitative evaluation.

Qualitative evaluation is mostly based on obtaining feedback from human observers.

Quantitative methods include:

Distance Measurement

  • L1 and L2 norm for color and spatial distance

Accuracy and Error Rate

  • Requires a balanced dataset

Inception Score

  • Relative entropy of two probability distributions

Frechet Inception Distance

  • Distance between the multivariate Gaussian distributions of real and generated data
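Two of these metrics can be sketched in NumPy. The FID version below is simplified to diagonal covariances so the matrix square root becomes elementwise (the full metric uses covariance matrices of Inception activations); the Inception Score takes rows of class probabilities p(y|x) as input.

```python
import numpy as np

def fid_diagonal(mu1, var1, mu2, var2):
    """Frechet distance between two Gaussians with diagonal covariances:
    ||mu1 - mu2||^2 + sum(var1 + var2 - 2*sqrt(var1*var2))."""
    mean_term = np.sum((mu1 - mu2) ** 2)
    cov_term = np.sum(var1 + var2 - 2.0 * np.sqrt(var1 * var2))
    return mean_term + cov_term

def inception_score(p_yx, eps=1e-12):
    """IS = exp(mean KL(p(y|x) || p(y))): the relative entropy between the
    per-sample class distribution and the marginal over all samples."""
    p_y = p_yx.mean(axis=0)
    kl = np.sum(p_yx * (np.log(p_yx + eps) - np.log(p_y + eps)), axis=1)
    return np.exp(kl.mean())
```

Identical distributions give an FID of zero, and a classifier that outputs the same uniform distribution for every sample gives an Inception Score of 1, the minimum.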

Conclusion

In conclusion, we covered various data augmentation techniques that can be generalized to other domains. The importance of GANs in this space is growing, as data collection and labeling is a laborious task. The application of GANs, mixed with some of the traditional techniques, can help cut costs and drive results within any organization.

Reference

This article is based on a survey paper titled “A Survey on Face Data Augmentation” by Wang et al., which can be found at https://arxiv.org/pdf/1904.11685.pdf
