Image Data Augmentation for Facial Recognition
This article is a brief outline of various state-of-the-art techniques used for face data augmentation.
Introduction
The quality of the training set has a great impact on the results of deep-learning-based face-related tasks. Collecting and labeling adequate samples with high quality and balanced distributions is a laborious task. Various augmentation techniques are outlined below, with special emphasis on deep-learning-based approaches. Although these techniques are discussed in the context of facial recognition, they can be generalized to other areas. The human face is one of the greatest subjects of interest, so a lot of research has gone into the area.
Among the various augmentation techniques, GANs have been recognized as a more powerful and effective tool in recent years.
In the latter part of the article, various evaluation metrics are discussed, along with potential future directions.
Transformation Types
Transforming the base image dataset is an important task that increases the generalization performance of a model. For facial data, the transformations take into consideration the various ways an individual may look on a given day. These techniques are outlined below, along with a brief explanation of the methods deployed for each transformation type.
Generic
The generic data augmentation techniques can be divided into two categories: geometric and photometric. These methods have been applied to various learning-based computer vision (CV) tasks.
Geometric
Geometric transformation alters the geometry by transferring image pixels to new positions. This includes rotation, reflection, translation, flipping, etc.
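A minimal numpy sketch of these geometric operations, using a toy 4×4 array as a stand-in for a grayscale face crop (the function names here are illustrative, not from any particular library):

```python
import numpy as np

def flip_horizontal(img):
    """Mirror the image left-to-right (reflection)."""
    return img[:, ::-1]

def rotate_90(img):
    """Rotate the image 90 degrees counter-clockwise."""
    return np.rot90(img)

def translate(img, dy, dx):
    """Shift pixels by (dy, dx), filling vacated areas with zeros."""
    out = np.zeros_like(img)
    h, w = img.shape[:2]
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        img[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out

face = np.arange(16).reshape(4, 4)  # stand-in for a face crop
augmented = [flip_horizontal(face), rotate_90(face), translate(face, 1, 0)]
```

In practice the same operations are applied through library transforms (e.g. torchvision or OpenCV) with random parameters at training time.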
Photometric
Photometric transformation alters the RGB channels by shifting pixel colors to new values. The main approaches include:
- Grayscaling — collapsing the color channels to a single intensity channel
- Color jittering — many different manipulations, such as inverting, adding, decreasing, and multiplying channel values
- Filtering — edge enhancement, blurring, sharpening, etc.
- Lighting perturbation — changing the environment lighting
- Noise adding — changing the granularity of pixels
- Vignetting — edge softening or shading
- Contrast adjustment — stretching or compressing the range between light and dark pixel values
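A few of these photometric operations can be sketched in numpy as follows (a flat gray toy image stands in for a real face photo; function names are illustrative):

```python
import numpy as np

def to_grayscale(img):
    """Collapse RGB to a single luminance channel (ITU-R BT.601 weights)."""
    return img @ np.array([0.299, 0.587, 0.114])

def jitter_brightness(img, delta):
    """Shift all channel values by delta, clipping to the valid uint8 range."""
    return np.clip(img.astype(np.int16) + delta, 0, 255).astype(np.uint8)

def add_gaussian_noise(img, sigma, seed=0):
    """Perturb pixels with zero-mean Gaussian noise."""
    rng = np.random.default_rng(seed)
    noisy = img.astype(np.float32) + rng.normal(0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

face = np.full((4, 4, 3), 128, dtype=np.uint8)  # flat gray stand-in image
bright = jitter_brightness(face, 40)   # every pixel shifts from 128 to 168
gray = to_grayscale(face)              # shape collapses from (4, 4, 3) to (4, 4)
noisy = add_gaussian_noise(face, 5.0)
```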
Component
Hairstyle
Hairstyle affects face detection and recognition due to the occlusion and appearance variation it causes.
Various methods have been used for hairstyle transfer, from composition methods that overlay candidate hairstyles on a person to GANs. DiscoGAN and StarGAN are the two most prominent methods, performing the task via cross-domain and multi-domain translation, respectively.
Makeup
An individual’s facial color composition can change from day to day. To augment with makeup or facial color contrast data, facial makeup transfer shifts the style from a given reference face to a target face.
BeautyGAN (results shown in the image) applies a dual GAN and incorporates a global domain-level loss and a local instance-level loss, which means the transfer is applied in a context-aware fashion.
Accessory
Accessory augmentation covers the removal and addition of accessories such as glasses, earrings, and nose rings. Among these, glasses are the most commonly seen.
The above image shows a method based on the residual image, defined as the difference between the desired output and the input image (residual = desired output − input image).
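The residual formulation can be illustrated with toy 2×2 intensity arrays standing in for face images: the network learns to predict the residual, and the edit is applied by adding it back to the input.

```python
import numpy as np

# Toy stand-ins: input face with glasses, target face without them.
input_face = np.array([[10, 20], [30, 40]], dtype=np.int16)
target_face = np.array([[10, 25], [30, 35]], dtype=np.int16)

# The residual is the per-pixel difference the network learns to predict.
residual = target_face - input_face

# At inference time, the edit is recovered by adding the residual back.
edited = input_face + residual
```

Learning the (mostly sparse) residual rather than the full output lets the network focus on the region being edited while leaving the rest of the face untouched.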
GANs have also been used in this area: InfoGAN learned disentangled representations of faces in an unsupervised manner and was able to modify the presence of glasses.
Attribute
Pose
The large variation of head positions in real-life scenarios poses a big challenge for face detection and recognition tasks, because self-occlusion and texture variation occur when the head pose changes.
Generative models are widely used by recent works to synthesize faces with arbitrary poses. The PixelCNN architecture has been applied to generate new portraits with different poses, conditioned on a pose embedding. Simply put, a set of poses is broken down into latent variables that are used for arbitrary pose generation.
Another GAN-based method, X2Face, can perform the equivalent of a face transfer: it animates a source face while a driving frame dictates pose and orientation, retaining the identity of the source face.
There are various other GANs that have been utilized for pose transformation, such as TP-GAN and FF-GAN.
Expression
Facial expression synthesis and transfer techniques are used to enrich the range of expressions. 2D- and 3D-based algorithms came earlier than learning-based methods; their advantage is that they do not need a large number of training samples.
Besides CNNs, generative models such as autoencoders and GANs have seen wide application.
ExprGAN has been used for photo-realistic facial expression editing with controllable expression intensity.
Age
These methods aim to synthesize faces of various ages and preserve personalized features at the same time.
The traditional methods of age transfer include the prototype-based method and model-based method.
- The prototype based method creates average faces for different age groups, learns the shape and texture transformation between these groups, and applies them to images for age transfer.
- The model-based method learns the mappings between a younger and older face and applies them to new instances
More recent works applied GANs with encoders for age transfer. The input images are encoded into latent vectors, transformed in the latent space, and reconstructed back into images with a different age.
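The encode–transform–decode pipeline can be sketched with toy linear maps standing in for the learned networks (the `age_direction` vector and the linear encoder/decoder are hypothetical stand-ins; real systems use deep convolutional networks and a learned aging transformation):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 64))          # toy encoder weights: image -> latent
age_direction = rng.normal(size=8)    # hypothetical "aging" axis in latent space

def encode(img):
    """Map a flattened image into the lower-dimensional latent space."""
    return W @ img

def decode(z):
    """Toy decoder: pseudo-inverse of the encoder weights."""
    return np.linalg.pinv(W) @ z

face = rng.normal(size=64)            # flattened stand-in face image
z = encode(face)
z_aged = z + 1.5 * age_direction      # transform in the latent space
aged_face = decode(z_aged)            # reconstruct with the age modified
```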
Transformation Methods
Basic Image Processing
This category encompasses traditional 2D- and 3D-based image-processing methods.
Model-based Transformation
Here, an actual person’s face is mapped onto a contour (3D) model of a face, and the model is then manipulated to generate new orientations and expressions for the subject in question.
Realism Enhancement
Although modern computer graphics techniques provide powerful tools to generate virtual faces, it remains difficult to generate a large number of photorealistic samples. This is due to the lack of accurate illumination and complicated surface modeling, which are all factors essential to identity.
Single path and dual path GANs have been utilized for this purpose.
The above image shows the application of a dual-path GAN, which contains a separate global generator and local generator for generating global structure and local details, respectively.
Recognition of real faces can be improved by feeding defective generated faces into a generator for realism enhancement. The associated discriminator acts to minimize the gap between the real and virtual domains, which leads to identity preservation.
Generative Based
The generative models provide a powerful tool to generate new data from modeled distribution by learning the data distribution of the training set.
There are various sub-categories under Generative Models such as: Autoregressive Generative Models, VAEs, Flow-based, but we will mostly focus on GANs here.
The image below shows the Data Augmentation Generative Adversarial Network (DAGAN), a basic framework based on the conditional GAN (cGAN). Researchers tested its effectiveness on vanilla classifiers and one-shot learning. Many face data augmentation researchers have followed this architecture and extended it into more powerful networks.
The DAGAN contains a generator network and a discriminator network. During generation, an encoder maps the input image into a lower-dimensional latent space. A random vector is then transformed and concatenated with the latent vector (white block on the left side of the image), and the combined vector is passed to the decoder to generate an augmented image. Right: to ensure the realism of the generated image, an adversarial discriminator network is used to tell the generated images apart from real images. This helps optimize the generator output toward greater realism.
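The concatenation step at the heart of the generator can be sketched in numpy (all dimensions here are illustrative assumptions, not the paper's actual sizes):

```python
import numpy as np

rng = np.random.default_rng(0)

latent = rng.normal(size=128)     # encoder output for the input image
noise = rng.normal(size=32)       # random vector injecting variation
transform = rng.normal(size=(32, 32))
projected = transform @ noise     # linearly transformed noise

# The transformed noise is concatenated with the image latent; the decoder
# consumes this longer vector to produce the augmented image.
decoder_input = np.concatenate([latent, projected])
```

Because the noise varies between draws while the latent stays fixed, each forward pass yields a different augmentation of the same source image.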
CycleGAN is a general-purpose solution for image-to-image translation. It learns a dual mapping between two domains simultaneously with no need for paired training examples, because it combines a cycle consistency loss with an adversarial loss. The cycle consistency loss optimizes the conversion of the generated image back to the original image, ensuring that contextual features are preserved.
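The cycle consistency term can be sketched as an L1 distance between an image and its round trip through both generators (the toy "generators" below are trivially invertible scalings, standing in for learned networks):

```python
import numpy as np

def cycle_consistency_loss(x, G, F):
    """L1 distance between x and its round-trip translation F(G(x))."""
    return np.abs(F(G(x)) - x).mean()

# Toy invertible "generators": G maps domain A -> B, F maps B -> A.
G = lambda img: img * 2.0
F = lambda img: img / 2.0

x = np.array([[0.1, 0.4], [0.8, 0.2]])   # stand-in image in domain A
loss = cycle_consistency_loss(x, G, F)   # near zero: perfect reconstruction
```

During training this loss is added to the adversarial loss of both discriminators, penalizing mappings that lose the content of the input.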
CycleGAN is therefore able to complete and balance an imbalanced dataset efficiently.
Evaluation Metrics
The two main methods of evaluation are qualitative evaluation and quantitative evaluation.
Qualitative evaluation is mostly based on feedback from human observers.
Quantitative methods include:
Distance Measurement
- L1 and L2 norm for color and spatial distance
Accuracy and Error Rate
- Require balanced dataset
Inception Score
- Exponential of the mean KL divergence between the conditional label distribution and the marginal label distribution
Frechet Inception Distance
- Fréchet distance between multivariate Gaussian fits of the real and generated feature distributions
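The Fréchet Inception Distance has a closed form for two Gaussians, FID = ||μ₁ − μ₂||² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^½). A minimal numpy sketch (using an eigendecomposition for the matrix square root, which is exact for the symmetric toy covariances here; production code typically uses `scipy.linalg.sqrtm`):

```python
import numpy as np

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """FID between two multivariate Gaussians given means and covariances."""
    diff = mu1 - mu2
    prod = sigma1 @ sigma2
    # Matrix square root via eigendecomposition of the symmetrized product.
    eigvals, eigvecs = np.linalg.eigh((prod + prod.T) / 2)
    covmean = eigvecs @ np.diag(np.sqrt(np.clip(eigvals, 0, None))) @ eigvecs.T
    return diff @ diff + np.trace(sigma1 + sigma2 - 2 * covmean)

# Identical distributions give a distance of zero.
mu, sigma = np.zeros(2), np.eye(2)
same = frechet_distance(mu, sigma, mu, sigma)        # ~0.0
shifted = frechet_distance(mu, sigma, np.ones(2), sigma)  # ||1,1||^2 = 2.0
```

In practice the means and covariances are estimated from Inception-network features of the real and generated image sets, and a lower FID indicates more realistic samples.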
Conclusion
In conclusion, we covered various data augmentation techniques that can be generalized to other domains. The importance of GANs in this space is growing, as data collection and labeling is a laborious task. The application of GANs, mixed with some of the traditional techniques, can help cut costs and drive results within any organization.
Reference
This article is based on the survey paper “A Survey on Face Data Augmentation” by Wang et al., which can be found at https://arxiv.org/pdf/1904.11685.pdf