How TAWNY is disrupting Emotion AI by using synthetic images for Deep Learning
In the world of AI, an algorithm’s performance depends heavily on having a large amount of high-quality data. Unfortunately, collecting such data has always been expensive and time-consuming, which gives an advantage to big tech companies with wide reach and deep pockets. Not anymore! Synthetic data is now disrupting the tech scene and leveling the playing field.
Synthetic images, like any other kind of synthetic data, are computer-generated and mimic their real-life counterparts. Instead of relying completely on a big pool of annotated images to train our models, our colleagues at TAWNY, Ciprian Corneanu and Simon Kibler, generate synthetic images and use them for training. The generated dataset closely follows the structure of real sample data and reflects its main statistical properties. This method is inexpensive, allows more flexibility and avoids the privacy concerns raised by using images of real people. Recent results by researchers at the Data to AI Lab at MIT suggest that synthetic data can successfully replace real data in software writing and testing (1).
The basis for creating photo-realistic synthetic images is a class of models called Generative Adversarial Nets (GANs). Originally proposed in 2014 (2), they have attracted growing attention ever since. In recent years, GANs have been used for many different tasks, including Text-to-Image Translation, Face Aging, Face Swapping and Face Synthesis!
GANs are composed of two neural networks, a generator and a discriminator, that play a two-player game. The task of the generator is to generate realistic images, while the discriminator learns to decide whether an image is generated or real. As the discriminator gets better at telling generated from real images, the generator has to produce more and more realistic images to fool it.
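The two-player game described above can be made concrete by looking at the loss each network tries to minimize. The following is a minimal sketch, not TAWNY's implementation: it assumes the discriminator outputs a probability that an image is real, and it uses the common "non-saturating" generator loss rather than the exact minimax form. The function names are illustrative.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy loss for the discriminator.

    d_real: discriminator outputs (probability of "real") on real images.
    d_fake: discriminator outputs (probability of "real") on generated images.
    The loss is low when real images score near 1 and generated ones near 0.
    """
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss (an assumed, commonly used variant).

    The loss is low when the discriminator scores generated images near 1,
    i.e. when the generator manages to fool the discriminator.
    """
    return -sum(math.log(p) for p in d_fake) / len(d_fake)
```

In training, the two losses are minimized in alternation: a discriminator step lowers `discriminator_loss`, which in turn pushes the generator to produce images that score higher on the next generator step.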
This can be useful in multiple ways. First, we can generate faces for the emotion categories in which our training set is lacking. For example, happy faces are greatly overrepresented in many facial expression datasets. We can balance this by adding synthetic images from other categories. Second, we can vary some coefficients in the generation process to influence attributes such as gender, age and ethnicity. This can help alleviate bias in the training set. For example, if there were few elderly Asians in the training set, the final classification model might perform worse on people from that group than on others. This could lead to prediction bias, which is a problem of great concern in AI. By adding synthetic images from underrepresented groups to our training set, we can alleviate this bias and make the final model fairer.
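As a rough illustration of the balancing idea, the sketch below counts how many synthetic images each emotion category would need in order to match the largest category. It is a simplified assumption about how balancing could be done, not a description of our pipeline, and the function name and labels are hypothetical.

```python
from collections import Counter


def synthetic_needed(labels):
    """Return, per category, how many synthetic images would bring that
    category up to the size of the largest one."""
    counts = Counter(labels)
    target = max(counts.values())
    return {label: target - n for label, n in counts.items()}


# Hypothetical toy dataset in which "happy" is overrepresented.
toy_labels = ["happy"] * 5 + ["sad"] * 2 + ["angry"] * 1
print(synthetic_needed(toy_labels))
```

The same counting works for any attribute, for example age group or ethnicity, as long as each training image carries the corresponding label.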
However, there are also some limitations. The diversity of GAN-generated images is generally lower than that of the true image distribution. Therefore, the number of synthetic images that can be added to the training set is limited and should not exceed a certain ratio. To ensure optimal performance, we put metrics in place to measure the quality and diversity of the generated faces and find the “sweet spot” where the synthetic images help the most.
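To show what such a ratio limit means in practice, here is a small sketch under stated assumptions: the ratio is taken to be the share of synthetic images in the combined training set, and the "sweet spot" is simply the candidate share with the best validation score. Both function names are hypothetical, and the actual quality and diversity metrics are not modeled here.

```python
from fractions import Fraction


def max_synthetic(n_real, max_share):
    """Largest number of synthetic images that keeps their share of the
    combined training set at or below max_share (0 <= max_share < 1)."""
    share = Fraction(str(max_share))  # exact arithmetic, avoids float rounding
    if not 0 <= share < 1:
        raise ValueError("max_share must be in [0, 1)")
    return int(n_real * share / (1 - share))


def best_share(validation_scores):
    """Pick the synthetic share with the highest validation score.

    validation_scores maps each tried share to the score of a model
    trained with that share of synthetic images.
    """
    return max(validation_scores, key=validation_scores.get)
```

For example, `max_synthetic(900, 0.1)` allows 100 synthetic images, because 100 out of 1,000 total images is exactly a 10% share.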
Greetings from our R&D team. Didn't they do a great job?!
*Answer to trivia: Faces 2, 4, 5 and 7 correspond to real people, while faces 1, 3, 6 and 8 were synthetically generated.
(1) Stephanie, K. (2017, March 3). Artificial data give the same results as real data — without compromising privacy. MIT News. Retrieved October 13, 2020, from https://news.mit.edu/2017/artificial-data-give-same-results-as-real-data-0303
(2) Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information processing systems (pp. 2672-2680). http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf