Age-gender estimation

This paper proposes an adversarial spatial frequency domain critic together with an alternating training strategy. The critic preserves age and gender information in the generated images, and the method outperforms the compared methods in age and gender classification.


Author
AI Research Team | Se Hun Kim
This paper was published by KAIST IVY Lab. and Genesis Lab under the 2018 ICT R&D Voucher project of the IITP (Institute of Information & Communications Technology Planning & Evaluation).

Original Paper

Adversarial Spatial Frequency Domain Critic Learning for Age and Gender Classification

Proposed method

The main idea of the paper is to synthesize age and gender information, which appears predominantly in particular regions of the spatial frequency domain, into a generated image. A second technique is to compute the losses while alternating between learning gender and learning age. The details are as follows:

1. Encoder-Generator

The encoder-generator is similar to DCGAN. The encoder, a CNN, extracts features from the input real image, and the generator creates a fake image from its input. Unlike DCGAN, however, the encoder output is combined with the age and gender labels, and the result is used as the generator's input. Given the age and gender, the generator attempts to create a face image that reflects both pieces of information.
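A minimal sketch of the conditioning step follows, assuming the encoder output is a flat feature vector and that "combined" means concatenation with one-hot labels. The 100-dimensional feature size, the one-hot encoding, the two gender classes, and the function names are illustrative assumptions; only the 8 age classes come from the post's discriminator section.

```python
import numpy as np

NUM_AGE_CLASSES = 8     # from the discriminator section
NUM_GENDER_CLASSES = 2  # assumption: binary gender labels


def one_hot(labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Turn a vector of integer class ids into a (batch, num_classes) one-hot matrix."""
    out = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    out[np.arange(labels.shape[0]), labels] = 1.0
    return out


def generator_input(features: np.ndarray, age: np.ndarray, gender: np.ndarray) -> np.ndarray:
    """Concatenate encoder features with one-hot age and gender labels (hypothetical layout)."""
    return np.concatenate(
        [features, one_hot(age, NUM_AGE_CLASSES), one_hot(gender, NUM_GENDER_CLASSES)],
        axis=1,
    )


# Usage: a batch of 4 hypothetical 100-dimensional encoder outputs.
rng = np.random.default_rng(0)
z = rng.standard_normal((4, 100)).astype(np.float32)
x = generator_input(z, age=np.array([0, 3, 7, 5]), gender=np.array([1, 0, 1, 0]))
print(x.shape)  # (4, 110)
```

In this sketch the label information sits in fixed positions of the generator input, so changing only the one-hot slots requests a different age or gender while the encoded face features stay the same.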

2. Adversarial Spatial Frequency Domain Critic

The adversarial spatial frequency domain critic preserves the age and gender characteristics of the generated image while reducing noise, and it judges the appearance of the generated image.

Fig. 2 (a) and (b) show average images computed after grouping public datasets by age and by gender. When the gradients of the CNN activations were examined, different regions were activated for each task. As shown in Fig. 2 (c) and (d), textures such as wrinkles stood out in the activation for age classification, whereas facial landmarks such as the nose, eyes, and mouth stood out in the activation for gender classification.

Fig. 2. Average image and activation by class

Fig. 3 (a) and (b), obtained by multiplying the activation for each classification task with the Fourier-transformed images, show that age and gender characteristics appear predominantly in different regions of the spatial frequency domain.

Fig. 3. (a) Result of multiplying Fig. 2 (a) and Fig. 2 (c) in the spatial frequency domain; (b) result of multiplying Fig. 2 (b) and Fig. 2 (d) in the spatial frequency domain.
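One reading of the Fig. 3 procedure is sketched below: weight an image by an activation map, take its 2D Fourier transform, and measure what share of the spectral energy lies above a radial cutoff frequency. The synthetic smooth blob and fine-stripe patterns, the uniform activation map, and the 0.25 cycles/pixel cutoff are hypothetical stand-ins for the paper's average images and CNN activations.

```python
import numpy as np


def radial_frequency(shape):
    """Radial spatial frequency (cycles/pixel) of every FFT bin, same layout as np.fft.fft2."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    return np.sqrt(fx ** 2 + fy ** 2)


def high_frequency_share(image, activation, cutoff=0.25):
    """Fraction of spectral energy above `cutoff` after weighting the image by the activation map."""
    spectrum = np.fft.fft2(image * activation)
    energy = np.abs(spectrum) ** 2
    high = radial_frequency(image.shape) > cutoff
    return energy[high].sum() / energy.sum()


n = 64
yy, xx = np.mgrid[0:n, 0:n]
smooth = np.exp(-((xx - 32) ** 2 + (yy - 32) ** 2) / (2 * 8.0 ** 2))  # broad blob
texture = np.cos(2 * np.pi * 0.375 * xx)  # fine stripes: 24 full cycles across 64 pixels
act = np.ones((n, n))  # hypothetical activation map: uniform weights

print(round(high_frequency_share(smooth, act), 3))
print(round(high_frequency_share(texture, act), 3))
```

The blob keeps almost all of its energy near zero frequency while the stripes put almost all of theirs above the cutoff, which is the kind of separation between attributes that Fig. 3 reports.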

The Fourier transform decomposes a function of space or time into a function of spatial or temporal frequency. It can also act as a filter: keeping only the desired frequencies isolates specific characteristics of the signal.
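As a small illustration of filtering by frequency selection, the sketch below decomposes a 1D signal made of a slow and a fast sinusoid with NumPy's real FFT, keeps only the bins up to an assumed cutoff of 10 cycles per window, and transforms back. The signal and the cutoff are made up for the example.

```python
import numpy as np

n = 256
t = np.arange(n)
low = np.sin(2 * np.pi * 4 * t / n)          # 4 cycles over the window
high = 0.5 * np.sin(2 * np.pi * 40 * t / n)  # 40 cycles over the window
signal = low + high

spectrum = np.fft.rfft(signal)                 # decompose into frequency components
freqs = np.arange(spectrum.size)               # bin k = k cycles per window
keep = freqs <= 10                             # select only the desired (low) frequencies
filtered = np.fft.irfft(spectrum * keep, n=n)  # back to the original domain

print(np.allclose(filtered, low))  # True: the fast component was removed
```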

We use this property of the spatial frequency domain to build masks that preserve age and gender characteristics while attenuating other features. Separate masks are created for age and for gender: each multiplies the spatial frequency region where the corresponding attribute is prominent by 1, and all other frequencies by a constant between 0 and 1.
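The mask rule above can be sketched as follows, assuming each attribute's prominent region is a radial band of spatial frequencies. The band limits and the value alpha = 0.3 are hypothetical placeholders, not values taken from the paper, and `critic_mask` and `apply_mask` are illustrative names.

```python
import numpy as np


def radial_frequency(shape):
    """Radial spatial frequency (cycles/pixel) of every FFT bin, same layout as np.fft.fft2."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    return np.sqrt(fx ** 2 + fy ** 2)


def critic_mask(shape, band, alpha):
    """1 inside the radial band [lo, hi) (cycles/pixel), alpha (0 < alpha < 1) everywhere else."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    lo, hi = band
    r = radial_frequency(shape)
    return np.where((r >= lo) & (r < hi), 1.0, alpha)


def apply_mask(image, mask):
    """Filter an image by multiplying its 2D spectrum with the mask."""
    # The radial mask is symmetric in frequency, so the inverse FFT is real up to rounding.
    return np.real(np.fft.ifft2(np.fft.fft2(image) * mask))


# Hypothetical bands: one mask per attribute, each keeping a different frequency range.
AGE_BAND = (0.20, 0.75)
GENDER_BAND = (0.0, 0.10)

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))
age_mask = critic_mask(img.shape, AGE_BAND, alpha=0.3)
gender_mask = critic_mask(img.shape, GENDER_BAND, alpha=0.3)
age_view = apply_mask(img, age_mask)

inside = age_mask == 1.0
print(np.allclose(np.fft.fft2(age_view)[inside], np.fft.fft2(img)[inside]))  # True
```

Frequencies inside the band pass unchanged and everything else is scaled by alpha, so a smaller alpha suppresses out-of-band content more strongly.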

Equation 1. critic mask formula
Equation 2. critic loss function formula

3. Discriminator for multi-task classification

The proposed discriminator plays two roles: as in a standard GAN, it judges whether an image is real or fake, and it also classifies age and gender. A separate loss is computed for each role, and the age and gender losses are likewise computed separately. Age is classified into 8 classes using the cross-entropy loss.
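A sketch of the per-role losses, written with NumPy: a binary cross-entropy on a real/fake logit for the authenticity role, and separate cross-entropy losses for the gender and age heads. Using BCE for the adversarial term, two gender classes, and returning the losses unweighted are assumptions; the post only states that losses are computed per role and that age uses 8-class cross-entropy.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def cross_entropy(logits, labels):
    """Mean cross-entropy for integer class labels."""
    logp = log_softmax(logits)
    return -logp[np.arange(labels.shape[0]), labels].mean()


def bce_with_logits(logit, target):
    """Mean binary cross-entropy for the real/fake head; target is 1 for real, 0 for fake."""
    # max(x, 0) - x*z + log(1 + exp(-|x|)) avoids overflow for large |logit|.
    return np.mean(np.maximum(logit, 0) - logit * target + np.log1p(np.exp(-np.abs(logit))))


def discriminator_losses(adv_logit, is_real, gender_logits, gender, age_logits, age):
    """One loss per role: authenticity, gender (2 classes assumed), age (8 classes)."""
    return {
        "adversarial": bce_with_logits(adv_logit, is_real),
        "gender": cross_entropy(gender_logits, gender),
        "age": cross_entropy(age_logits, age),
    }


# Usage with random logits for a hypothetical batch of 4 images.
rng = np.random.default_rng(0)
losses = discriminator_losses(
    adv_logit=rng.standard_normal(4),
    is_real=np.array([1.0, 1.0, 0.0, 0.0]),
    gender_logits=rng.standard_normal((4, 2)),
    gender=np.array([0, 1, 1, 0]),
    age_logits=np.zeros((4, 8)),  # uninformative age head
    age=np.array([0, 2, 5, 7]),
)
print({k: round(float(v), 3) for k, v in losses.items()})
```

Keeping the three terms separate makes it easy to weight them, or to optimize the gender and age terms in different phases.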

4. Alternating learning

Age and gender classification are learned on the same network, but in alternation. As shown in Algorithm 1, the encoder-generator is trained to reduce the encoder-generator loss and the critic loss for gender. The encoder-generator, the critics, and the discriminator are first trained for gender, and the same training is then carried out for age. This alternation is repeated every epoch.

Algorithm 1.
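The update order described above can be sketched as a simple schedule in plain Python. The module order within each phase (encoder-generator, critic, discriminator) follows the order the text lists them and may differ from Algorithm 1; `update` is a hypothetical callback standing in for one optimization step.

```python
from typing import Callable, List, Tuple

ATTRIBUTES = ("gender", "age")  # gender first, then age, as described above
MODULES = ("encoder_generator", "critic", "discriminator")  # assumed order within a phase


def alternating_schedule(num_epochs: int) -> List[Tuple[int, str, str]]:
    """Return the (epoch, attribute, module) update order of the alternating strategy."""
    steps = []
    for epoch in range(num_epochs):
        for attribute in ATTRIBUTES:
            for module in MODULES:
                steps.append((epoch, attribute, module))
    return steps


def train(num_epochs: int, update: Callable[[str, str], None]) -> None:
    """Drive training; `update(module, attribute)` is a hypothetical per-module step function."""
    for _, attribute, module in alternating_schedule(num_epochs):
        update(module, attribute)


# Usage: record the calls instead of training a real model.
log: List[str] = []
train(2, lambda module, attribute: log.append(f"{attribute}:{module}"))
print(log[:3])  # ['gender:encoder_generator', 'gender:critic', 'gender:discriminator']
```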

Experiment results

The experiments were conducted on the Adience benchmark and the LFW dataset. In comparisons with handcrafted-feature methods and CNN-based methods, the proposed mask-based method achieved higher accuracy than every other method. Even without the proposed masks, the method still achieved superior accuracy in age classification.

Conclusion

We confirmed that the proposed spatial frequency domain critic network and the alternating learning strategy outperformed the compared methods in age and gender classification. Images generated by filtering specific regions of the spatial frequency domain preserved age and gender information better, and the alternating learning strategy further improved age and gender classification.


