Age-gender estimation

This paper proposes a spatial frequency domain critic with an alternating training strategy, effectively preserving age and gender information in generated images and outperforming existing methods.

Author
AI Research Team | Se Hun Kim

This paper was published by KAIST IVY Lab and Genesis Lab under the IITP (Institute of Information & Communications Technology Planning & Evaluation) 2018 ICT R&D Voucher project.

Original Paper

Adversarial Spatial Frequency Domain Critic Learning for Age and Gender Classification

Proposed method

The main idea of this paper is to preserve age and gender information, which appears most strongly in particular regions of the spatial frequency domain, in a generated image. The paper also computes the loss by alternating between learning age and learning gender. The details are as follows:

1. Encoder-Generator

The encoder-generator is similar to DCGAN. The encoder is a CNN that extracts features from the real input image, and the generator creates a fake image from its input. Unlike DCGAN, however, the encoder output is combined with the age and gender labels before it is passed to the generator. The generator therefore receives the age and gender and attempts to create a face image that reflects both.
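The label conditioning described above can be illustrated with a minimal sketch. It assumes the encoder output and the labels are combined by concatenating one-hot vectors, which the post does not specify; the feature size of 100, the binary gender label, and the name `build_generator_input` are also illustrative assumptions.

```python
import numpy as np

NUM_AGE_CLASSES = 8  # the post states that age is classified into 8 classes
NUM_GENDERS = 2      # assumption: gender is treated as a two-class label


def one_hot(index, size):
    """Return a float vector of length `size` with a 1 at `index`."""
    vec = np.zeros(size, dtype=np.float32)
    vec[index] = 1.0
    return vec


def build_generator_input(features, age_class, gender):
    """Concatenate encoder features with one-hot age and gender labels.

    Concatenation is an assumption; the paper may combine them differently.
    """
    return np.concatenate([
        features,
        one_hot(age_class, NUM_AGE_CLASSES),
        one_hot(gender, NUM_GENDERS),
    ])


# Stand-in for an encoder output (hypothetical 100-dimensional feature vector).
features = np.random.default_rng(0).standard_normal(100).astype(np.float32)
z = build_generator_input(features, age_class=3, gender=1)
print(z.shape)  # (110,)
```

The generator would then take `z` as its input, so the same encoded face can be decoded under different age and gender labels.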

2. Adversarial Spatial Frequency Domain Critic

The adversarial spatial frequency domain critic maintains the age and gender characteristics of the generated image while reducing noise and other appearance characteristics.

Images from public datasets were grouped by age and by gender and then averaged per class, as shown in Fig. 2 (a) and (b). Studying the gradients of the CNN activations showed that the two tasks activate different areas. As shown in Fig. 2 (c) and (d), texture such as wrinkles stands out in the age activation, while facial landmarks such as the nose, eyes, and mouth stand out in the gender activation.

**Fig. 2.** Average image and activation by class

Multiplying the activation for each classification task by the Fourier-transformed images gives the images in Fig. 3 (a) and (b). They show that the characteristics of age and gender are dominant in different regions of the spatial frequency domain.

**Fig. 3.** (a) Result of multiplying the spatial frequency of Fig. 2 (a) by Fig. 2 (c); (b) result of multiplying the spatial frequency of Fig. 2 (b) by Fig. 2 (d)

The Fourier transform is a mathematical transform that decomposes a function of space or time into functions of spatial or temporal frequency. It can also be used to filter for specific characteristics by keeping only the desired frequencies.
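As a generic illustration of frequency selection (not the paper's code), the sketch below builds a synthetic image from one slow and one fast sinusoid, then keeps only the low frequencies with a hard cutoff. The function names and the cutoff value are illustrative assumptions.

```python
import numpy as np


def radial_frequency(shape):
    """Distance of each FFT bin from the zero-frequency bin, in cycles per image."""
    fy = np.fft.fftfreq(shape[0]) * shape[0]
    fx = np.fft.fftfreq(shape[1]) * shape[1]
    return np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)


def low_pass(image, cutoff):
    """Keep only spatial frequencies at or below `cutoff` cycles per image."""
    spectrum = np.fft.fft2(image)
    keep = radial_frequency(image.shape) <= cutoff
    return np.real(np.fft.ifft2(spectrum * keep))


size = 64
x = np.arange(size)
slow = np.sin(2 * np.pi * 2 * x / size)   # 2 cycles across the image
fast = np.sin(2 * np.pi * 20 * x / size)  # 20 cycles across the image
image = np.tile(slow + fast, (size, 1))

# With a cutoff of 5 cycles, only the slow component survives.
filtered = low_pass(image, cutoff=5)
print(np.allclose(filtered, np.tile(slow, (size, 1))))
```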

We use this property of spatial frequency to create masks that preserve the characteristics of age and gender while reducing other features. Separate masks are created for age and for gender. Each mask multiplies the spatial frequency regions where its attribute is prominent by 1 and multiplies all other regions by a constant between 0 and 1.
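A minimal sketch of such a mask follows, assuming the prominent region can be described as a radial frequency band. The band limits and the attenuation constant 0.2 are hypothetical placeholders; the post does not give the actual regions or constant used in the paper.

```python
import numpy as np


def soft_band_mask(shape, low, high, attenuation):
    """Mask equal to 1 inside the band [low, high] and `attenuation` elsewhere.

    Frequencies are measured in cycles per image from the zero-frequency bin.
    """
    fy = np.fft.fftfreq(shape[0]) * shape[0]
    fx = np.fft.fftfreq(shape[1]) * shape[1]
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    in_band = (radius >= low) & (radius <= high)
    return np.where(in_band, 1.0, attenuation)


def apply_mask(image, mask):
    """Multiply the image spectrum by the mask and transform back."""
    return np.real(np.fft.ifft2(np.fft.fft2(image) * mask))


size = 64
x = np.arange(size)
slow = np.sin(2 * np.pi * 2 * x / size)   # lies outside the hypothetical band
fast = np.sin(2 * np.pi * 20 * x / size)  # lies inside the hypothetical band
image = np.tile(slow + fast, (size, 1))

# Hypothetical band for one attribute; a second attribute would get its own band.
mask = soft_band_mask(image.shape, low=8, high=32, attenuation=0.2)
masked = apply_mask(image, mask)

# The in-band component is kept as-is and the other is scaled by 0.2.
print(np.allclose(masked, np.tile(0.2 * slow + fast, (size, 1))))
```

Because frequencies outside the band are attenuated rather than zeroed, the masked image still carries a weakened version of the remaining content.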

**Equation 2.** critic loss function formula

3. Discriminator for multi-task classification

In this paper, the proposed discriminator plays two roles. As in a standard GAN, it judges whether an image is real or generated, and it also classifies age and gender. A loss is computed for each role, and the age and gender losses are computed separately. Age is classified into 8 classes using the cross-entropy loss function.
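The separate loss terms can be sketched as follows. The 8-class cross-entropy for age comes from the post; the binary cross-entropy for the real/fake role and the two-class cross-entropy for gender are assumptions, as is the name `discriminator_losses`. How the paper weights or sums these terms is not stated, so the sketch returns them separately.

```python
import numpy as np


def softmax_cross_entropy(logits, label):
    """Cross-entropy between softmax(logits) and an integer class label."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return float(-log_probs[label])


def binary_cross_entropy(logit, target):
    """Cross-entropy for a real (1.0) or fake (0.0) target, from a raw logit."""
    # -log(sigmoid(x)) = logaddexp(0, -x) and -log(1 - sigmoid(x)) = logaddexp(0, x)
    return float(target * np.logaddexp(0.0, -logit)
                 + (1.0 - target) * np.logaddexp(0.0, logit))


def discriminator_losses(real_fake_logit, is_real, age_logits, age_label,
                         gender_logits, gender_label):
    """Return one loss per role, with age and gender kept separate."""
    return {
        "adversarial": binary_cross_entropy(real_fake_logit, 1.0 if is_real else 0.0),
        "age": softmax_cross_entropy(age_logits, age_label),         # 8 classes
        "gender": softmax_cross_entropy(gender_logits, gender_label),  # assumed 2 classes
    }


losses = discriminator_losses(
    real_fake_logit=0.0, is_real=True,
    age_logits=np.zeros(8), age_label=3,
    gender_logits=np.zeros(2), gender_label=1,
)
print(losses)
```

With all-zero logits the discriminator is maximally uncertain, so each term equals the log of the number of classes for that role.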

4. Alternating learning

Age and gender classification are learned on the same network, but in alternation. As shown in Algorithm 1, the encoder-generator learns to reduce the 'loss for encoder-generator' and the 'critic loss for gender.' Training of the encoder-generator, critic, and discriminator takes place first for gender, and then the same training is carried out for age. This alternation is repeated every epoch.
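The ordering described above can be sketched as a simple schedule. This is only an outline of the loop structure under the assumption that the three modules are updated in the order listed; the actual update rules are in the paper's Algorithm 1, and the function name is hypothetical.

```python
def alternating_schedule(num_epochs,
                         tasks=("gender", "age"),
                         modules=("encoder-generator", "critic", "discriminator")):
    """List the (epoch, task, module) update order for alternating learning.

    Within each epoch, all modules are updated for gender first, then for age.
    """
    steps = []
    for epoch in range(num_epochs):
        for task in tasks:
            for module in modules:
                # A real training loop would compute the task-specific loss
                # for this module here and take an optimizer step.
                steps.append((epoch, task, module))
    return steps


schedule = alternating_schedule(num_epochs=2)
for step in schedule[:6]:
    print(step)
```

The first three printed steps update every module for gender, and the next three repeat the same updates for age within the same epoch.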

Experiment results

The experiments were conducted on the Adience benchmark and the LFW dataset. In a comparison with handcrafted-feature methods and CNN-based methods, the mask-based method introduced in this paper showed higher accuracy than all of the compared methods. Even without the mask, the proposed approach showed superior accuracy in age classification.

Conclusion

We have confirmed that the proposed spatial frequency domain critic network, together with the alternating learning strategy, performed better than the compared methods in classifying age and gender. Images generated by filtering specific regions of the spatial frequency domain preserved age and gender information better, and the alternating learning strategy further improved the ability to classify age and gender.
