cGANs with Projection Discriminator

Notice

Contact

Recent Posts

Recent Comments

Link

« 2024/10 »
일	월	화	수	목	금	토
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Tags more

Archives

Today

Total

관리 메뉴

Deeper Learning

cGANs with Projection Discriminator 본문

AI/Deep Learning

cGANs with Projection Discriminator

Dlaiml 2021. 8. 7. 16:45

ABSTRACT

conditional information을 projection을 기반으로 GAN의 discriminator에게 전달하는 이론을 제시한다.
기존 cGAN은 concat (embedding)형식으로 conditional information을 사용하고있다.

Introduction

이안 굿펠로우의 GAN은 이미지 생성분야에서 SOTA 알고리즘이다.
GAN의 Discriminator는 생성된 분포인 p_g(x)와 true target 분포인 q(x)의 divergence를 측정한다.
학습하면서 Discriminator (이하 D)가 Generator(이하 G)에게 더 정밀한 측정값을 보내주고 G는 이를 학습하여 target 분포에 가까운 이미지를 생성하는 방향으로 학습된다.
cGAN은 class information을 D와 G에게 주어 알맞은 class conditional image를 생성하도록 학습시키는 모델로 text to image, image to image 생성을 가능하게 하였다.
cGAN은 GAN과 다르게 D가 생성 분포를 특정 label의 true target과 비교한다.
기존 cGAN은 naive하게 conditional information을 input, feature map 또는 middle layer에 concat시키는 형식으로 정보를 implicit하게 전달한다.
우리의 모델은 G를 통과한 feature map을 embedded conditional information과 inner product하는 방식으로 conditional information을 전달한다.
inner product의 영향으로 인해 class information을 무시하는 방식으로 generator가 학습할 수 없게 된다.

위 처럼 다양한 이미지를 생성해내며 random noise z를 고정하고 category information을 interpolation하면 (b)와 같이 class가 변화하는것을 확인하였다.

The Architecture of the cGAN discriminator with a probabilistic model assumptions

input vector를 x, conditional information을 y

D는 discriminator, f는 x와 y의 function, theta는 f의 parameters, A는 activation function
q = true distribution, p = generated output distribution
D의 보통 loss는 아래와 같다 ( A는 sigmoid )

위 수식은 아래와 같이 decomposition이 가능하다.

Motivation Behind the projection Discriminator

수식 유도 과정 ( 논문 참고 )

Comparison with other Methods

Concatenate에 의한 conditional information의 통합은 다소 임의적일 뿐만 아니라 논리적으로 근거를 찾기 어려운 함수들이 후보군에 들어갈 수 있다.
conditional information을 전달하는 다른 방식으로는 loss 함수 자체의 수정이 있는데 AC-GANs은 discriminator와 classifier가 특정 layer를 공유하고 있으며 loss function에 true image와 generated image에 대한 classification loss를 측정하여 더해준다.
Plug and Play Generative models(PPGN)은 auxiliary classifier를 사용하는데 high-quality의 이미지를 생성할 수 있으나, auxiliary classifier가 분류하기 쉬운 형태로 generator가 이미지를 생성하도 학습되어 최종 목표인 q(x|y)를 p(x|y)와 가깝도록 만드는 과제를 수행하지 못하는 문제가 있다. num classes가 커지면 앞의 문제가 두드러지게 나타나며 실제로 1000 categories에 대해 학습한 결과 최종 모델은 거의 모든 class 정보를 무시하고 하나의 이미지만 생성했다.

Experiments

ResNet base discriminator, generator를 사용
discriminator에서 Lipschitz-1 제약을 만족하기 위해 spectral normalization 적용
hinge standard adversarial loss 사용

Adam optimizer 사용: alpha = 0.0002, Beta_1 = 0.5, Beta_2 = 0.9
discriminator를 5번 업데이트 하고 generator를 업데이트
400K iteration 이후 450K iteration에 learning rate가 0이 되도록 linear learning rate decay를 적용하였다.

Generator에 conditional batch normalization을 사용
- https://arxiv.org/abs/1707.00683v3

Modulating early visual processing by language

It is commonly assumed that language refers to high-level visual concepts while leaving low-level visual processing unaffected. This view dominates the current literature in computational models for language-vision tasks, where visual and linguistic input

arxiv.org

위 그림의 (b) - concat, (d) - projection, AC-GANs에 대해 FID, IS를 측정하고 비교하겠다.
projection 모델의 경우 450K iteration 이후 까지 성능의 향상이 계속되었지만 다른 모델구조는 그렇지 못하였다.

Projection 모델은 한 클래스 내에서 노이즈에 따라 다양한 이미지를 생성해냈다. AC-GANs과 concat모델에는 동일한 이미지가 생성되는 mode collapse가 나타났다.
reasonable한 time내에 training되는 것을 우선시 하였기 때문에 단순한 모델을 사용했음에도 좋은 결과를 얻었고 더 복잡한 모델에서는 더 좋은 시각적 퀄리티와 다양한 분포의 생성 이미지를 얻을 수 있다고 생각한다.
class y1과 class y2의 represented vector를 interpolate하면 category mophing이 가능하다. ( Introduction 첨부 사진 )
Super-resolution task에서도 좋은 성능을 보였다.

Fine-tuning with the pretrained model on the ILSVRC2012 classification task.
- PPGNs 논문에서 cost function에 pretrained auxiliary classifier의 loss를 추가하여 시각적으로 더 좋은 이미지를 생성해냈다.
- 하지만 pretrained label classifier가 분류하기 좋은 형태의 image만 generator가 생성하는 방식으로 학습되는 문제를 피하기 위해 fine-tuning 형태로 이를 loss에 추가하였다.
- 400K까지 adv loss만 사용하였고 후에 50K iteration을 classifier loss를 추가하여 fine-tuning 시켰다.
- 결과와 수식은 아래와 같다.

finetuned 결과를 보면 classifier가 분류하기 쉽도록 generator가 image를 생성해내는 것을 볼 수 있다.
visual quality는 좋아졌지만 diversity와 trade-off가 있다.

CIFAR-10, CIFAR-100 데이터셋에서 모델 및 하이퍼파라미터에 따른 성능 측정 결과

Code

http://github.com/yjunej/AdaIN-tf2

GitHub - yjunej/AdaIN-tf2: Simple AdaIN implements using TF2, Based on Paper "Arbitrary Style Transfer in Real-time with Adaptiv

Simple AdaIN implements using TF2, Based on Paper "Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization" - GitHub - yjunej/AdaIN-tf2: Simple AdaIN implements using T...

github.com

Reference

[1] Takeru Miyato et al. CGAN with projection discriminator

[2] Anh Nguyen, Jeff Clune, Yoshua Bengio, Alexey Dosovitskiy, and Jason Yosinski. Plug & play generative networks: Conditional iterative generation of images in latent space. In CVPR, 2017.

'AI > Deep Learning' 카테고리의 다른 글

CS231n - Lecture 5 ~ 7 (0)	2021.09.01
CS231n - Lecture 1 ~ 4 (0)	2021.08.21
Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization (AdaIN) (0)	2021.07.27
XLNet: Generalized Autoregressive Pretraining for Language Understanding (0)	2021.05.08
BERT (Bidirectional Encoder Representations from Transformers) (0)	2021.04.18