Deeper Learning

U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation

Dlaiml 2021. 10. 27. 09:15

Abstract

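Proposes an unsupervised image-to-image translation method that incorporates an attention module and a learnable normalization function, trained end-to-end.
The attention module uses the attention map obtained from an auxiliary classifier to guide the model to focus on important regions.
Unlike previous attention-based methods, it can handle geometric changes between the two domains.
AdaLIN (Adaptive Layer-Instance Normalization) flexibly controls the amount of change in shape and texture through parameters learned from each dataset.
Produces higher-quality results than existing state-of-the-art models, using a fixed network architecture and fixed hyperparameters.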

1. Introduction

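Image-to-image translation aims to learn a function that maps images from one domain to another.
- Because it applies to a wide range of problems, image-to-image translation has been studied extensively
- e.g., image inpainting, super-resolution, colorization, style transfer
- Prior work succeeded at style transfer that maps texture and color, but not at tasks involving large shape changes in unaligned, in-the-wild images
- Preprocessing such as cropping and alignment was needed to reduce the complexity of the data distribution
- This paper proposes unsupervised image-to-image translation with an attention module and a learnable normalization function
- An auxiliary classifier is trained to distinguish the two domains, and the resulting attention map tells the model which regions to focus on
- The attention module is included in both the generator and the discriminator; focusing on meaningful regions is what makes shape transformation possible
- Inspired by Batch-Instance Normalization (BIN), the paper proposes AdaLIN
  - During training, it adaptively adjusts the ratio between IN (instance normalization) and LN (layer normalization)
  - Controlling the amount of shape and texture change makes the model flexible across tasks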

2. Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization

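The goal is to learn functions that map images from a source domain to a target domain.
Two functions are trained: $G_{st}$ maps source domain → target domain, and $G_{ts}$ maps target domain → source domain.
The discriminators are $D_s$ and $D_t$.
The discriminator's attention module guides the generator to focus on the regions most important for producing realistic images.
The generator's attention module focuses on the regions that distinguish its domain from the other domain.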

2.1 Model

2.1.1 Generator

$X_s$ and $X_t$ denote image samples from the source and target domains, respectively.
The translation model $G_{st}$ consists of an encoder $E_s$, a decoder $G_t$, and an auxiliary classifier $\eta_s$.
$E_s^{k_{ij}}$ denotes the value at position (i, j) of the encoder's k-th activation map.
Inspired by CAM (Class Activation Mapping), the auxiliary classifier uses global average pooling and global max pooling to learn the weight $w_s^k$ assigned to the k-th feature map for the source domain.
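
A minimal NumPy sketch of the CAM-style auxiliary classifier and attention feature map for a single image. In the paper, the classifier is $\eta_s(x) = \sigma(\sum_k w_s^k \sum_{ij} E_s^{k_{ij}}(x))$, the attention feature map is $a_s(x) = w_s * E_s(x)$, and the generator output is $G_{st}(x) = G_t(a_s(x))$. The function name `cam_attention` and the separate weight vectors `w_avg` / `w_max` for the two pooling branches are assumptions made for illustration; merging both branches into one logit and omitting any channel-reduction layer are simplifications, and the reference implementation may be structured differently.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cam_attention(feat, w_avg, w_max):
    """CAM-style auxiliary classifier + attention feature map (illustrative sketch).

    feat:  encoder activations E_s(x), shape (K, H, W)
    w_avg: hypothetical per-channel weights for the global-average-pooling branch, shape (K,)
    w_max: hypothetical per-channel weights for the global-max-pooling branch, shape (K,)
    """
    gap = feat.mean(axis=(1, 2))  # global average pooling -> (K,)
    gmp = feat.max(axis=(1, 2))   # global max pooling -> (K,)

    # eta_s(x): probability that x belongs to the source domain.
    # Using the mean instead of the sum over (i, j) only rescales w by H*W.
    eta = sigmoid(np.dot(w_avg, gap) + np.dot(w_max, gmp))

    # a_s(x) = w_s * E_s(x): each feature map scaled by its learned weight
    a = np.concatenate([w_avg[:, None, None] * feat,
                        w_max[:, None, None] * feat], axis=0)  # (2K, H, W)

    # Summing over channels gives a spatial heatmap of the attended regions
    heatmap = a.sum(axis=0)  # (H, W)
    return eta, a, heatmap


# Usage with random activations (K=4 channels, 8x8 spatial grid)
rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 8, 8))
eta, a, heatmap = cam_attention(feat, rng.normal(size=4), rng.normal(size=4))
print(eta, a.shape, heatmap.shape)
```

Because the same weights $w_s^k$ both drive the domain classifier and rescale the feature maps, channels that help tell the domains apart are the ones amplified in $a_s(x)$.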

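$\mu_I, \sigma_I$ and $\mu_L, \sigma_L$ are the channel-wise (instance) and layer-wise mean and standard deviation, respectively, and $\gamma, \beta$ are parameters produced by a fully connected (FC) layer.
$\tau$ is the learning rate, and the mixing ratio $\rho$ is clipped to the range [0, 1] at every parameter update step.
$\rho$ moves toward 0 for tasks where LN matters more and toward 1 for tasks where IN matters more. It is initialized to 1 in the decoder's residual blocks and to 0 in its upsampling blocks.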
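
For reference, the AdaLIN equations from the paper that these symbols belong to, where $a$ is the activation being normalized, $\epsilon$ is a small constant for numerical stability, and $\Delta\rho$ is the update for $\rho$ determined by the optimizer (e.g., the gradient):

```latex
\hat{a}_I = \frac{a - \mu_I}{\sqrt{\sigma_I^2 + \epsilon}}, \qquad
\hat{a}_L = \frac{a - \mu_L}{\sqrt{\sigma_L^2 + \epsilon}}

\mathrm{AdaLIN}(a, \gamma, \beta) = \gamma \cdot \left( \rho \cdot \hat{a}_I + (1 - \rho) \cdot \hat{a}_L \right) + \beta

\rho \leftarrow \mathrm{clip}_{[0, 1]}\left( \rho - \tau \Delta\rho \right)
```
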
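Whitening and Coloring Transform (WCT) is the optimal method for transferring content and style features, but it is very computationally expensive.
AdaIN is much faster but sub-optimal compared with WCT because it assumes the feature channels are uncorrelated, so the transferred features retain slightly more content patterns. (AdaIN normalizes each channel independently and ignores cross-channel correlations, so content structure encoded in those correlations passes through unchanged.)
LN does not assume uncorrelated channels, but because it considers only global statistics of the feature maps, it sometimes fails to preserve the content structure of the original domain.
To overcome this, AdaLIN combines AdaIN and LN so that the model can selectively keep or change content information, which lets it handle a wide range of image-to-image translation problems.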
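
A minimal NumPy sketch of the AdaLIN forward pass for a single sample, plus the clipped $\rho$ update. Assumptions: `gamma`, `beta`, and `rho` are per-channel vectors passed in directly (in U-GAT-IT, $\gamma$ and $\beta$ come from FC layers), `eps` is a small stability constant, the gradient for $\rho$ is supplied by the caller, and the function names `adalin` / `update_rho` are hypothetical.

```python
import numpy as np


def adalin(a, gamma, beta, rho, eps=1e-5):
    """Adaptive Layer-Instance Normalization for one sample (illustrative sketch).

    a:     activations, shape (C, H, W)
    gamma: scale, shape (C,)   (produced by an FC layer in U-GAT-IT)
    beta:  shift, shape (C,)   (produced by an FC layer in U-GAT-IT)
    rho:   IN/LN mixing ratio in [0, 1], shape (C,)
    """
    # Instance Norm: statistics per channel over the spatial dimensions
    mu_i = a.mean(axis=(1, 2), keepdims=True)
    var_i = a.var(axis=(1, 2), keepdims=True)
    a_in = (a - mu_i) / np.sqrt(var_i + eps)

    # Layer Norm: statistics over all channels and spatial positions
    mu_l = a.mean()
    var_l = a.var()
    a_ln = (a - mu_l) / np.sqrt(var_l + eps)

    # rho = 1 -> pure IN (AdaIN-like), rho = 0 -> pure LN
    r = rho[:, None, None]
    mixed = r * a_in + (1.0 - r) * a_ln
    return gamma[:, None, None] * mixed + beta[:, None, None]


def update_rho(rho, grad, lr):
    """One gradient step on rho followed by clipping to [0, 1]."""
    return np.clip(rho - lr * grad, 0.0, 1.0)


# Usage: with gamma=1, beta=0, rho=1 each channel ends up ~zero-mean, unit-variance
rng = np.random.default_rng(0)
a = rng.normal(loc=2.0, scale=5.0, size=(3, 4, 4))
ones, zeros = np.ones(3), np.zeros(3)
out_in = adalin(a, ones, zeros, rho=np.ones(3))
print(out_in.mean(axis=(1, 2)), out_in.var(axis=(1, 2)))
```

The two extremes make the trade-off concrete: $\rho = 1$ removes per-channel statistics (the style-carrying part AdaIN targets), while $\rho = 0$ normalizes only with global statistics, leaving relative differences between channels intact.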

2.1.2 Discriminator

$X_t$ and $G_{st}(X_s)$ denote a target-domain sample and a translated source-domain sample, respectively.
The discriminator $D_t$ consists of an encoder $E_{D_t}$, a classifier $C_{D_t}$, and an auxiliary classifier $\eta_{D_t}$.
Both the auxiliary classifier and the discriminator are trained to decide whether an input x is real or generated.

As in the generator, the auxiliary classifier learns weights $w_{D_t}$ that are applied to each feature map of the encoder output, producing the attention feature map $a_{D_t}(x)$.
$D_t(x)$ can therefore be written as $C_{D_t}(a_{D_t}(x))$.
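
A minimal NumPy sketch of the composition $D_t(x) = C_{D_t}(a_{D_t}(x))$, assuming the encoder output $E_{D_t}(x)$ is already computed. The weight vectors, the use of global average pooling for $\eta_{D_t}$, and modelling $C_{D_t}$ as a 1×1 convolution that outputs one real/fake logit per spatial location are illustrative assumptions, not details stated above; `discriminator_heads` is a hypothetical name.

```python
import numpy as np


def discriminator_heads(feat, w_attn, w_cls):
    """Auxiliary and main real/fake heads on shared encoder features (illustrative sketch).

    feat:   encoder output E_{D_t}(x), shape (K, H, W)
    w_attn: hypothetical weights w_{D_t} learned by the auxiliary classifier, shape (K,)
    w_cls:  hypothetical 1x1-conv weights of the classifier C_{D_t}, shape (K,)
    """
    # eta_{D_t}(x): one real/fake logit from globally pooled features
    aux_logit = float(np.dot(w_attn, feat.mean(axis=(1, 2))))

    # a_{D_t}(x) = w_{D_t} * E_{D_t}(x): channel-wise reweighting of the feature maps
    a = w_attn[:, None, None] * feat

    # C_{D_t}: 1x1 convolution over channels -> one logit per spatial location
    patch_logits = np.einsum("k,khw->hw", w_cls, a)
    return aux_logit, patch_logits


# Usage with random features (K=8 channels, 4x4 spatial grid)
rng = np.random.default_rng(0)
aux, patches = discriminator_heads(rng.normal(size=(8, 4, 4)),
                                   rng.normal(size=8), rng.normal(size=8))
print(aux, patches.shape)
```

Both heads are trained on the same real/generated labels, so the weights that make $\eta_{D_t}$ a good classifier are also the ones that emphasize informative regions in $a_{D_t}(x)$ before $C_{D_t}$ sees them.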

2.2 Loss Function

3. Experiments

The proposed model is compared with CycleGAN (2017), UNIT (2017), MUNIT (2018), DRIT (2018), AGGAN (2018), and CartoonGAN (2018).
All experiments use images at 256×256 resolution.

4. Conclusions

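Presents U-GAT-IT, an unsupervised image-to-image translation model that uses an attention module and AdaLIN to produce good results across a variety of datasets with a single network.
Analysis shows that the attention maps derived from the auxiliary classifier focus on the regions that distinguish the target domain from the source domain.
With AdaLIN, the model performs well on diverse datasets that involve different amounts of geometric and style change.
Outperforms state-of-the-art GAN-based models for unsupervised image-to-image translation.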

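Summary

An unsupervised image-to-image translation model capable of geometry, color, and texture transfer
Uses an attention module to guide the model toward the key regions that distinguish the two domains, and AdaLIN, whose parameters are produced by an FC layer, to control how much shape and texture change
State of the art among GAN-based image-to-image translation models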

Reference

[1] Junho Kim et al. (2020). U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation. ICLR 2020.
