Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts


Dlaiml 2021. 9. 19. 18:47

ABSTRACT

Few-shot font generation (FFG) must satisfy two conditions:
- preserve the global structure of the target character
- express the diverse local styles of the reference glyphs
MX-Font extracts multiple style features without being explicitly conditioned on component labels.
Multiple experts each represent a different local concept.
Component labels are used only as weak supervision, so that each expert specializes in a different local style.
Assigning components to experts is formulated as a graph matching problem and solved with the Hungarian algorithm.
An independence loss and a content-style adversarial loss are used for content-style disentanglement.
MX-Font achieves state-of-the-art (SOTA) results on FFG.

Introduction

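FFG is the task of generating glyphs in a new font from only a few reference images, without fine-tuning.

For glyph-rich scripts, font design is very labor-intensive and time-consuming.
High-quality font design has to meet two conditions:
- The generated glyph must keep the structure of the target character. (Even slight damage to a local component can change the meaning of the character, e.g. 바 → 마.)
- It must carry the diverse local styles of the reference glyphs (stroke thickness, size, serifs, etc.).
To meet both conditions, existing methods disentangle content from style.
Accurate disentanglement is very hard, and existing methods fall short either in capturing diverse local styles or in preserving the global structure.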

FFG methods fall into two groups:
- Universal style representation methods
  - extract a single style representation per style
  - often fail to express diverse local styles because glyph images are complex
- Component-conditioned methods
  - exploit compositionality: a character is decomposed into sub-characters (components)
  - preserve local component information
  - the encoder is tied to the target domain, so unseen components and cross-lingual font generation are difficult
The paper proposes the Multiple Localized eXperts Few-shot Font Generation Network (MX-Font):
- captures multiple local styles
- is not tied to a particular language
- uses a multi-head encoder (multiple localized experts)
  - each localized expert specializes in a different local sub-concept of the given glyph image
  - unlike earlier component-conditioned methods, no expert is explicitly assigned to a particular component; weak supervision lets each expert take charge of a different local concept
  - to keep several experts from learning to attend to the same local component, the assignment is formulated as a graph matching problem and solved with the Hungarian algorithm
- keeps the advantages of component-conditioned methods while using multiple local features, which removes the language dependence of style feature extraction
- achieves SOTA in two FFG scenarios
  - in-domain transfer: train on Chinese fonts, generate unseen Chinese fonts
  - zero-shot cross-lingual transfer: train on Chinese fonts, generate Korean fonts

Related Works

- Same as the Related Works of DM-Font
- DM-Font needs explicit component labels, which limits cross-lingual font generation

Method

1. Model architecture

The model consists of three main modules.

k-headed encoder (localized experts): $E_{i}$
generator: $G$
style and component feature classifiers: $Cls_{s}, Cls_{u}$

localized expert $E_{i}$

The part in the green box in the architecture figure.

$$f_i = E_i(x) \in \mathbb{R}^{d \times w \times h}$$

Each expert encodes the glyph image $x$ into a local feature $f_{i}$, where $i = 1, \dots, k$ indexes the experts.
$d$ is the feature dimension and $w \times h$ are the spatial dimensions.
The local style feature and local content feature are computed from the local feature $f_{i}$ with two linear weights $W_{i,s}$ and $W_{i,c}$ (each $d \times d$).
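
Written out, the two projections described above would take a form like the following. This is a sketch: the names $f_{s,i}$ and $f_{c,i}$ match the ones used later in this post, while the transpose convention and applying the weights independently at each of the $w \times h$ spatial positions are assumptions.

$$f_{s,i} = W_{i,s}^{\top} f_i, \qquad f_{c,i} = W_{i,c}^{\top} f_i$$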

The k local features are not learned by giving each expert an explicit component label.
The number of localized experts k is set to 6.

feature classifiers $Cls_{s}, Cls_{u}$

They supervise $f_{s,i}$ and $f_{c,i}$: the classifiers predict style labels (font labels) and component labels.
Used only during training.
The style label $y_s$ is the font index; for the content label $y_c$, the component label set $U_c$ is used.

The decomposed component labels are built the same way as in LF-Font.
Earlier methods used style and content classifiers only to learn style and content; here the classifiers are additionally used to compute the content-style adversarial loss for content-style disentanglement.

generator $G$

Generates the glyph image $\tilde{x}$ from the content and style features of all experts.
$\cdot$ denotes concatenation.
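
Based on the description above, the generation step can be written roughly as follows. The exact argument order and grouping are assumptions, reconstructed from the text rather than copied from the paper.

$$\tilde{x} = G\big((f_{s,1} \cdot f_{c,1}), \dots, (f_{s,k} \cdot f_{c,k})\big)$$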

2. Learning multiple localized experts with weak local component supervision

Extracting different localized features from a complex glyph image lets each local feature represent detailed local structure and fine-grained local style well.
For better generalization, explicit component dependency is deliberately not imposed.
The multiple localized experts form a multi-headed feature extractor in which each expert specializes in a different local concept (each attends to something different).
The supervision is a weak component-level label: it tells which components the glyph contains but not where they are.
The component and style classifiers are used to train each expert to attend to a different local concept.
As in the figure, when there are fewer experts than components the mapping is not one-to-one; each expert attends to a different part.

In the figure, the edge connecting expert $E_{i}$ and component $u_{j}$ is weighted by the component classifier's prediction probability.
The goal is to find the edge set with the largest sum of prediction values.
k = number of experts, m = number of components
With k = 3 and m = 4, max(k, m) = 4, so the edge set contains at most 4 edges.
The matching is defined as a Weighted Bipartite B-Matching problem, whose optimal solution can be found with the Hungarian algorithm.

For a given glyph image $x$, each expert $E_{i}$ extracts a content feature $f_{c,i}$.

The component feature classifier $Cls_{u}$ takes the content feature as input and computes a probability for each component.
allocation variable $w_{ij}$
- 1 if component j is assigned to $E_{i}$
- 0 otherwise (a binary variable)
- i.e. whether the edge is used or not

$w_{ij}$ is optimized to maximize the sum of the selected probabilities (total number of allocations = max(k, m)).
This objective can be converted into a Weighted Bipartite B-Matching (WBM) problem and solved with the Hungarian algorithm in $O((m+k)^3)$ time.
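
To make the allocation concrete, here is a minimal sketch for the k = 3, m = 4 case. It is a brute-force search, not the Hungarian algorithm, and it is only practical for tiny inputs. The function name `allocate`, the probability table, and the coverage constraints (every expert and every component appears in at least one selected edge) are assumptions for illustration; this post only states that max(k, m) edges are selected.

```python
from itertools import combinations


def allocate(probs):
    """Pick max(k, m) edges with the largest total probability.

    probs[i][j] is the (hypothetical) probability that expert i predicts
    component j. Assumed constraint: every expert and every component is
    covered by at least one selected edge.
    """
    k, m = len(probs), len(probs[0])
    edges = [(i, j) for i in range(k) for j in range(m)]
    best, best_score = None, float("-inf")
    for subset in combinations(edges, max(k, m)):
        if {i for i, _ in subset} != set(range(k)):
            continue  # some expert got no component
        if {j for _, j in subset} != set(range(m)):
            continue  # some component got no expert
        score = sum(probs[i][j] for i, j in subset)
        if score > best_score:
            best, best_score = subset, score
    # Turn the chosen edge set into the binary allocation matrix w_ij.
    w = [[0] * m for _ in range(k)]
    for i, j in best:
        w[i][j] = 1
    return w, best_score


# Made-up probabilities for 3 experts and 4 components.
probs = [
    [0.9, 0.1, 0.2, 0.6],
    [0.2, 0.8, 0.1, 0.3],
    [0.1, 0.2, 0.7, 0.5],
]
w, score = allocate(probs)
```

In this example expert 0 ends up with two components, which is the "not one-to-one" situation described above. For realistic sizes a polynomial-time solver would replace the enumeration.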

Using $w_{ij}$, the component classification loss $L_{cls,c}$ is optimized over the selected edges only.
For edges that were not selected, the component classification loss is 0.
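
The masking effect of $w_{ij}$ can be sketched as below. The negative log-likelihood form is an assumption (the loss equation itself is not in the text of this post), and `component_cls_loss` is a hypothetical helper name.

```python
import math


def component_cls_loss(probs, alloc):
    """Sum of -log p_ij over the edges selected by the allocation.

    probs[i][j]: predicted probability of component j from expert i.
    alloc[i][j]: binary allocation variable w_ij.
    The log-loss form is an assumption for illustration.
    """
    loss = 0.0
    for p_row, w_row in zip(probs, alloc):
        for p, w in zip(p_row, w_row):
            if w:  # unselected edges (w == 0) contribute nothing
                loss -= math.log(p)
    return loss


probs = [[0.5, 0.25], [0.1, 1.0]]
selected = component_cls_loss(probs, [[1, 0], [0, 1]])
nothing = component_cls_loss(probs, [[0, 0], [0, 0]])
```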

Independence between experts is formulated with the Hilbert-Schmidt Independence Criterion (HSIC), a kernel-based dependence measure between features.
HSIC is non-negative and equals 0 when the two inputs are independent, so minimizing HSIC serves as the independence criterion.
HSIC is used to make the local feature $f_{i}$ extracted by $E_{i}$ independent of every other local feature $f_{i'}$.

See the Appendix of the paper for a detailed explanation of HSIC.
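
As a rough illustration of what is being minimized, the sketch below computes the standard biased empirical HSIC estimate, tr(KHLH) / (n-1)^2, on small lists of vectors. The Gaussian kernel, the bandwidth `sigma`, and the choice of the biased estimator are assumptions for illustration; the paper may use a different estimator or kernel.

```python
import math


def gaussian_gram(xs, sigma=1.0):
    """Gram matrix of a Gaussian kernel over a list of vectors."""
    n = len(xs)
    return [
        [
            math.exp(-sum((a - b) ** 2 for a, b in zip(xs[i], xs[j])) / (2 * sigma ** 2))
            for j in range(n)
        ]
        for i in range(n)
    ]


def center(K):
    """Return HKH with H = I - (1/n) 11^T (double centering)."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + total for j in range(n)] for i in range(n)]


def hsic_biased(xs, ys, sigma=1.0):
    """Biased HSIC estimate: tr(K H L H) / (n - 1)^2."""
    n = len(xs)
    Kc = center(gaussian_gram(xs, sigma))
    L = gaussian_gram(ys, sigma)
    trace = sum(Kc[i][j] * L[j][i] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


xs = [[0.0], [1.0], [2.0], [3.0]]
constant = [[5.0]] * 4          # carries no information about xs
same = hsic_biased(xs, xs)      # fully dependent: clearly positive
none = hsic_biased(xs, constant)  # constant input: zero up to rounding
```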

3. Content and style disentanglement

Two losses are introduced to disentangle content and style.
content-style adversarial loss
- follows the approach of domain adversarial networks
- forces the extracted content feature to be useless for classifying style
- forces the extracted style feature to be useless for classifying content
- the style feature must classify the style label correctly (cross-entropy), while its predicted probabilities over content labels are trained toward a uniform distribution (maximum entropy)
- the same holds for the content feature, with the roles swapped

independence loss
- makes the content feature and the style feature independent of each other
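
The content-style adversarial loss for a style feature can be sketched as "cross-entropy on its own label minus entropy on the other label". This is a simplified view from the feature side only; combining the two terms with equal weight, and the helper names, are assumptions for illustration.

```python
import math


def softmax(logits):
    mx = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - mx) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(logits, label):
    return -math.log(softmax(logits)[label])


def entropy(logits):
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0)


def style_feature_loss(style_logits, y_s, component_logits):
    """Low when the style label is predicted correctly AND the
    component prediction made from the style feature is close to uniform."""
    return cross_entropy(style_logits, y_s) - entropy(component_logits)


style_logits = [4.0, 0.0, 0.0]
uniform = style_feature_loss(style_logits, 0, [0.0, 0.0, 0.0, 0.0])
peaked = style_feature_loss(style_logits, 0, [6.0, 0.0, 0.0, 0.0])
```

The loss for a content feature would mirror this with the two classifiers swapped.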

4. Training

Select n glyphs that share the same content and n glyphs that share the same style.
"we let the model generate a glyph with the content label yc and the style label ys." (to do: verify by running the model)
n = 3, 8 different glyphs are synthesized, mini-batch size = 24
A discriminator with a generative adversarial loss pushes the model toward high-quality glyphs.
Following high-fidelity GANs such as BigGAN and the SOTA font generation models DM-Font and LF-Font, the losses are set as follows:
hinge generative adversarial loss $L_{adv}$
feature matching loss $L_{fm}$
pixel-level reconstruction loss $L_{recon}$
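
For reference, the usual form of the hinge adversarial loss looks like the sketch below, written over plain lists of discriminator scores. How MX-Font conditions its discriminator and weights this term against $L_{fm}$ and $L_{recon}$ is not covered here; the function names are hypothetical.

```python
def d_hinge_loss(real_scores, fake_scores):
    """Discriminator side: push real scores above +1 and fake scores below -1."""
    real = sum(max(0.0, 1.0 - s) for s in real_scores) / len(real_scores)
    fake = sum(max(0.0, 1.0 + s) for s in fake_scores) / len(fake_scores)
    return real + fake


def g_hinge_loss(fake_scores):
    """Generator side: raise the discriminator score of generated glyphs."""
    return -sum(fake_scores) / len(fake_scores)


d_loss = d_hinge_loss([2.0, 0.5], [-2.0, 0.0])
g_loss = g_hinge_loss([-2.0, 0.0])
```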

5. Few-shot generation

Given $n_r$ reference glyphs $\{x^{r}_{1},...,x^{r}_{n_r}\}$ in style $y_s$:
The experts $\{E_1,...,E_k\}$ extract localized style features $[f^{1}_{s^r,i},...,f^{n_r}_{s^r,i}]$ for $i = 1,...,k$.
The style representation of each expert is the average of its localized style features.
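
The averaging step can be sketched as follows. Features are flattened to short vectors for readability (the real features have shape $d \times w \times h$), and `average_style` is a hypothetical helper name.

```python
def average_style(features_per_ref):
    """Average style features over references, separately for each expert.

    features_per_ref[j][i] is the style vector extracted from reference
    glyph j by expert i (flattened to a list of floats for this sketch).
    Returns one averaged vector per expert.
    """
    n_r = len(features_per_ref)
    k = len(features_per_ref[0])
    averaged = []
    for i in range(k):
        dim = len(features_per_ref[0][i])
        averaged.append(
            [sum(features_per_ref[j][i][t] for j in range(n_r)) / n_r for t in range(dim)]
        )
    return averaged


# Two reference glyphs, two experts, 2-dimensional features (made-up numbers).
refs = [
    [[1.0, 2.0], [0.0, 4.0]],
    [[3.0, 4.0], [2.0, 0.0]],
]
style = average_style(refs)
```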

The content representation extracted from the source glyph is combined with the style representation and passed through the generator to produce a glyph in the unseen style.
