Deeper Learning


EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

Dlaiml 2021. 9. 20. 21:01

Abstract

Carefully balancing network depth, width, and resolution in a CNN leads to better performance.
The paper proposes a simple yet effective compound coefficient that scales all three dimensions (depth, width, and resolution) uniformly.
EfficientNets outperform previous ConvNets, and EfficientNet-B7 achieves state-of-the-art 84.3% top-1 accuracy on ImageNet.
Compared with the best existing model, it is 8.4x smaller and 6.1x faster at inference.
EfficientNets also transfer well (SOTA on CIFAR-100 and Flowers, among others).

1. Introduction

Scaling up ConvNets, as in ResNet-200 or GPipe, is a common way to improve accuracy.

The paper proposes a compound scaling method that uniformly scales depth, width, and resolution with a fixed set of scaling coefficients.
The intuition: a larger input image needs more layers to enlarge the receptive field, and more channels to capture the finer-grained patterns in the bigger image.
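The receptive-field part of this intuition can be checked with the standard recurrence for stacked convolutions. This is a generic sketch, not code from the paper; it assumes square kernels, no dilation, and ignores padding effects. Each 3x3 stride-1 layer adds only 2 input pixels of context, so covering a larger image with the same stride pattern takes more layers.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of a stack of conv layers.

    layers: list of (kernel_size, stride) tuples, ordered from input to output.
    """
    rf = 1    # one output pixel initially "sees" one input pixel
    jump = 1  # input-pixel distance between neighbouring positions of the current feature map
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the view by (k - 1) steps of the current jump
        jump *= stride             # striding makes later layers step over more input pixels
    return rf


for n in (1, 5, 10):
    print(n, "stacked 3x3 layers ->", receptive_field([(3, 1)] * n), "px")
```

With stride-2 layers the receptive field grows much faster, which is why real ConvNets mix downsampling with depth.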
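Prior work has shown a relationship between network width and depth.
The EfficientNet authors are the first to empirically quantify the relationship among all three dimensions: width, depth, and resolution.
How well model scaling works depends heavily on the baseline network.
- The method works well when scaling existing ResNet and MobileNet models.
- A new baseline network was developed with neural architecture search (Zoph & Le, 2017; Tan et al., 2019).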

2. Related Work

High-accuracy CNNs have a very large number of parameters and are already hitting hardware memory limits.
Many approaches aim to improve efficiency, such as MobileNet, SqueezeNet, ShuffleNet, and neural architecture search.
Depth Scaling (#layers)
- ResNet-200
Width Scaling (#channels)
- WideResNet
- MobileNets
Several studies have shown that width and depth are both important for the expressive power of CNNs, but how to scale a CNN effectively is still an open question.

3. Compound Model Scaling

3.1. Problem Formulation

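A ConvNet can be written as a composition of $s$ stages:
$$\mathcal{N} = \bigodot_{i=1 \ldots s} F_{i}^{L_{i}}\left(X_{\langle H_i, W_i, C_i \rangle}\right)$$
$X$ is the input tensor of stage $i$, $\langle H_i, W_i, C_i \rangle$ is its shape (height, width, channels), and $F_i$ is a function that bundles all the computation a layer performs.
$F_{i}^{L_{i}}$ means that $F_i$ is repeated $L_i$ times in stage $i$.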
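A typical ConvNet shrinks the spatial dimensions (height and width) while growing the channel dimension (e.g., input <224, 224, 3> → output <7, 7, 512>).
Tuning the height, width, channels, and number of repetitions $L_i$ separately for every stage makes the design space too large.
To shrink it, all layers are scaled uniformly by constant ratios, and the goal becomes maximizing accuracy under a given resource constraint.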

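$$\max_{d, w, r} \; \text{Accuracy}\big(\mathcal{N}(d, w, r)\big)$$
$$\text{s.t.} \quad \mathcal{N}(d, w, r) = \bigodot_{i=1 \ldots s} \hat{F}_{i}^{d \cdot \hat{L}_{i}}\left(X_{\langle r \cdot \hat{H}_i,\, r \cdot \hat{W}_i,\, w \cdot \hat{C}_i \rangle}\right)$$
$$\text{Memory}(\mathcal{N}) \le \text{target\_memory}, \qquad \text{FLOPS}(\mathcal{N}) \le \text{target\_flops}$$
Here $w$, $d$, $r$ are the scaling coefficients for width, depth, and resolution.
Symbols with a hat ($\hat{F}_i$, $\hat{L}_i$, $\hat{H}_i$, $\hat{W}_i$, $\hat{C}_i$) are predefined parameters of the baseline network; Table 1 of the paper lists them for EfficientNet-B0.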
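To make the formulation concrete, here is a small sketch that transcribes the EfficientNet-B0 stages from Table 1 of the paper and applies $(d, w, r)$ to them: $d$ multiplies $\hat{L}_i$, $w$ multiplies $\hat{C}_i$, $r$ multiplies $\hat{H}_i$ and $\hat{W}_i$, and the operators $\hat{F}_i$ stay untouched. The rounding helpers (`round_channels`, rounding layer counts up with `math.ceil`) are assumptions for illustration, not necessarily what the official code does. It also applies $d$ to every stage literally, including the stem and head, which a real implementation would likely keep unrepeated.

```python
import math

# EfficientNet-B0 baseline as listed in Table 1 of the paper:
# (operator, kernel size, input resolution H_i = W_i, output channels C_i, layers L_i)
B0_STAGES = [
    ("Conv3x3", 3, 224, 32, 1),
    ("MBConv1", 3, 112, 16, 1),
    ("MBConv6", 3, 112, 24, 2),
    ("MBConv6", 5, 56, 40, 2),
    ("MBConv6", 3, 28, 80, 3),
    ("MBConv6", 5, 14, 112, 3),
    ("MBConv6", 5, 14, 192, 4),
    ("MBConv6", 3, 7, 320, 1),
    ("Conv1x1 & Pooling & FC", 1, 7, 1280, 1),
]


def round_channels(c, divisor=8):
    """Round a channel count to the nearest multiple of `divisor` (assumed rule)."""
    return max(divisor, int(c + divisor / 2) // divisor * divisor)


def scale_stages(stages, d, w, r):
    """Apply depth/width/resolution coefficients to every stage, as in the formulation."""
    scaled = []
    for op, kernel, res, channels, layers in stages:
        scaled.append((
            op,
            kernel,                        # layer operator F_i is unchanged
            round(r * res),                # r scales H_i and W_i
            round_channels(w * channels),  # w scales C_i
            math.ceil(d * layers),         # d scales L_i (rounded up: an assumption)
        ))
    return scaled


for stage in scale_stages(B0_STAGES, d=1.2, w=1.1, r=1.15):
    print(stage)
```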

3.2. Scaling Dimensions

Depth

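Deeper ConvNets can capture richer and more complex features and generalize well to new tasks.
However, they are harder to train because of the vanishing gradient problem.
Skip connections and batch normalization ease training, but ResNet-1000 still reaches roughly the same accuracy as ResNet-101 despite having far more layers.
In other words, the accuracy gain from making a very deep network even deeper diminishes.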

Width

Width scaling is commonly used for small models.
Wider networks tend to capture more fine-grained features and are easier to train.
However, extremely wide but shallow networks have difficulty capturing higher-level features.

Resolution

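With higher-resolution inputs, ConvNets can potentially capture more fine-grained patterns.
Early ConvNets used 224x224 inputs, while more recent ones use 299x299 or 331x331.
GPipe reached state-of-the-art ImageNet accuracy with 480x480 inputs.
However, the accuracy gain from increasing resolution diminishes as the resolution gets higher.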

→ Observation 1: Scaling up any one of width, depth, or resolution improves accuracy, but the gain diminishes as the model gets bigger.

3.3. Compound Scaling

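The authors scaled width under several fixed depth/resolution settings. With $d = 1.0$ and $r = 1.0$, accuracy saturates quickly as width grows; with a deeper ($d = 2.0$) and higher-resolution ($r = 2.0$) network, width scaling reaches much higher accuracy at the same FLOPS cost.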

→ Observation 2: To achieve better accuracy and efficiency, it is critical to balance width, depth, and resolution during scaling.

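The compound scaling method uses a compound coefficient $\phi$ to scale width, depth, and resolution uniformly:
$$d = \alpha^{\phi}, \quad w = \beta^{\phi}, \quad r = \gamma^{\phi} \qquad \text{s.t.} \;\; \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2, \;\; \alpha \ge 1, \; \beta \ge 1, \; \gamma \ge 1$$
$\alpha$, $\beta$, $\gamma$ are constants found by a small grid search, and $\phi$ is chosen by the user according to how many extra resources are available.
The FLOPS of a regular convolution are proportional to $d$, $w^2$, and $r^2$: doubling depth doubles FLOPS, while doubling width or resolution quadruples them.
Scaling with the equation above therefore increases total FLOPS by $(\alpha \cdot \beta^{2} \cdot \gamma^{2})^{\phi} \approx 2^{\phi}$.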
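The FLOPS argument can be sanity-checked on a hypothetical toy network: a stack of identical 3x3 convolutions whose layer count, channel count, and resolution are multiplied by $d$, $w$, $r$. The base sizes below are arbitrary assumptions; only the ratios matter. The last lines plug in the paper's grid-searched values $\alpha = 1.2$, $\beta = 1.1$, $\gamma = 1.15$ to show that $\alpha \cdot \beta^2 \cdot \gamma^2$ is close to (not exactly) 2.

```python
def conv_flops(height, width, c_in, c_out, k):
    """Multiply-accumulates of one k x k convolution, stride 1, 'same' padding."""
    return height * width * c_in * c_out * k * k


def toy_network_flops(d, w, r, base_res=32, base_channels=16, base_layers=4, k=3):
    """FLOPS of a hypothetical stack of identical conv layers scaled by (d, w, r)."""
    res = round(base_res * r)            # resolution scales height and width
    channels = round(base_channels * w)  # width scales input and output channels
    layers = round(base_layers * d)      # depth scales the number of layers
    return layers * conv_flops(res, res, channels, channels, k)


base = toy_network_flops(1, 1, 1)
print(toy_network_flops(2, 1, 1) / base)  # 2.0  (FLOPS ~ d)
print(toy_network_flops(1, 2, 1) / base)  # 4.0  (FLOPS ~ w^2)
print(toy_network_flops(1, 1, 2) / base)  # 4.0  (FLOPS ~ r^2)

alpha, beta, gamma = 1.2, 1.1, 1.15    # grid-searched values reported in the paper
per_phi = alpha * beta**2 * gamma**2   # FLOPS multiplier per unit of phi
for phi in range(4):
    print(phi, round(per_phi ** phi, 2))  # (alpha * beta^2 * gamma^2) ** phi
```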

4. EfficientNet Architecture

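Compound scaling does not change the layer operators of the baseline network, so a good baseline is critical.
To better demonstrate the scaling method, the authors developed a new mobile-size baseline called EfficientNet.
The baseline was found with neural architecture search, using the same search space as MnasNet and the following optimization goal:
$$ACC(m) \times \left[\frac{FLOPS(m)}{T}\right]^{w}$$
where $ACC(m)$ and $FLOPS(m)$ are the accuracy and FLOPS of model $m$, $T$ is the target FLOPS (400M in the paper), and $w = -0.07$ controls the trade-off between accuracy and FLOPS (this $w$ is unrelated to the width coefficient).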
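A minimal sketch of this search objective. Only the formula, $T$ = 400M, and $w = -0.07$ come from the paper; the candidate models and their accuracies are hypothetical, and the actual search (an RL controller over the MnasNet search space) is not reproduced here. The objective is a soft trade-off, not a hard cap: with $w = -0.07$, doubling FLOPS scales the objective by $2^{-0.07} \approx 0.95$.

```python
def search_objective(acc, flops, target_flops=400e6, w=-0.07):
    """ACC(m) * (FLOPS(m) / T) ** w.

    `w` is the trade-off exponent from the paper, not the width coefficient.
    """
    return acc * (flops / target_flops) ** w


# Hypothetical candidates: (name, top-1 accuracy, FLOPS)
candidates = [("A", 0.760, 400e6), ("B", 0.770, 800e6), ("C", 0.750, 200e6)]
for name, acc, flops in candidates:
    print(name, round(search_objective(acc, flops), 4))
```

Under these made-up numbers the cheaper candidate C scores highest even though B is the most accurate, which is exactly the pressure toward efficient models the objective is meant to create.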

EfficientNet-B0, the network found this way, is similar to MnasNet. Its main building block is the mobile inverted bottleneck convolution (MBConv), with squeeze-and-excitation optimization added (see MobileNetV1 and MobileNetV2).
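The data flow of an MBConv block can be sketched by tracing tensor shapes only (no weights). This follows the usual inverted-bottleneck description (1x1 expansion, depthwise conv, squeeze-and-excitation, linear 1x1 projection, identity shortcut when shapes match); the 'same' padding and the placement of the stride in the first block of a stage are assumptions. The example uses the first block of stage 3 in Table 1 (MBConv6, k3x3, 16 → 24 channels); the stride of 2 is inferred from the resolution dropping from 112 to 56 between stages 3 and 4.

```python
def mbconv_shapes(h, w, c_in, c_out, expand_ratio, kernel, stride):
    """Trace (H, W, C) through one MBConv block; shapes only, no weights."""
    steps = []
    c_mid = c_in * expand_ratio
    if expand_ratio != 1:
        # 1x1 conv expands the channels (skipped for MBConv1)
        steps.append(("expand 1x1 conv", (h, w, c_mid)))
    # depthwise conv keeps channels; with 'same' padding the size is ceil(size / stride)
    h, w = -(-h // stride), -(-w // stride)
    steps.append((f"depthwise {kernel}x{kernel} conv, stride {stride}", (h, w, c_mid)))
    # squeeze-and-excitation rescales each channel; the shape is unchanged
    steps.append(("squeeze-and-excitation", (h, w, c_mid)))
    # linear 1x1 projection down to the output channels (the "bottleneck")
    steps.append(("project 1x1 conv", (h, w, c_out)))
    # identity shortcut only when input and output shapes match
    use_residual = stride == 1 and c_in == c_out
    return steps, use_residual


steps, use_residual = mbconv_shapes(112, 112, 16, 24, expand_ratio=6, kernel=3, stride=2)
for name, shape in steps:
    print(f"{name:35s} {shape}")
print("residual:", use_residual)
```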

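Starting from EfficientNet-B0, scaling is done in two steps:
- Step 1: fix $\phi = 1$, assuming twice as many resources are available, and run a small grid search over $\alpha$, $\beta$, $\gamma$. The best values for EfficientNet-B0 are $\alpha = 1.2$, $\beta = 1.1$, $\gamma = 1.15$ under $\alpha \cdot \beta^2 \cdot \gamma^2 \approx 2$.
- Step 2: fix $\alpha$, $\beta$, $\gamma$ and scale up the baseline with $\phi = 1, \ldots, 7$ to obtain EfficientNet-B1 through EfficientNet-B7.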
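Step 2 can be sketched by evaluating $d = \alpha^{\phi}$, $w = \beta^{\phi}$, $r = \gamma^{\phi}$ for each $\phi$, with the B0 input size of 224 as the base resolution. Treat the printed numbers as illustrative: the publicly released B1 to B7 configurations use rounded, hand-adjusted multipliers (for example, B7 runs at 600x600), so they do not match these values exactly.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # grid-searched values reported in the paper
BASE_RESOLUTION = 224                 # EfficientNet-B0 input size


def compound_coefficients(phi, alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Return the (depth, width, resolution) multipliers for compound coefficient phi."""
    return alpha ** phi, beta ** phi, gamma ** phi


for phi in range(1, 8):
    d, w, r = compound_coefficients(phi)
    flops = (ALPHA * BETA**2 * GAMMA**2) ** phi  # approximately 2 ** phi
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, "
          f"input {round(BASE_RESOLUTION * r)}px, FLOPS x{flops:.1f}")
```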

5. Experiments

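Training configuration and techniques

RMSProp optimizer with decay 0.9 and momentum 0.9
Batch norm momentum 0.99
Weight decay 1e-5
Initial learning rate 0.256, decayed by 0.97 every 2.4 epochs
SiLU (Swish-1) activation, AutoAugment, stochastic depth (survival probability 0.8), and dropout (ratio increased linearly from 0.2 for B0 to 0.5 for B7)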
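The optimizer settings and learning-rate schedule above can be written as a small config plus a decay function. The key names in `TRAIN_CONFIG` are hypothetical, and applying the decay in steps (staircase) rather than smoothly is an assumption, since the text only says "decayed by 0.97 every 2.4 epochs".

```python
import math

# Values from the list above; the key names are illustrative only.
TRAIN_CONFIG = {
    "optimizer": "RMSProp",
    "rmsprop_decay": 0.9,
    "momentum": 0.9,
    "batch_norm_momentum": 0.99,
    "weight_decay": 1e-5,
    "initial_lr": 0.256,
    "lr_decay_rate": 0.97,
    "lr_decay_epochs": 2.4,
}


def learning_rate(epoch, cfg=TRAIN_CONFIG, staircase=True):
    """Exponential decay: multiply the lr by lr_decay_rate once every lr_decay_epochs."""
    exponent = epoch / cfg["lr_decay_epochs"]
    if staircase:  # assumption: step-wise decay instead of a smooth curve
        exponent = math.floor(exponent)
    return cfg["initial_lr"] * cfg["lr_decay_rate"] ** exponent


for epoch in (0, 2.4, 24, 100):
    print(epoch, round(learning_rate(epoch), 5))
```

Note that the RMSProp "decay 0.9" is the moving-average decay of squared gradients, not a learning-rate decay; the learning rate follows the separate 0.97-per-2.4-epochs schedule.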

6. Discussion

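EfficientNet's gains could come from either the scaling method or the architecture itself, so the two contributions need to be disentangled.
To isolate the effect of the scaling method, the authors apply conventional single-dimension scaling and the proposed compound scaling to the same EfficientNet-B0 baseline; compound scaling improves accuracy by up to 2.5% over the single-dimension methods.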

7. Conclusion

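Carefully balancing width, depth, and resolution is an important but previously missing piece of ConvNet scaling.
The paper proposes a simple and highly effective compound scaling method that scales up a baseline ConvNet while preserving its efficiency.
By effectively scaling up the mobile-size EfficientNet, it achieves SOTA accuracy on ImageNet and on five commonly used transfer learning datasets.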

Reference

[1] Mingxing Tan and Quoc V. Le. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv:1905.11946, 2019.
