Deeper Learning

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

AI/Deep Learning

Dlaiml 2021. 11. 6. 00:27

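Proposes MobileNets, a family of lightweight models for mobile and embedded vision applications built on depthwise separable convolutions
Introduces two global hyperparameters that tune the trade-off between latency and accuracy
Validates the efficiency of MobileNets through a range of experiments and comparisons against other models
Applicable to a wide variety of tasks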

Many real-world applications (robotics, self-driving) must run fast on resource-limited platforms
Uses two hyperparameters to build small, low-latency network architectures that match the design requirements of mobile and embedded vision applications

There has been a lot of recent (as of 2017) research on small, efficient neural networks.
The MobileNets paper proposes a class of network architectures that lets a model developer pick a small network matching their resource constraints (latency, size)
Many papers focus only on optimizing size, whereas MobileNet also takes latency into account
The main building block of MobileNet is the depthwise separable convolution
reduced computation networks: Xception, SqueezeNet, structured transform networks, deep fried convnets
Other approaches include shrinking, factorizing, or compressing pretrained networks, distillation, and low bit networks

The core layer is built from depthwise separable filters; the two model-shrinking hyperparameters are the width multiplier and the resolution multiplier

factorized convolution
- standard convolution → depthwise convolution & 1x1 convolution (pointwise convolution)
MobileNet's depthwise convolution applies a single filter to each input channel; the pointwise convolution then combines the depthwise outputs across channels
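To make the factorization concrete, here is a minimal pure-Python sketch of the two steps. It is not the paper's implementation: the function names, the nested-list tensor layout `[channel][row][col]`, and the choice of stride 1 with no padding are all assumptions made for illustration, and the batch norm and ReLU that follow each layer in MobileNet are left out.

```python
def depthwise_conv(x, dw):
    """Depthwise step: one F x F filter per input channel.

    x:  input as nested lists, shape [M][H][W]
    dw: filters as nested lists, shape [M][F][F]
    Assumes stride 1 and no padding, so the output is [M][H-F+1][W-F+1].
    """
    M, H, W = len(x), len(x[0]), len(x[0][0])
    F = len(dw[0])
    out_h, out_w = H - F + 1, W - F + 1
    return [
        [
            [
                sum(x[m][i + a][j + b] * dw[m][a][b]
                    for a in range(F) for b in range(F))
                for j in range(out_w)
            ]
            for i in range(out_h)
        ]
        for m in range(M)
    ]


def pointwise_conv(x, pw):
    """Pointwise step: a 1x1 convolution that mixes channels.

    x:  shape [M][H][W]
    pw: shape [N][M], one weight per (output channel, input channel) pair
    """
    M, H, W = len(x), len(x[0]), len(x[0][0])
    return [
        [[sum(pw[n][m] * x[m][i][j] for m in range(M)) for j in range(W)]
         for i in range(H)]
        for n in range(len(pw))
    ]


def depthwise_separable_conv(x, dw, pw):
    """Filter each channel on its own, then combine channels with 1x1 weights."""
    return pointwise_conv(depthwise_conv(x, dw), pw)


# Toy example: 2 input channels of size 3x3, 3x3 filters of ones, 3 output channels.
x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
     [[1, 1, 1], [1, 1, 1], [1, 1, 1]]]
dw = [[[1] * 3 for _ in range(3)] for _ in range(2)]
pw = [[1, 1], [1, -1], [0, 2]]
print(depthwise_separable_conv(x, dw, pw))  # [[[54]], [[36]], [[18]]]
```

The depthwise step never mixes channels (its outputs here are 45 and 9, one per input channel); all cross-channel mixing happens in the 1x1 step.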

For an input of size D * D * M, an output of size D * D * N, and a kernel of size (F, F), the computational cost (multiply-adds) of each convolution type is
- Standard Convolution: F x F x M x N x D x D
- Depthwise Convolution (output has M channels, i.e. N == M): F x F x M x D x D
- Depthwise Separable Convolution: F x F x M x D x D + M x N x D x D
Dividing the depthwise separable cost by the standard cost gives $\frac{1}{N} + \frac{1}{F^2}$, so MobileNet, which uses 3x3 depthwise separable convolutions, needs 8 to 9 times less computation
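The cost formulas above are easy to check numerically. The following is a small sketch; the function names and the example layer size (F=3, M=N=512, D=14) are my own choices for illustration, not values taken from the post.

```python
def standard_cost(F, M, N, D):
    """Multiply-adds of a standard F x F convolution."""
    return F * F * M * N * D * D


def depthwise_cost(F, M, D):
    """Multiply-adds of the depthwise step (one filter per input channel)."""
    return F * F * M * D * D


def separable_cost(F, M, N, D):
    """Depthwise step plus the 1x1 pointwise step."""
    return depthwise_cost(F, M, D) + M * N * D * D


# Hypothetical layer: 3x3 kernel, 512 -> 512 channels, 14x14 feature map.
F, M, N, D = 3, 512, 512, 14
ratio = separable_cost(F, M, N, D) / standard_cost(F, M, N, D)

print(ratio)              # equals 1/N + 1/F**2 up to float rounding
print(1 / N + 1 / F**2)
print(1 / ratio)          # about 8.8x fewer multiply-adds for this layer
```

Since 1/N is small for wide layers, the ratio is dominated by 1/F², which is where the "8 to 9 times" figure for 3x3 kernels comes from.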

Most of the computation and most of the parameters are in the 1x1 convolutions
Frameworks execute 1x1 convolutions through GEMM, a highly optimized general matrix multiply routine.
Applying little or no weight decay to the depthwise filters, which have very few parameters, gave better performance
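As a rough per-layer illustration of why the 1x1 convolutions dominate, the sketch below computes the pointwise share of parameters and multiply-adds inside a single depthwise separable layer. The function name and the layer size are hypothetical, and bias terms and batch norm parameters are ignored.

```python
def pointwise_share(F, M, N, D):
    """Fraction of a depthwise separable layer that belongs to the 1x1 conv.

    Returns (parameter share, multiply-add share). Bias and batch norm
    parameters are ignored in this sketch.
    """
    dw_params = F * F * M            # one F x F filter per input channel
    pw_params = M * N                # one weight per (input, output) channel pair
    dw_madds = dw_params * D * D     # each filter is applied at D x D positions
    pw_madds = pw_params * D * D
    return (pw_params / (dw_params + pw_params),
            pw_madds / (dw_madds + pw_madds))


# Hypothetical layer: 3x3 depthwise, 512 -> 512 channels, 14x14 feature map.
param_share, madd_share = pointwise_share(3, 512, 512, 14)
print(param_share, madd_share)  # both equal 512 / 521, roughly 0.98
```

Within one layer the two shares are identical because the D x D factor cancels; across the whole network they differ, since other layer types also carry parameters.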

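$\rho$ is the resolution multiplier; it reduces the resolution of the input image and of every internal representation
$\rho$ takes a value in (0, 1] and is set implicitly by choosing an input resolution of 224, 192, 160, or 128.
Just as the width multiplier $\alpha$ cuts computational cost by roughly $\alpha^2$, $\rho$ cuts it by $\rho^2$.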
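The effect of both multipliers on a single layer can be sketched as follows. This assumes the scaled cost from the paper, F·F·αM·ρD·ρD + αM·αN·ρD·ρD; the function name and the example layer size are my own, and real implementations round the scaled channel counts and feature-map sizes to integers, which this sketch does not do.

```python
def scaled_separable_cost(F, M, N, D, alpha=1.0, rho=1.0):
    """Multiply-adds of a depthwise separable layer with width multiplier
    alpha (scales channel counts) and resolution multiplier rho (scales the
    feature-map side length). No integer rounding is applied here."""
    m, n, d = alpha * M, alpha * N, rho * D
    return F * F * m * d * d + m * n * d * d


# Hypothetical layer: 3x3 kernel, 512 -> 512 channels, 14x14 feature map.
F, M, N, D = 3, 512, 512, 14
base = scaled_separable_cost(F, M, N, D)

# alpha: the pointwise term scales by alpha^2, the depthwise term only by alpha,
# so the overall reduction is "roughly" alpha^2.
print(scaled_separable_cost(F, M, N, D, alpha=0.5) / base)

# rho: both terms scale by exactly rho^2.
print(scaled_separable_cost(F, M, N, D, rho=0.5) / base)

# rho implied by each input resolution, relative to the 224 baseline.
for res in (224, 192, 160, 128):
    rho = res / 224
    print(res, round(rho, 3), round(rho ** 2, 3))
```

Going from 224 to 128 input pixels corresponds to ρ = 4/7, which alone cuts the multiply-adds to about a third.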

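Table 4
- The depthwise separable MobileNet reaches accuracy close to the full-convolution version with far fewer parameters and far less computation
Table 5
- Compares a shallower model (reduced depth) with a thinner model (reduced width); the thinner model performs better
Table 6, 7
- Accuracy degrades smoothly, a little at a time, until $\alpha$ becomes very small (0.25)
- Accuracy likewise degrades smoothly as $\rho$ decreases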

Table 8~10
- Performance comparison with other models
- Better than GoogleNet and close to VGG16 with a much smaller model; top-1 accuracy somewhat below Inception V3

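Summary & Thoughts

Replaces standard convolution with depthwise separable convolution (depthwise convolution + pointwise convolution), effectively cutting both computation and parameter count
Accuracy does not drop much compared with existing models, so the model is usable even on embedded devices
Achieves a lightweight model by improving the convolution operation itself, without having to reduce depth, resolution, or width
v2 further reduces computation along the channel axis by using a Bottleneck Residual Block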

Andrew G. Howard et al. (2017). MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
