Deeper Learning

CBAM: Convolutional Block Attention Module

Dlaiml 2021. 10. 24. 11:12

Abstract

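Proposes a module that applies channel attention and then spatial attention, in sequence, to an intermediate feature map.
It is a general-purpose module with few additional weights that can be added to any CNN architecture with little effort.
It improves performance on ImageNet-1K classification, MS COCO detection, and VOC 2007 detection.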

1. Introduction

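CNNs perform well on vision tasks thanks to their rich representation power.
Recent work on improving CNN performance has focused mainly on three factors: depth, width, and cardinality.
Attention not only tells the network where to focus; it also improves the representation of the regions of interest.
The goal is to increase representation power with an attention mechanism that focuses on important features and suppresses unnecessary ones.
Because a convolution extracts informative features by mixing cross-channel and spatial information, the module is designed to emphasize meaningful features along those two principal dimensions: the channel axis and the spatial axes.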

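As the paper's overview figure shows, channel attention and spatial attention are applied sequentially.
- Channel attention learns what to attend to.
- Spatial attention learns where to attend.
By learning which information to emphasize or suppress, CBAM helps information flow effectively through the network.
Adding the small, lightweight CBAM yields more accurate attention and noise reduction, which improves performance on datasets such as ImageNet, COCO, and VOC.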

2. Related Work

Network engineering

Well-designed networks perform well across a wide range of tasks.
ResNet overcame the depth limit with skip connections and led to variants such as WideResNet, Inception-ResNet, and ResNeXt.
PyramidNet builds on WideResNet and makes the network width increase gradually.
DenseNet iteratively concatenates input features with output features so that each convolution block receives raw information from the preceding layers.
CBAM focuses on attention rather than on depth, width, or cardinality, the main concerns of earlier work.

Attention mechanism

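Attention plays an important role in human perception.
Humans do not process a whole scene at once; they look at parts of it in sequence and selectively focus on important regions, which lets them capture visual structure better.
Residual Attention Network generates a 3D attention map with an encoder-decoder style network and applies it to the features, which improves accuracy and makes the model robust to noisy inputs.
CBAM decomposes this into separate channel and spatial attention, so it needs far less computation and far fewer parameters than methods that compute a 3D attention map directly, and it works as a plug-and-play module that slots into existing CNN architectures.
The Squeeze-and-Excitation module models inter-channel relationships, but because it relies on global average pooling alone it carries no spatial attention, that is, no information about where to focus.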

3. Convolutional Block Attention Module

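The input is an intermediate feature map $F$ of shape (C, H, W). CBAM infers a 1D channel attention map $M_c$ of shape (C, 1, 1) and then a 2D spatial attention map $M_s$ of shape (1, H, W), multiplying the features element-wise by each map in turn: $F' = M_c(F) \otimes F$, $F'' = M_s(F') \otimes F'$.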
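
The shape bookkeeping above can be sketched with NumPy broadcasting. This is only an illustration under stated assumptions: `toy_channel_attention` and `toy_spatial_attention` are hypothetical, parameter-free stand-ins for the learned sub-modules, and `apply_cbam` is a name made up for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def apply_cbam(F, channel_attention, spatial_attention):
    """Refine F (C, H, W) with a channel map, then a spatial map."""
    M_c = channel_attention(F)      # shape (C, 1, 1)
    F1 = M_c * F                    # broadcast over H and W
    M_s = spatial_attention(F1)     # shape (1, H, W), computed from F1
    F2 = M_s * F1                   # broadcast over C
    return F2

# Hypothetical stand-ins: the real maps come from learned layers.
def toy_channel_attention(F):
    return sigmoid(F.mean(axis=(1, 2), keepdims=True))

def toy_spatial_attention(F):
    return sigmoid(F.mean(axis=0, keepdims=True))

rng = np.random.default_rng(0)
F = rng.standard_normal((8, 5, 5))
out = apply_cbam(F, toy_channel_attention, toy_spatial_attention)
print(out.shape)
```

Because both maps lie in (0, 1), the refined features keep the input shape and are never larger in magnitude than the input.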

Channel attention module

Channel attention is built from the inter-channel relationships of the features.
Each channel of a feature map acts as a feature detector, so channel attention focuses on what is meaningful in the input image.
Global average pooling (GAP) aggregates the spatial information.
Global max pooling (GMP) is used as well, because it gathers another clue about the features that distinguish objects and so yields better channel-wise attention.
Applying GAP and GMP produces two spatial context descriptors, $F^{c}_{avg}$ and $F^{c}_{max}$.
Both descriptors are forwarded through a shared network to produce the channel attention map $M_{c}$ of shape (C, 1, 1).
The shared network is an MLP with one hidden layer; to limit parameter overhead, the hidden activation has C / r channels, where r is the reduction ratio.

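$M_c(F) = \sigma(W_1(W_0(F^{c}_{avg})) + W_1(W_0(F^{c}_{max})))$, where $\sigma$ is the sigmoid function, $W_0$ maps C → C / r and is followed by a ReLU, and $W_1$ maps C / r → C. Both descriptors share the same $W_0$ and $W_1$.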
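
A minimal NumPy sketch of the channel attention module, following the paper's formulation (pool, shared MLP with ReLU, sum, sigmoid). The function name, the random weights, and the sizes C = 16, r = 4 are assumptions chosen for illustration, and biases are omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(F, W0, W1):
    """F: (C, H, W); W0: (C // r, C); W1: (C, C // r). Returns M_c of shape (C, 1, 1)."""
    f_avg = F.mean(axis=(1, 2))   # global average pooling -> (C,)
    f_max = F.max(axis=(1, 2))    # global max pooling     -> (C,)

    def shared_mlp(v):
        hidden = np.maximum(W0 @ v, 0.0)   # ReLU, shape (C // r,)
        return W1 @ hidden                 # back to shape (C,)

    # The same MLP weights process both descriptors; outputs are summed.
    logits = shared_mlp(f_avg) + shared_mlp(f_max)
    return sigmoid(logits).reshape(-1, 1, 1)

# Illustrative sizes and random (untrained) weights.
C, r, H, W = 16, 4, 6, 6
rng = np.random.default_rng(0)
F = rng.standard_normal((C, H, W))
W0 = rng.standard_normal((C // r, C)) * 0.1
W1 = rng.standard_normal((C, C // r)) * 0.1

M_c = channel_attention(F, W0, W1)
F_refined = M_c * F               # each channel is rescaled by one scalar
print(M_c.shape, F_refined.shape)
```

In a trained network `W0` and `W1` would be learned; here they only serve to show the shapes.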

Spatial attention module

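Spatial attention answers 'where is an informative part?' and so complements channel attention.
Average pooling and max pooling are applied along the channel axis, and the two resulting maps are concatenated into an efficient feature descriptor.
Pooling along the channel axis is effective at highlighting informative regions.
A convolution layer applied to the concatenated descriptor, followed by a sigmoid, produces the spatial attention map $M_s(F)$ of shape (H x W).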

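$M_s(F) = \sigma(f^{7 \times 7}([F^{s}_{avg}; F^{s}_{max}]))$, where $f^{7 \times 7}$ denotes a convolution with a 7x7 filter and $F^{s}_{avg}$, $F^{s}_{max}$ are the maps pooled along the channel axis.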
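
The spatial attention module can be sketched the same way. The helper `conv2d_same` is a deliberately naive zero-padded convolution written for this example (a framework convolution would normally be used), and the random 7x7 kernel stands in for learned weights; both are assumptions, not the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_same(x, kernel):
    """Zero-padded cross-correlation. x: (C_in, H, W), kernel: (C_in, k, k), k odd -> (H, W)."""
    k = kernel.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))   # zero padding on H and W
    H, W = x.shape[1:]
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[:, i:i + k, j:j + k] * kernel)
    return out

def spatial_attention(F, kernel):
    """F: (C, H, W); kernel: (2, 7, 7). Returns M_s of shape (1, H, W)."""
    f_avg = F.mean(axis=0, keepdims=True)                  # (1, H, W)
    f_max = F.max(axis=0, keepdims=True)                   # (1, H, W)
    descriptor = np.concatenate([f_avg, f_max], axis=0)    # (2, H, W)
    return sigmoid(conv2d_same(descriptor, kernel))[None]  # (1, H, W)

rng = np.random.default_rng(0)
F = rng.standard_normal((8, 10, 10))
kernel = rng.standard_normal((2, 7, 7)) * 0.05   # untrained, illustrative
M_s = spatial_attention(F, kernel)
F_refined = M_s * F            # every channel is rescaled by the same (H, W) map
print(M_s.shape, F_refined.shape)
```

Note that the pooling here runs over the channel axis, so the descriptor always has two channels regardless of C.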

Arrangement of attention modules

The earlier BAM paper placed channel and spatial attention in parallel; CBAM's sequential arrangement performed better.
Channel-first ordering performed slightly better than spatial-first ordering.
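
To make the three arrangements concrete, the sketch below composes two toy attention maps in each order. The maps are hypothetical, parameter-free stand-ins for the learned modules, and the `parallel` variant (both maps computed from the same input and multiplied) is only one assumed way to combine them; it is not claimed to match the exact setup used in the paper's comparison.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical stand-ins for the learned attention maps.
def toy_channel_map(F):
    pooled = F.mean(axis=(1, 2), keepdims=True) + F.max(axis=(1, 2), keepdims=True)
    return sigmoid(pooled)                      # (C, 1, 1)

def toy_spatial_map(F):
    pooled = F.mean(axis=0, keepdims=True) + F.max(axis=0, keepdims=True)
    return sigmoid(pooled)                      # (1, H, W)

def channel_first(F):
    F1 = toy_channel_map(F) * F
    return toy_spatial_map(F1) * F1             # spatial map sees channel-refined features

def spatial_first(F):
    F1 = toy_spatial_map(F) * F
    return toy_channel_map(F1) * F1             # channel map sees spatially refined features

def parallel(F):
    # Assumed combination: both maps are computed from the same input.
    return toy_channel_map(F) * toy_spatial_map(F) * F

rng = np.random.default_rng(0)
F = rng.standard_normal((4, 3, 3))
a, b, c = channel_first(F), spatial_first(F), parallel(F)
print(a.shape, b.shape, c.shape)
```

In the sequential variants the second map is computed from already refined features, which is why the two orderings generally give different outputs.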

4. Experiments

Experiments cover image classification on ImageNet-1K and object detection on MS COCO and VOC 2007.

5. Conclusion

Proposes the convolutional block attention module (CBAM) as a new approach to improving the representation power of CNNs.
Channel and spatial attention are applied sequentially, and both use average pooling and max pooling.
The module effectively learns what to emphasize and where in an intermediate feature map.
Performance improves on ImageNet-1K, MS COCO, and VOC 2007, and visualizations support the results.

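Summary

A module that applies sequentially the spatial and channel attention that BAM applied in parallel.
Channel and spatial attention learn what to focus on and where.
Spatial attention can compensate for the spatial information that channel attention loses.
The extra computation and parameters are small, so the module is easy to add to existing CNN architectures.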
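
A rough count makes the small parameter overhead concrete. The helper below is a hypothetical back-of-the-envelope calculation: it assumes a bias-free shared MLP, a single bias-free 7x7 convolution over the two pooled maps, and a reduction ratio of r = 16; the comparison with a 3x3 convolution is only for scale.

```python
def cbam_extra_params(channels, reduction=16, kernel_size=7):
    """Approximate parameters one CBAM block adds (biases ignored)."""
    hidden = channels // reduction
    mlp = channels * hidden + hidden * channels   # W0 and W1 of the shared MLP
    conv = 2 * kernel_size * kernel_size          # 2 input maps -> 1 output map
    return mlp + conv

cbam = cbam_extra_params(256)
conv3x3 = 256 * 256 * 3 * 3    # one bias-free 3x3 convolution, 256 -> 256 channels
print(cbam)                    # 8290
print(conv3x3)                 # 589824
print(round(100 * cbam / conv3x3, 2), "% of a single 3x3 convolution")
```

Almost all of the added parameters sit in the channel MLP; the spatial branch contributes only 98.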

Reference

[1] S. Woo et al. (2018). CBAM: Convolutional Block Attention Module. url
