'AI/Deep Learning' 카테고리의 글 목록 (2 Page)

Notice

Contact

Recent Posts

Recent Comments

Link

« 2025/04 »
일	월	화	수	목	금	토
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30

Tags more

Archives

Today

Total

관리 메뉴

목록AI/Deep Learning (99)

Deeper Learning

VAE: Auto-Encoding Variational Bayes

Diederik P. Kingma, Max Welling, Machine Learning Group Universiteit van Amsterdam. (2013) Abstract intractable posterior, large dataset, continous latent variable 환경에서 directed probabilistic model의 추론, 학습을 어떻게 효율적으로 할 수 있을까? mild 한 미분 가능 조건하에 intractable case에서도 가능하며 large dataset을 다룰 수 있는 stochastic variational inference와 학습 알고리즘을 제시 Contribution variational lower bound에 reparameterization을 적용..

AI/Deep Learning 2022. 2. 26. 14:53

ConvMixer: Patches Are All You Need?

Asher Trockman, J.Zico Kolter. Carnegie Mellon University and Bosch Center for AI. (2022.01.24) Abstract CNN이 vision task에서 지배적인 아키텍처였으나 최근 ViT가 SOTA를 달성 self-attentoin의 quadratic runtime의 한계로 large images를 처리하기 위해 patch embedding을 사용한다. 여기서 질문, ViT의 성능은 Transformer 아키텍처로 인한 것인가? 아니면 input representation으로 patch를 사용한 것이 영향을 끼쳤는가? 논문은 후자에 대한 증거를 제시한다 patch를 바로 input으로 받는 MLP-Mixer, 같은 resolution을..

AI/Deep Learning 2022. 2. 21. 18:45

GANet: Glyph-Attention Network for Few-Shot Font Generation

Under review as a conference paper at ICLR 2022 Mingtao Guo, Wei Xiong, Zheng Wang, Yong Tang, Ting Wu. Xian Univ. (2021.09) Abstract Non-local neural network에서 영감을 받아 제시한 GANet content encoder와 style encoder는 content glyph과 style glyph에서 key와 value를 추출 기존 SOTA few-shot font 생성 모델을 뛰어넘는 결과 Introduction 중국 폰트에 대한 인기가 세계적으로 증가하면서 니즈또한 증가하고 있음 그러나 폰트 생성은 노동집약적이며 오랜시간이 걸림 image to image translation과..

AI/Deep Learning 2022. 2. 18. 12:23

AdaConv: Adaptive Convolutions for Structure-Aware Style Transfer

Prashanth Chandran, Gaspard Zoss, Paulo Gotardo, Markus Gross, Derek Bradley. DisneyResearch, Department of Computer Sceince ETH Zurich. (2021). CVPR Abstract Style Transformer는 content image의 content를 유지하며 style image의 style을 입히는 CNN의 artistic 적용 SOTA neural style transfer는 style image의 통계적 특성을 content image에 옮기는 AdaIN AdaIN은 global operation으로 local geometric structure를 무시하고 transfer 시키지 못하는 문..

AI/Deep Learning 2022. 2. 15. 16:16

CGAN: Conditional Generative Adversarial Nets

Mehdi Mirza, Simon Osindero.(2014.11) Abstract Generative Adversarial Nets(이하 GAN)은 최근(2014.06) 생성 모델을 학습하기 위한 새로운 방식으로 제시되었음. $y$ label을 generator와 discriminator에게 주어 GAN이 conditional 생성을 가능하게 함. MNIST를 class label에 따라 생성, multi-modal model로 training label이 없는 image에 대해 tagging 결과를 수록 1. Introduction GAN은 다루기 힘든 여러 확률적 계산을 근사하기 위한 생성모델 학습 프레임워크로 최근에 제시되었음. Markov chain을 사용하지 않고 역전파로만 gradient를 ..

AI/Deep Learning 2022. 2. 9. 09:01

StyleGAN2: Analyzing and Improving the Image Quality of StyleGAN

Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, Timo Aila NVIDIA. (2019.12) Abstract StyleGAN은 unconditional data-driven 이미지 생성 SOTA 달성 모델 StyleGAN의 특징적인 아티팩트를 분석하고 이를 해결하기 위한 아키텍처와 학습방법을 제안 generator의 normalization을 새로 디자인, progressive growing을 재고(resolution을 증가시키며 학습하는 것), generator regularization을 통해 latent code에서 image로의 매핑을 개선 path length regularizer는 이미지 퀄리티를 개선할..

AI/Deep Learning 2022. 2. 7. 10:30

StyleGAN: A Style-Based Generator Architecture for Generative Adversarial Networks

Tero Karras, Samuli Laine, Timo Aila. NVIDIA. (2019.03) Abstract style transfer 논문의 아이디어를 차용한 아키텍처를 제시 새 아키텍처는 비지도 학습으로 자동으로 학습되며, high-level attributes(pose, identity)와 생성 이미지의 stochastic variation(주근깨, 머리카락)을 separation, 직관적인 scale에 따른 생성 이미지 조정 또한 가능 기존 distribution quality metrics에서 SOTA를 달성, 입증 가능한 더 나은 interpolation 성능, variation의 latent factor를 더 잘 disentangle interpolation 퀄리티와 disentang..

AI/Deep Learning 2022. 1. 31. 20:32

ConvNeXt: A ConvNet for the 2020s

[convnext] Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie, Facebook AI Research (FAIR), UC Berkeley (2022.01.02) Abstract 2020년도에 ViT는 이미지 분류 SOTA 모델로 CNN을 대체 하지만 vanilla ViT는 object detection 또는 semantic segmentation과 같은 general computer vision task에 적용하는데 어려움을 겪음 Swin Transformer와 같은 계층적 Transformer는 여러 vision task에서 좋은 성능을 냄 하지만 이러한 hybrid 접근의 유효성은 c..

AI/Deep Learning 2022. 1. 30. 14:33

MLP-Mixer: An all-MLP Architecture for Vision

Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy. Google Research, Brain Team. (2021.05) Abstract CNN은 vision task에서 널리 사용되었음, 최근에는 ViT와 같이 attention-based networks도 사용 convolution과 attention는 좋은 성능을 내기 위한 필수조건이 아님 오직 multi-layer perceptrons을 사용한 MLP-..

AI/Deep Learning 2022. 1. 28. 09:42

Variational autoencoder

VAE는 생성을 목표로 하는 모델로 AE와 형태는 비슷하나 근간은 명백하게 다르다. 생성모델의 목표는 주어진 dataset $x$ 를 사용하여 density estimation을 통해 $p(x)$ 를 근사하는 것 generator를 구성하는 모든 parameter set을 $\theta$ 라고 하자 샘플링가능한 분포 $z$ 에서 샘플링한 $z^i$ 는 decoder(generator)인 $g_\theta$ 를 통과하여 데이터를 생성 생성결과 = $g_\theta(x|z=z^i)$ generator까지 고려하였을 때 최대화 해야하는 대상은 $p_\theta(x|z)$ (parameter set $\theta$ 로 구성된 generator를 사용하였을 때 prior $z$ 에서 $x$ 를 생성할 확률) 샘플링..

AI/Deep Learning 2022. 1. 26. 22:20

Few-shot Font Style Transfer between Different Languages

Chenhao Li, Yuta Taniguchi, Min Lu, Shin’ichi Konomi HDI Lab, Kyushu University. (2021) Abstract 적은 샘플로 언어 간 font style을 transfer 할 수 있는 FTransGAN을 제시 언어 간 font style transfer는 많이 연구되지 않았음 multi-level attention을 통해 local, global style을 capture English, Chinese를 모두 가진 847개의 font dataset을 사용 Introduction 폰트를 만드는 것은 매우 노동집약적, 대부분 하나의 언어에 대한 폰트만 만듦 폰트의 subset만 보고 나머지를 생성하는 것은 neural net으로 가능해졌으며 관련 ..

AI/Deep Learning 2022. 1. 7. 08:50

DeiT: Training data-efficient image transformers & distillation through attention

Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Herve Jegou. Facebook AI. Sorbonne University. (2020.12) Abstract Image task에 pure attention 기반 모델이 활용되지만 large dataset에서 pre-training이 필수적이며 이는 활용에 한계를 가져옴 single computer에서 3일 동안 ImageNet dataset만 학습하여 top-1 acc 83.1%를 달성 attention을 통해 학습하는 distillation token을 도입한 transformer를 위한 teacher-student strategy를 제시 ..

AI/Deep Learning 2022. 1. 5. 09:09

CoAtNet: Marrying Convolution and Attention for All Data Sizes

Zihang Dai, Hanxiao Liu, Quoc V. Le, Mingxing Tan Google Research, Brain Team (2021.06) Abstract Transformer 아직 vision task에서 SOTA convolutional network보다 성능이 떨어짐 Transformer는 더 큰 capacity를 가지지만 inductive bias의 부족으로 일반화 성능이 convolutional network에 비해 떨어짐 두 아키텍처의 장점을 결합하기 위해 hybrid model인 CoAtNets를 제시 CoAtNet의 key insights depthwise Convolution, Self-Attention은 간단한 relative attention을 통해 결합 가능 수직으..

AI/Deep Learning 2022. 1. 1. 00:51

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo Microsoft Research Asia (2021.03) Abstract CV task를 위한 범용적인 Backbone 모델로 새로운 vision Transformer인 Swin Transformer를 제시 vision에 Transformer를 사용하기에는 visual entities의 scale 변동, high resolution pixels 등의 어려움이 있음 이를 해결하기 위해 Shifted windows를 사용하는 hierarchical Transformer를 제시 shifted windowing은 겹치지 않는 local window 한정으..

AI/Deep Learning 2021. 12. 31. 21:45

MobileNetV2: Inverted Residuals and Linear Bottlenecks

Mark Sandler Andrew Howard Menglong Zhu Andrey Zhmoginov Liang-Chieh Chen Google Inc. CVPR Abstract 다양한 task, model size에서 mobile model SOTA를 달성한 MobileNetV2를 제시 SSDLite를 사용한 object detection, DeepLabv3 기반 semantic segmentation에서 MobileNetV2를 사용 thin bottle neck layers 간 shortcut connection에서 inverted residual structure를 사용 intermediate expansion layer는 lightweight depthwise convolution을 사용 narr..

AI/Deep Learning 2021. 12. 24. 13:32

Residual Attention Network for Image Classification

Fei Wang1, Mengqing Jiang2, Chen Qian1, Shuo Yang3, Cheng Li1,Honggang Zhang4, XiaogangWang3, Xiaoou Tang, SenseTime Group Limited, Tsinghua University,The Chinese University of Hong Kong, Beijing University of Posts and Telecommunications. (2017) Abstract 당시 SOTA인 attention mechanism을 CNN에 사용한 Residual Attention Network를 제시 Residual Attention Network(이하 RAN)은 attention-aware feature를 뽑아내는 Atten..

AI/Deep Learning 2021. 12. 22. 10:17

[Vision Transformer, ViT] An Image is Worth 16x16 Words: Transformers For Image Recognition At Scale

Alexey Dosovitskiy et al., (2020), Google Research, Brain Team Abstract 사실상 Transformer 구조가 NLP task에서 standard가 되었지만 vision task에서는 아직 적용에 한계가 있었음 Transformer는 CNN을 대체하지 못하고 CNN의 일부 컴포넌트를 대체하는 식으로 결합하여 사용되고 있었음 이미지 분류 태스크에서 pure transformer로 좋은 성능을 낼 수 있음 large datasets에서 pre-trained 한 ViT 모델은 mid-sized or small image recognition(ImageNet, CIFAR-100, VTAB, etc)에서 더 적은 computational cost를 필요로 하면..

AI/Deep Learning 2021. 12. 17. 15:17

ArcFace: Additive Angular Margin Loss for Deep Face Recognition

2018, Stefanos Zafeiriou Imperial College London Jiankang Deng et al. Abstract Deep Convolutional Neural Networks를 사용하여 large-scale face recognition에서 feature learning을 할 때 challenge는 discriminative power를 향상시키기 위한 loss function 설계 Centre loss는 deep features와 해당 class의 centre 사이의 Euclidean space에서 distance에 penalty를 주어 intra-class compactness를 달성 Sphere Face는 last fc layer의 linear transformation..

AI/Deep Learning 2021. 12. 13. 08:26

You Only Look Once: Unified, Real-Time Object Detection

2016, University of Washington, Allen Institute for AI, Facebook AI Research source: https://arxiv.org/pdf/1506.02640.pdf Abstract YOLO, obejct detection을 위한 새로운 method를 제시 이전 연구들은 classifier를 detection을 위해 사용하였으나 YOLO는 object detection을 spatially separated bbox와 해당하는 bbox의 class probabilities를 예측하는 regression 문제로 정의 하나의 신경망 모델이 full image에서 one-stage로 bbox와 class probabilities를 예측 전체 detection ..

AI/Deep Learning 2021. 11. 30. 09:33

Pixel-Adaptive Convolutional Neural Networks

Abstract Convolutions은 CNN의 기본 building block spatially shared weights는 CNN이 쓰이는 이유임과 동시에 한계점이다 learnable local pixel features에 따라 변하는 pixel-adaptive convolution(PAC)를 제시 PAC는 자주 쓰이는 filter들의 일반화 버전으로 많은 case에서 그대로 사용이 가능하다. deep join image upsampling에서 SOTA fully-connected CRF를 PAC-CRF로 대체하여 성능, 속도 향상 pre-trained networks에서 PAC drop-in replacement로 성능 향상 1. Introduction standard convolution의 두 ..

AI/Deep Learning 2021. 11. 16. 23:28

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Abstract mobile, embedded vision application을 위한 depth-wise seperable convolutions을 활용한 경량화 모델 MobileNets을 제시 latency와 accuracy의 trade off를 조정하는 2개의 global hyperparameters 제시 MobileNets을 다른 모델과 비교하며 여러 실험을 통해 효율성을 검증하였음 다양한 task에 적용 가능 1. Introduction limited platform에서 빠르게 동작하여야 하는 real-word의 application이 다수 존재 (Robotics, self-driving) 2개의 hyper-parameters를 사용하여 모바일 및 embedded vision applicatio..

AI/Deep Learning 2021. 11. 6. 00:27

Adaptive Convolutional Kernels

Abstract CV recognition performance을 향상시키기 위해 많은 복잡한 neural network architectures가 연구되었다. high computing resource를 요구하지 않고 온 디바이스로 한정된 resource에서 모델을 동작시키기 위해 adaptive convolutional kernel을 제시 input image가 dynamic하게 kernel을 결정 Adaptive kernels을 사용한 모델은 기존 CNN 보다 2배 이상 수렴이 빨랐으며, LeNet base model에서 MNIST dataset에 대해 99% 이상의 정확도를 달성하는데 66배 적은 parameters만 필요로 하였음 1. Introduction SOTA CV 모델은 대부분 deep..

AI/Deep Learning 2021. 11. 5. 23:58

Full Stack Deep Learning Lecture 5 ~ 7

Lecture 5Setting up ML Projects85% of AI projects fail Prioritizing projectsML project의 feasibilityData availabilitystable?hard to acquire?expensive labeling?how much data is needed?security requirements?Accuracy requirementhow costly are wrong pred?how frequently does the system need to be right to be useful?ethical implicationsProblem difficultyis the problem well-defined?good published work o..

AI/Deep Learning 2021. 10. 30. 17:47

Full Stack Deep Learning - Lecture 4

Lecture 4 Transfer Learning 10k labeled 데이터로 classifier 학습. 이미지넷처럼 ResNet-50을 사용하고 싶으나 모델이 너무 커 작은 데이터에 쉽게 오버피팅 ImageNet에서 pretrained된 ResNet-50을 load하고 10k data로 fine-tuning → 성능 향상 Language Model need to converting word to vectors one hot encoding Scaled poorly with vocab size Very high-dimensional sparse vectors → NN operations work poorly Violates what we know about word similarity (ex. dist..

AI/Deep Learning 2021. 10. 30. 17:37

U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation

Abstract end-to-end로 학습 가능한 normalization function과 어텐션 모듈을 사용한 unsupervised image-to-image translation method를 제시 어텐션 모듈은 auxiliary classifier에 의해 얻은 attention map을 사용하여 중요한 region에 모델이 집중하도록 guide 이전 attention-based method와 다르게 두 도메인 간 geometric change가 가능 dataset에서 얻은 paramters로 shape와 texture 변화량을 flexible 하게 변화시킬 수 있는 AdaLIN(Adaptive Layer-Instance Normalization) 기존 SOTA보다 좋은 퀄리티 1. Introdu..

AI/Deep Learning 2021. 10. 27. 09:15

CBAM: Convolution Block Attention Module

Abstract intermediate feature map에서 순차적으로 channel, spatial attention을 적용하는 모듈을 제시 weight 수가 적으며 모든 CNN 아키텍처에 쉽게 적용할 수 있는 일반적인 모듈 ImageNet-1K, MS COCO detection, VOC 2007 detection dataset에서 성능 향상이 있었음 1. Introduction CNN은 vision task에서 rich representation power를 바탕으로 좋은 성능을 보였다. CNN의 성능을 향상시키기 위해 최근 Depth, Width, Cardinality에 대한 연구가 주로 진행되고 있다. 어텐션은 어디에 집중해야 하는지 말해주는 것만이 아니라 중요한 영역에 대한 represent..

AI/Deep Learning 2021. 10. 24. 11:12

Squeeze-and-Excitation Networks (SE-Net)

Abstract CNN의 메인 블록은 spatial, channel-wise information를 각 layer의 local receptive fields에서 결합하여 정보를 가지고 있는 feature를 만든다. 이전 연구들은 spatial encoding을 강화하는데 집중하였지만 저자는 channel relationship에 집중하여 "Squeeze-and-Excitation"(SE) block을 제시 SE block은 adaptive 하게 channel-wise feature를 recalibrate(재교정, 재보정)한다. computational cost가 크지 않으며 ILSVRC에서 1등을 차지 1. Introduction CNN은 필터들은 spatial, channel-wise 정보를 함께 사용..

AI/Deep Learning 2021. 10. 14. 01:43

Attention is all you need: Transformer

Transformer 트랜스포머 모델은 기존의 seq2seq 모델에서 Encoder, Decoder 형태를 유지하면서 RNN을 사용하지 않고 어텐션 스코어를 중심으로 학습을 하는 모델이다. 기존 Attention 모델에서 seq2seq의 Encoder가 Decoder로 정보를 전달할 때 hidden state에 정보를 모두 담기가 어렵고 시점에 따른 정보 희석의 문제를 해결하기 위해 어텐션 스코어를 사용하여 이를 보정하였다면 트랜스포머 모델은 어텐션 스코어 자체를 Encoder와 Decoder사이의 연결점으로 사용한다. Multi-head-Self-Attention 트랜스포머 모델은 셀프 어텐션을 통해 계산한 어텐션 스코어를 사용하기 때문에 먼저 셀프 어텐션에 대해 알아보겠다. 셀프 어텐션은 한 문장에서..

AI/Deep Learning 2021. 10. 10. 00:00

ELMo (Embeddings from Language Model)

Beyond Embeddings Word2Vec, Glove 임베딩은 13~14년에 인기 많은 task에서 좋은 성능을 보였음 Problem: representations are shallow 첫 번째 layer만 all Wikipedia data로 pretrained한 embedding의 이점을 가짐 나머지 layer (LSTMs)는 pretrained data를 보지 못하고 own data로만 학습됨 Bank Account와 River Bank에서 Bank는 전혀 다른 의미를 가지지만 Word2Vec 또는 Glove로는 이를 반영하지 못하고 Bank가 같은 벡터로 임베딩 되어 사용됨 문맥을 반영한 워드 임베딩 (Contextualized Word Embedding)이 필요하다. → embedding ..

AI/Deep Learning 2021. 10. 8. 09:50

Full Stack Deep Learning - Lecture 1 ~ 3

개괄적인 정리 for 그룹스터디 발표 Lecture 2.A Convolutional Networks Fc layer를 사용하지 않는 이유 연산량 image가 움직였을 때 모든 input의 위치가 변함 ( translation variant ) filter math W' = floor((W-F+2P) / S + 1) half padding == same padding full padding: 3x3 filter 기준 (2,2) padding, image가 1 pixel이라도 kernel에 포함되도록 하는 최대 패딩 (3,3) padding을 할 경우 kernel이 padding으로만 이루어진 영역을 계산함. (의미 없음) Transposed convolution Upsampling을 위한 기법 conv는 ..

AI/Deep Learning 2021. 10. 7. 01:15

Prev 1 2 3 4 Next

내 블로그 - 관리자 홈 전환	`Q` `Q`
새 글 쓰기	`W` `W`

글 수정 (권한 있는 경우)	`E` `E`
댓글 영역으로 이동	`C` `C`

이 페이지의 URL 복사	`S` `S`
맨 위로 이동	`T` `T`
티스토리 홈 이동	`H` `H`
단축키 안내	`Shift` + `/` `⇧` + `/`

Deeper Learning

목록AI/Deep Learning (99)

Deeper Learning

티스토리툴바

단축키

내 블로그

블로그 게시글

모든 영역