'multimodal' 태그의 글 목록

Notice

Contact

Recent Posts

Recent Comments

Link

« 2025/04 »
일	월	화	수	목	금	토
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30

Tags more

Archives

Today

Total

관리 메뉴

목록multimodal (3)

Deeper Learning

CLIP: Learning Transferable Visual Models From Natural Language Supervision

Alec Radford, Jong Wook Kim , Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever, (2021.02) Abstract SOTA 컴퓨터 비전 시스템은 정해진 object의 카테고리를 예측하도록 학습하는 것 지도학습 방식은 라벨링된 데이터를 요구하기 때문에 일반화, 활용성에 제약이 존재 raw text-image pair 데이터를 학습하면 지도학습에서 더 많은 소스를 활용할 수 있음 raw text-image pair 데이터로 이미지 representation 학습을 ..

AI/Deep Learning 2022. 11. 4. 20:07

M3AE: Multimodal Masked Autoencoders Learn Transferable Representations

Xinyang Geng, Hao Liu, Lisa Lee, Dale Schuurmans, Sergey Levine, Pieter Abbeel, [UC Berkeley, Google Brain] (2022.05.31) Abstract 다양한 multimodal data를 학습하는 scalable 모델을 만드는 것은 아직까지 어려움이 많다 vision-language data에서 주된 접근법은 modality 마다 separate encoder를 학습하는 contrastive learning contrastive learning은 효율적이지만, 사용한 data augmentation에 따른 sampling bias로 downstream task에서 성능이 감소하는 문제가 존재 contrastive learn..

AI/Deep Learning 2022. 6. 11. 01:12

Multimodal Learning

What is Multimodal Learning? Modality(a particular way of doing or experiencing something)가 다른 데이터를 사용하여 이루어지는 학습이다. 예를 들어, 이미지 데이터와 텍스트 데이터를 input으로 사용하는 image captioning task가 Multimodal Learning에 속한다. Challenge Multimodal Learning이 쉽지 않은 이유는 여러 가지가 있다. Different representations between modalities 이미지, 오디오, 텍스트, 3D 데이터는 그 표현 형식이 서로 달라 단순한 연산이 불가능하다. Unbalance between heterogeneous feature spac..

AI/Deep Learning 2021. 3. 17. 13:41

Prev 1 Next

내 블로그 - 관리자 홈 전환	`Q` `Q`
새 글 쓰기	`W` `W`

글 수정 (권한 있는 경우)	`E` `E`
댓글 영역으로 이동	`C` `C`

이 페이지의 URL 복사	`S` `S`
맨 위로 이동	`T` `T`
티스토리 홈 이동	`H` `H`
단축키 안내	`Shift` + `/` `⇧` + `/`

Deeper Learning

목록multimodal (3)

Deeper Learning

티스토리툴바

단축키

내 블로그

블로그 게시글

모든 영역