
Publications

All You May Need for VQA are Image Captions

Soravit Changpinyo, Doron Kukliansky, Idan Szpektor, Xi Chen, Nan Ding, Radu Soricut

arXiv:2205.01883 (2022)   paper link   Google AI blog

We propose a method that automatically derives VQA examples at volume, by leveraging the abundance of existing image-caption annotations combined with neural models for textual question generation.
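Below is a minimal sketch of that caption-to-VQA idea, not the paper's actual pipeline: given image-caption pairs and some neural question-generation model, derive (image, question, answer) triples. The `generate_question` callable and the word-level answer extraction are placeholders introduced purely for illustration.

```python
# Sketch: derive VQA examples from image-caption pairs using a question-generation model.
# `generate_question` stands in for any neural QG model; answer extraction here is a
# deliberately simple word-level stand-in for a real span selector.

from typing import Callable, Iterable, List, Tuple

QAGenerator = Callable[[str, str], str]  # (caption, answer span) -> question


def candidate_answers(caption: str) -> List[str]:
    # Placeholder: real systems pick answer spans (objects, counts, colors) with
    # a parser or tagger; here we just take lowercase words from the caption.
    return [w.strip(".,") for w in caption.split() if w and w[0].islower()]


def captions_to_vqa(
    pairs: Iterable[Tuple[str, str]],  # (image_id, caption)
    generate_question: QAGenerator,
) -> List[Tuple[str, str, str]]:
    """Turn image-caption pairs into (image_id, question, answer) examples."""
    examples = []
    for image_id, caption in pairs:
        for answer in candidate_answers(caption):
            question = generate_question(caption, answer)
            examples.append((image_id, question, answer))
    return examples


if __name__ == "__main__":
    # Trivial rule-based QG stand-in so the sketch runs end to end.
    toy_qg = lambda caption, answer: f"What is mentioned in the caption besides '{answer}'?"
    data = [("img_001", "A brown dog catches a frisbee in the park.")]
    for ex in captions_to_vqa(data, toy_qg):
        print(ex)
```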

Towards Multi-Lingual Visual Question Answering

Soravit Changpinyo, Linting Xue, Idan Szpektor, Ashish V. Thapliyal, Julien Amelot, Xi Chen, Radu Soricut

arXiv:2209.05401 (2022)   paper link

We propose scalable solutions to multi-lingual visual question answering (mVQA), on both the data and modeling fronts. We first propose a translation-based framework for mVQA data generation that requires much less human annotation effort than the conventional approach of directly collecting questions and answers.
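A minimal sketch of the translation-based route to mVQA data is below; it is an illustration rather than the paper's implementation. Existing English VQA examples are expanded into target languages through a machine-translation function; the `translate` callable is a placeholder for any MT system, and real pipelines would add filtering and validation steps.

```python
# Sketch: expand English VQA examples into multiple languages via machine translation.
# `translate` is a placeholder for any MT system; answer handling varies in practice.

from typing import Callable, Dict, Iterable, List

Translate = Callable[[str, str], str]  # (text, target_language) -> translated text


def translate_vqa(
    examples: Iterable[Dict[str, str]],   # each: {"image_id", "question", "answer"}
    target_languages: List[str],
    translate: Translate,
) -> List[Dict[str, str]]:
    """Expand English VQA examples into multiple languages via translation."""
    out = []
    for ex in examples:
        for lang in target_languages:
            out.append({
                "image_id": ex["image_id"],
                "language": lang,
                "question": translate(ex["question"], lang),
                # Short answers are sometimes kept in English or handled separately;
                # translating them here is purely for illustration.
                "answer": translate(ex["answer"], lang),
            })
    return out


if __name__ == "__main__":
    # Identity-style "translator" stand-in so the sketch runs without an MT model.
    fake_mt = lambda text, lang: f"[{lang}] {text}"
    seed = [{"image_id": "img_001", "question": "What color is the dog?", "answer": "brown"}]
    print(translate_vqa(seed, ["de", "hi"], fake_mt))
```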