PaLI: A Jointly-Scaled Multilingual Language-Image Model
Xi Chen, Xiao Wang, et al.
arXiv:2209.06794 (2022)
PaLI (Pathways Language and Image model) achieves state-of-the-art results on multiple vision-and-language tasks (such as image captioning, visual question answering, and scene-text understanding) while retaining a simple, modular, and scalable design.