Overcoming Language Priors in Visual Question Answering via Distinguishing Superficially Similar Instances

September 18, 2022 · Declared Dead · 🏛 International Conference on Computational Linguistics

📜 CAUSE OF DEATH: Death by README
Repo has only a README

Repo contents: LICENSE, README.md, requirements.txt

Authors: Yike Wu, Yu Zhao, Shiwan Zhao, Ying Zhang, Xiaojie Yuan, Guoqing Zhao, Ning Jiang
arXiv ID: 2209.08529
Category: cs.CL: Computation & Language
Cross-listed: cs.CV, cs.MM
Citations: 25
Venue: International Conference on Computational Linguistics
Repository: https://github.com/wyk-nku/Distinguishing-VQA ⭐ 4
Last Checked: 1 month ago
Abstract
Despite the great progress of Visual Question Answering (VQA), current VQA models heavily rely on the superficial correlation between the question type and its corresponding frequent answers (i.e., language priors) to make predictions, without really understanding the input. In this work, we define training instances with the same question type but different answers as *superficially similar instances*, and attribute the language priors to the confusion of the VQA model on such instances. To solve this problem, we propose a novel training framework that explicitly encourages the VQA model to distinguish between superficially similar instances. Specifically, for each training instance, we first construct a set that contains its superficially similar counterparts. Then we exploit the proposed distinguishing module to increase the distance between the instance and its counterparts in the answer space. In this way, the VQA model is forced to focus on the parts of the input beyond the question type, which helps it overcome the language priors. Experimental results show that our method achieves state-of-the-art performance on VQA-CP v2. Code is available at https://github.com/wyk-nku/Distinguishing-VQA.
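The two steps the abstract describes, grouping instances that share a question type but differ in answer, then pushing each instance away from its counterparts in the answer space, can be sketched as follows. This is a minimal illustration under assumed names: `build_counterpart_sets`, `distinguishing_loss`, the Euclidean distance, and the hinge/margin form are all my assumptions, not the paper's exact formulation (the code itself is unavailable, per the listing above).

```python
import math

def build_counterpart_sets(instances):
    # For each instance, collect indices of "superficially similar"
    # counterparts: same question type, different ground-truth answer.
    sets = []
    for i, a in enumerate(instances):
        sets.append([j for j, b in enumerate(instances)
                     if j != i
                     and b["qtype"] == a["qtype"]
                     and b["answer"] != a["answer"]])
    return sets

def distinguishing_loss(pred, counterpart_preds, margin=1.0):
    # Hinge-style penalty: any counterpart prediction closer than
    # `margin` to this instance's prediction (in the answer space)
    # contributes margin - distance; counterparts beyond the margin
    # contribute nothing. Minimizing this pushes the pair apart.
    if not counterpart_preds:
        return 0.0
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return sum(max(0.0, margin - dist(pred, cp))
               for cp in counterpart_preds) / len(counterpart_preds)

# Toy usage: two "what color" instances with different answers are
# counterparts of each other; the "how many" instance has none.
instances = [
    {"qtype": "what color", "answer": "red"},
    {"qtype": "what color", "answer": "blue"},
    {"qtype": "how many", "answer": "2"},
]
sets = build_counterpart_sets(instances)          # [[1], [0], []]
close = distinguishing_loss([1.0, 0.0], [[1.0, 0.0]])  # identical preds -> full margin penalty
far = distinguishing_loss([0.0, 0.0], [[0.0, 3.0]])    # beyond margin -> 0.0
```

In training, this term would be added to the usual VQA classification loss, so the model is penalized both for wrong answers and for mapping superficially similar questions to indistinguishable points in the answer space.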

📜 Similar Papers

In the same crypt · Computation & Language

🌅 Old Age

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, ... (+6 more)

cs.CL ๐Ÿ› NeurIPS ๐Ÿ“š 166.0K cites 8 years ago

Died the same way · 📜 Death by README