Find Someone Who: Visual Commonsense Understanding in Human-Centric Grounding

December 14, 2022 ยท Declared Dead ยท ๐Ÿ› Conference on Empirical Methods in Natural Language Processing

๐Ÿฆด CAUSE OF DEATH: Skeleton Repo
Boilerplate only, no real code

Repo contents: README.md, fig.jpg

Authors Haoxuan You, Rui Sun, Zhecan Wang, Kai-Wei Chang, Shih-Fu Chang arXiv ID 2212.06971 Category cs.CV: Computer Vision Cross-listed cs.CL Citations 7 Venue Conference on Empirical Methods in Natural Language Processing Repository https://github.com/Hxyou/HumanCog โญ 6 Last Checked 2 months ago
Abstract
From a visual scene containing multiple people, human is able to distinguish each individual given the context descriptions about what happened before, their mental/physical states or intentions, etc. Above ability heavily relies on human-centric commonsense knowledge and reasoning. For example, if asked to identify the "person who needs healing" in an image, we need to first know that they usually have injuries or suffering expressions, then find the corresponding visual clues before finally grounding the person. We present a new commonsense task, Human-centric Commonsense Grounding, that tests the models' ability to ground individuals given the context descriptions about what happened before, and their mental/physical states or intentions. We further create a benchmark, HumanCog, a dataset with 130k grounded commonsensical descriptions annotated on 67k images, covering diverse types of commonsense and visual scenes. We set up a context-object-aware method as a strong baseline that outperforms previous pre-trained and non-pretrained models. Further analysis demonstrates that rich visual commonsense and powerful integration of multi-modal commonsense are essential, which sheds light on future works. Data and code will be available https://github.com/Hxyou/HumanCog.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Computer Vision

Died the same way โ€” ๐Ÿฆด Skeleton Repo

R.I.P. ๐Ÿฆด Skeleton Repo

Neural Style Transfer: A Review

Yongcheng Jing, Yezhou Yang, ... (+4 more)

cs.CV ๐Ÿ› IEEE TVCG ๐Ÿ“š 828 cites 8 years ago