Learning to Infer Unseen Single-/Multi-Attribute-Object Compositions with Graph Networks

October 27, 2020 · Declared Dead · 🏛 IEEE Transactions on Pattern Analysis and Machine Intelligence

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Hui Chen, Jingjing Jiang, Nanning Zheng arXiv ID 2010.14343 Category cs.CV: Computer Vision Citations 12 Venue IEEE Transactions on Pattern Analysis and Machine Intelligence Last Checked 4 months ago

Abstract

Inferring the unseen attribute-object composition is critical to make machines learn to decompose and compose complex concepts like people. Most existing methods are limited to the composition recognition of single-attribute-object, and can hardly learn relations between the attributes and objects. In this paper, we propose an attribute-object semantic association graph model to learn the complex relations and enable knowledge transfer between primitives. With nodes representing attributes and objects, the graph can be constructed flexibly, which realizes both single- and multi-attribute-object composition recognition. In order to reduce mis-classifications of similar compositions (e.g., scratched screen and broken screen), driven by the contrastive loss, the anchor image feature is pulled closer to the corresponding label feature and pushed away from other negative label features. Specifically, a novel balance loss is proposed to alleviate the domain bias, where a model prefers to predict seen compositions. In addition, we build a large-scale MultiAttribute Dataset (MAD) with 116,099 images and 8,030 label categories for inferring unseen multi-attribute-object compositions. Along with MAD, we propose two novel metrics Hard and Soft to give a comprehensive evaluation in the multi-attribute setting. Experiments on MAD and two other single-attribute-object benchmarks (MIT-States and UT-Zappos50K) demonstrate the effectiveness of our approach.