Mask Matching Transformer for Few-Shot Segmentation

December 05, 2022 · Declared Dead · 🏛 Neural Information Processing Systems

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Siyu Jiao, Gengwei Zhang, Shant Navasardyan, Ling Chen, Yao Zhao, Yunchao Wei, Humphrey Shi arXiv ID 2301.01208 Category cs.CV: Computer Vision Citations 46 Venue Neural Information Processing Systems Last Checked 3 months ago

Abstract

In this paper, we aim to tackle the challenging few-shot segmentation task from a new perspective. Typical methods follow the paradigm to firstly learn prototypical features from support images and then match query features in pixel-level to obtain segmentation results. However, to obtain satisfactory segments, such a paradigm needs to couple the learning of the matching operations with heavy segmentation modules, limiting the flexibility of design and increasing the learning complexity. To alleviate this issue, we propose Mask Matching Transformer (MM-Former), a new paradigm for the few-shot segmentation task. Specifically, MM-Former first uses a class-agnostic segmenter to decompose the query image into multiple segment proposals. Then, a simple matching mechanism is applied to merge the related segment proposals into the final mask guided by the support images. The advantages of our MM-Former are two-fold. First, the MM-Former follows the paradigm of decompose first and then blend, allowing our method to benefit from the advanced potential objects segmenter to produce high-quality mask proposals for query images. Second, the mission of prototypical features is relaxed to learn coefficients to fuse correct ones within a proposal pool, making the MM-Former be well generalized to complex scenarios or cases. We conduct extensive experiments on the popular COCO-$20^i$ and Pascal-$5^i$ benchmarks. Competitive results well demonstrate the effectiveness and the generalization ability of our MM-Former.