ROSE: Revolutionizing Open-Set Dense Segmentation with Patch-Wise Perceptual Large Multimodal Model
November 29, 2024 · Declared Dead · 🏛 arXiv.org
"Paper promises code 'coming soon'"
Evidence collected by the PWNC Scanner
Authors
Kunyang Han, Yibo Hu, Mengxue Qu, Hailin Shi, Yao Zhao, Yunchao Wei
arXiv ID
2412.00153
Category
cs.CV: Computer Vision
Cross-listed
cs.LG
Citations
1
Venue
arXiv.org
Last Checked
1 month ago
Abstract
Advances in CLIP and large multimodal models (LMMs) have enabled open-vocabulary and free-text segmentation, yet existing models still require predefined category prompts, limiting free-form category self-generation. Most segmentation LMMs also remain confined to sparse predictions, restricting their applicability in open-set environments. In contrast, we propose ROSE, a Revolutionary Open-set dense SEgmentation LMM, which enables dense mask prediction and open-category generation through patch-wise perception. Our method treats each image patch as an independent region of interest candidate, enabling the model to predict both dense and sparse masks simultaneously. Additionally, a newly designed instruction-response paradigm takes full advantage of the generation and generalization capabilities of LMMs, achieving category prediction independent of closed-set constraints or predefined categories. To further enhance mask detail and category precision, we introduce a conversation-based refinement paradigm, integrating the prediction result from previous step with textual prompt for revision. Extensive experiments demonstrate that ROSE achieves competitive performance across various segmentation tasks in a unified framework. Code will be released.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
📜 Similar Papers
In the same crypt — Computer Vision
🌅
🌅
Old Age
🌅
🌅
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
R.I.P.
👻
Ghosted
You Only Look Once: Unified, Real-Time Object Detection
🌅
🌅
Old Age
SSD: Single Shot MultiBox Detector
🌅
🌅
Old Age
Squeeze-and-Excitation Networks
R.I.P.
👻
Ghosted
Rethinking the Inception Architecture for Computer Vision
Died the same way — ⏳ Coming Soon™
R.I.P.
⏳
Coming Soon™
Exploring Simple Siamese Representation Learning
R.I.P.
⏳
Coming Soon™
An Analysis of Scale Invariance in Object Detection - SNIP
R.I.P.
⏳
Coming Soon™
Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection
R.I.P.
⏳
Coming Soon™