LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models
October 03, 2022 · Declared Dead · 🏛 Computer Vision and Pattern Recognition
"Paper promises code 'coming soon'"
Evidence collected by the PWNC Scanner
Authors
Adrian Bulat, Georgios Tzimiropoulos
arXiv ID
2210.01115
Category
cs.CV: Computer Vision
Cross-listed
cs.AI,
cs.LG
Citations
71
Venue
Computer Vision and Pattern Recognition
Last Checked
1 month ago
Abstract
Soft prompt learning has recently emerged as one of the methods of choice for adapting V&L models to a downstream task using a few training examples. However, current methods significantly overfit the training data, suffering from large accuracy degradation when tested on unseen classes from the same domain. To this end, in this paper, we make the following 4 contributions: (1) To alleviate base class overfitting, we propose a novel Language-Aware Soft Prompting (LASP) learning method by means of a text-to-text cross-entropy loss that maximizes the probability of the learned prompts to be correctly classified with respect to pre-defined hand-crafted textual prompts. (2) To increase the representation capacity of the prompts, we propose grouped LASP where each group of prompts is optimized with respect to a separate subset of textual prompts. (3) We identify a visual-language misalignment introduced by prompt learning and LASP, and more importantly, propose a re-calibration mechanism to address it. (4) We show that LASP is inherently amenable to including, during training, virtual classes, i.e. class names for which no visual samples are available, further increasing the robustness of the learned prompts. Through evaluations on 11 datasets, we show that our approach (a) significantly outperforms all prior works on soft prompting, and (b) matches and surpasses, for the first time, the accuracy on novel classes obtained by hand-crafted prompts and CLIP for 8 out of 11 test datasets. Code will be made available at https://www.adrianbulat.com/lasp
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
📜 Similar Papers
In the same crypt — Computer Vision
🌅
🌅
Old Age
🌅
🌅
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
R.I.P.
👻
Ghosted
You Only Look Once: Unified, Real-Time Object Detection
🌅
🌅
Old Age
SSD: Single Shot MultiBox Detector
🌅
🌅
Old Age
Squeeze-and-Excitation Networks
R.I.P.
👻
Ghosted
Rethinking the Inception Architecture for Computer Vision
Died the same way — ⏳ Coming Soon™
R.I.P.
⏳
Coming Soon™
Exploring Simple Siamese Representation Learning
R.I.P.
⏳
Coming Soon™
An Analysis of Scale Invariance in Object Detection - SNIP
R.I.P.
⏳
Coming Soon™
Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection
R.I.P.
⏳
Coming Soon™