Old Age
Don't Pay Attention, PLANT It: Pretraining Attention via Learning-to-Rank
October 30, 2024 · Entered Twilight
Repo contents: .gitattributes, .gitconfig, .github, .gitignore, LICENSE, MANIFEST.in, README.md, nbs, settings.ini, setup.py, xcube
Authors
Debjyoti Saha Roy, Byron C. Wallace, Javed A. Aslam
arXiv ID
2410.23066
Category
cs.CL: Computation & Language
Cross-listed
cs.LG
Citations
0
Repository
https://github.com/debjyotiSRoy/xcube/tree/plant
Last Checked
3 months ago
Abstract
State-of-the-art Extreme Multi-Label Text Classification models rely on multi-label attention to focus on key tokens in input text, but learning good attention weights is challenging. We introduce PLANT (Pretrained and Leveraged Attention), a plug-and-play strategy for initializing attention. PLANT works by planting label-specific attention using a pretrained Learning-to-Rank model guided by mutual information gain. This architecture-agnostic approach integrates seamlessly with large language model backbones such as Mistral-7B, LLaMA3-8B, DeepSeek-V3, and Phi-3. PLANT outperforms state-of-the-art methods across tasks including ICD coding, legal topic classification, and content recommendation. Gains are especially pronounced in few-shot settings, with substantial improvements on rare labels. Ablation studies confirm that attention initialization is a key driver of these gains. For code and trained models, see https://github.com/debjyotiSRoy/xcube/tree/plant
Similar Papers
In the same crypt: Computation & Language
Old Age
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Old Age
XLNet: Generalized Autoregressive Pretraining for Language Understanding
The Ethereal
Effective Approaches to Attention-based Neural Machine Translation
Old Age
A large annotated corpus for learning natural language inference
Old Age