Don't Pay Attention, PLANT It: Pretraining Attention via Learning-to-Rank

October 30, 2024 · Entered Twilight

💤 TWILIGHT: Eternal Rest
Repo abandoned since publication

Repo contents: .gitattributes, .gitconfig, .github, .gitignore, LICENSE, MANIFEST.in, README.md, nbs, settings.ini, setup.py, xcube

Authors: Debjyoti Saha Roy, Byron C. Wallace, Javed A. Aslam
arXiv ID: 2410.23066
Category: cs.CL (Computation & Language)
Cross-listed: cs.LG
Citations: 0
Repository: https://github.com/debjyotiSRoy/xcube/tree/plant
Last checked: 3 months ago
Abstract
State-of-the-art Extreme Multi-Label Text Classification models rely on multi-label attention to focus on key tokens in input text, but learning good attention weights is challenging. We introduce PLANT - Pretrained and Leveraged Attention - a plug-and-play strategy for initializing attention. PLANT works by planting label-specific attention using a pretrained Learning-to-Rank model guided by mutual information gain. This architecture-agnostic approach integrates seamlessly with large language model backbones such as Mistral-7B, LLaMA3-8B, DeepSeek-V3, and Phi-3. PLANT outperforms state-of-the-art methods across tasks including ICD coding, legal topic classification, and content recommendation. Gains are especially pronounced in few-shot settings, with substantial improvements on rare labels. Ablation studies confirm that attention initialization is a key driver of these gains. For code and trained models, see https://github.com/debjyotiSRoy/xcube/tree/plant
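The abstract describes the mechanism only at a high level. As a rough, non-authoritative illustration of the two ingredients it names (mutual information between tokens and labels, and label-specific attention that starts from a non-random query), here is a plain-Python sketch. It is not the PLANT code from the xcube repository. The function names `mutual_information`, `plant_query` and `label_attention` are hypothetical. The rule in `plant_query`, which seeds a label's query vector with an MI-weighted average of token embeddings, is an assumed stand-in for the pretrained Learning-to-Rank model the paper actually uses.

```python
import math
from typing import Dict, List, Sequence, Tuple


def mutual_information(token_present: Sequence[int], label_present: Sequence[int]) -> float:
    """Mutual information (in nats) between two binary indicators over the same documents."""
    n = len(token_present)
    mi = 0.0
    for t in (0, 1):
        p_t = sum(1 for a in token_present if a == t) / n
        for y in (0, 1):
            p_y = sum(1 for b in label_present if b == y) / n
            p_ty = sum(1 for a, b in zip(token_present, label_present) if a == t and b == y) / n
            if p_ty > 0:  # 0 * log(0) is treated as 0; p_ty > 0 implies p_t, p_y > 0
                mi += p_ty * math.log(p_ty / (p_t * p_y))
    return mi


def plant_query(embeddings: Dict[str, List[float]], mi_scores: Dict[str, float]) -> List[float]:
    """HYPOTHETICAL seeding rule (not the paper's): MI-weighted average of token embeddings."""
    dim = len(next(iter(embeddings.values())))
    total = sum(mi_scores.get(tok, 0.0) for tok in embeddings)
    if total == 0:
        return [0.0] * dim  # no informative tokens: fall back to a neutral query
    return [
        sum(mi_scores.get(tok, 0.0) * vec[d] for tok, vec in embeddings.items()) / total
        for d in range(dim)
    ]


def label_attention(token_vecs: List[List[float]], query: List[float]) -> Tuple[List[float], List[float]]:
    """Label-wise attention: softmax over tokens of <token, query>, then a weighted sum of tokens."""
    scores = [sum(a * b for a, b in zip(v, query)) for v in token_vecs]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    context = [sum(w * v[d] for w, v in zip(weights, token_vecs)) for d in range(len(query))]
    return weights, context


if __name__ == "__main__":
    # Toy corpus: each document is a set of tokens; has_label marks a hypothetical label.
    docs = [{"fever", "cough"}, {"fever", "rash"}, {"cough", "cold"}, {"rash", "itch"}]
    has_label = [1, 1, 0, 0]
    vocab = sorted(set().union(*docs))
    mi = {tok: mutual_information([int(tok in d) for d in docs], has_label) for tok in vocab}
    # Toy 2-d embeddings (assumed, not learned): "fever" on one axis, every other token on the other.
    emb = {tok: ([1.0, 0.0] if tok == "fever" else [0.0, 1.0]) for tok in vocab}
    query = plant_query(emb, mi)
    weights, _ = label_attention([emb["fever"], emb["rash"]], query)
    print("MI:", {t: round(v, 3) for t, v in mi.items()})
    print("attention over [fever, rash]:", [round(w, 3) for w in weights])
```

In this toy corpus "fever" co-occurs perfectly with the label, so it gets the highest MI score and, through the seeded query, the larger initial attention weight; a randomly initialized query would carry no such preference. PLANT itself derives its initial attention from a pretrained Learning-to-Rank model, so treat this only as intuition for why a non-random starting point might help when a label has few training examples.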

📜 Similar Papers

In the same crypt: Computation & Language

🌅 Old Age

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, ... (+6 more)

cs.CL ๐Ÿ› NeurIPS ๐Ÿ“š 166.0K cites 9 years ago