Don't Pay Attention, PLANT It: Pretraining Attention via Learning-to-Rank

October 30, 2024 · Entered Twilight

Repo contents: .gitattributes, .gitconfig, .github, .gitignore, LICENSE, MANIFEST.in, README.md, nbs, settings.ini, setup.py, xcube

Authors: Debjyoti Saha Roy, Byron C. Wallace, Javed A. Aslam
arXiv ID: 2410.23066
Category: cs.CL (Computation & Language); cross-listed cs.LG
Citations: 0
Repository: https://github.com/debjyotiSRoy/xcube/tree/plant
Last checked: 3 months ago

Abstract

State-of-the-art Extreme Multi-Label Text Classification (XMTC) models rely on multi-label attention to focus on key tokens in the input text, but learning good attention weights is challenging. We introduce PLANT (Pretrained and Leveraged Attention), a plug-and-play strategy for initializing attention. PLANT works by planting label-specific attention using a pretrained Learning-to-Rank model guided by mutual information gain. This architecture-agnostic approach integrates seamlessly with large language model backbones such as Mistral-7B, LLaMA3-8B, DeepSeek-V3, and Phi-3. PLANT outperforms state-of-the-art methods across tasks including ICD coding, legal topic classification, and content recommendation. Gains are especially pronounced in few-shot settings, with substantial improvements on rare labels. Ablation studies confirm that attention initialization is a key driver of these gains. For code and trained models, see https://github.com/debjyotiSRoy/xcube/tree/plant
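The abstract describes planting label-specific attention using a pretrained Learning-to-Rank model guided by mutual information gain. The following is a minimal sketch of that idea under loud assumptions, not the authors' implementation. It replaces the Learning-to-Rank model with raw mutual-information scores computed on a hypothetical toy corpus, ranks tokens per label, and turns those scores into a softmax attention distribution over one document's tokens. All names here (DOCS, LABELS, build_mi_table, rank_tokens, planted_attention, temperature) are illustrative and do not come from the xcube repository.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each document is a set of tokens with a set of labels.
DOCS = [{"chest", "pain", "cardiac"}, {"cardiac", "arrest"},
        {"knee", "fracture"}, {"fracture", "pain"}]
LABELS = [{"heart"}, {"heart"}, {"bone"}, {"bone"}]


def mutual_information(docs, labels, token, label):
    """MI (nats) between the binary events 'token occurs' and 'label is assigned'."""
    n = len(docs)
    joint = Counter((token in d, label in y) for d, y in zip(docs, labels))
    mi = 0.0
    for (t, l), count in joint.items():
        p_tl = count / n
        p_t = sum(c for (tt, _), c in joint.items() if tt == t) / n
        p_l = sum(c for (_, ll), c in joint.items() if ll == l) / n
        mi += p_tl * math.log(p_tl / (p_t * p_l))
    return mi


def build_mi_table(docs, labels):
    """Score every (token, label) pair; this table stands in for the ranking model."""
    vocab = set().union(*docs)
    label_set = set().union(*labels)
    return {(t, l): mutual_information(docs, labels, t, l)
            for t in vocab for l in label_set}


def rank_tokens(label, mi_table, top_k):
    """Tokens ordered by MI with `label`, highest first (ties broken alphabetically)."""
    scored = [(tok, s) for (tok, lab), s in mi_table.items() if lab == label]
    scored.sort(key=lambda ts: (-ts[1], ts[0]))
    return [tok for tok, _ in scored[:top_k]]


def planted_attention(doc_tokens, label, mi_table, temperature=1.0):
    """Label-specific attention over one document: softmax of MI scores / temperature."""
    scores = [mi_table.get((tok, label), 0.0) / temperature for tok in doc_tokens]
    m = max(scores)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


if __name__ == "__main__":
    table = build_mi_table(DOCS, LABELS)
    doc = ["chest", "pain", "cardiac"]
    print(rank_tokens("heart", table, top_k=2))
    weights = planted_attention(doc, "heart", table, temperature=0.5)
    print({tok: round(w, 3) for tok, w in zip(doc, weights)})
```

The toy data exposes one design caveat. Mutual information ignores the direction of association, so "fracture", which reliably signals the absence of the "heart" label, ranks as high as "cardiac". A practical initializer would need to account for this, for example by keeping only positively associated tokens. The abstract does not say how PLANT handles it.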