Selective Rotary Position Embedding
November 21, 2025 Β· Declared Dead Β· π arXiv.org
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Sajad Movahedi, Timur Carstensen, Arshia Afzal, Frank Hutter, Antonio Orvieto, Volkan Cevher
arXiv ID
2511.17388
Category
cs.CL: Computation & Language
Cross-listed
cs.LG
Citations
1
Venue
arXiv.org
Last Checked
3 months ago
Abstract
Position information is essential for language modeling. In softmax transformers, Rotary Position Embeddings (\textit{RoPE}) encode positions through \textit{fixed-angle} rotations, while in linear transformers, order is handled via input-dependent (selective) gating that decays past key-value associations. Selectivity has generally been shown to improve language-related tasks. Inspired by this, we introduce \textit{Selective RoPE}, an \textit{input-dependent} rotary embedding mechanism, that generalizes \textit{RoPE}, and enables rotation in \textit{arbitrary angles} for both linear and softmax transformers. We show that softmax attention already performs a hidden form of these rotations on query-key pairs, uncovering an implicit positional structure. We further show that in state-space models and gated linear transformers, the real part manages forgetting while the imaginary part encodes positions through rotations. We validate our method by equipping gated transformers with \textit{Selective RoPE}, demonstrating that its input-dependent rotations improve performance in language modeling and on difficult sequence tasks like copying, state tracking, and retrieval.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Computation & Language
π
π
Old Age
π
π
Old Age
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
π
π
Old Age
XLNet: Generalized Autoregressive Pretraining for Language Understanding
ποΈ
ποΈ
Transcended
Effective Approaches to Attention-based Neural Machine Translation
π
π
Old Age
A large annotated corpus for learning natural language inference
π
π
Old Age
HellaSwag: Can a Machine Really Finish Your Sentence?
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Neural Architecture Search with Reinforcement Learning
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted