๐
๐
Old Age
Improving Cross-Lingual Factual Recall via Consistency-Driven Reinforcement Learning
June 04, 2026 ยท Grace Period ยท ๐ EMNLP 2026
Authors
Jonathan von Rad, Louis Arts, George Burgess, Eleftheria Kolokytha, Harry O'Donnell, Ektor Oikonomidis Doumpas, Eduardo Sanchez, Yao Lu, Pontus Stenetorp
arXiv ID
2606.06586
Category
cs.CL: Computation & Language
Citations
0
Venue
EMNLP 2026
Abstract
Large language models (LLMs) trained predominantly on English data encode substantial world knowledge, yet often fail to express it reliably in other languages, a phenomenon known as cross-lingual factual inconsistency. To study and address this, we introduce PolyFact, a large-scale parallel multilingual factual QA dataset containing 100K Wikidata-grounded facts across 12 typologically diverse languages. Using PolyFact, we compare light continual pretraining (CPT), supervised fine-tuning (SFT), and reinforcement learning via Group Relative Policy Optimization (GRPO) for improving cross-lingual factual recall in Qwen-2.5-7B and OLMo-2-1124-7B. We find that GRPO consistently outperforms SFT, improving both cross-lingual consistency and generalization to unseen languages, while CPT on parallel data yields limited additional gains. Mechanistic analyses further show that GRPO reorganizes multilingual routing by reducing language specialization in MLP layers and attention heads, thereby promoting more shared cross-lingual representations. We release our code, models, and dataset.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Computation & Language
๐
๐
Old Age
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
๐
๐
Old Age
XLNet: Generalized Autoregressive Pretraining for Language Understanding
๐ฎ
๐ฎ
The Ethereal
Effective Approaches to Attention-based Neural Machine Translation
๐
๐
Old Age
A large annotated corpus for learning natural language inference
๐
๐
Old Age