Training Socially Aligned Language Models on Simulated Social Interactions
May 26, 2023 · Entered Twilight · International Conference on Learning Representations
Repo contents: .gitattributes, .github, .gitignore, .pre-commit-config.yaml, LICENSE, MANIFEST.in, Makefile, README.md, assets, collect_data.py, requirements.txt, run_inference.py, setup.cfg, setup.py, stable_alignment, test
Authors
Ruibo Liu, Ruixin Yang, Chenyan Jia, Ge Zhang, Denny Zhou, Andrew M. Dai, Diyi Yang, Soroush Vosoughi
arXiv ID
2305.16960
Category
cs.CL: Computation & Language
Cross-listed
cs.AI, cs.CY, cs.HC
Citations
91
Venue
International Conference on Learning Representations
Repository
https://github.com/agi-templar/Stable-Alignment
⭐ 354
Last Checked
1 month ago
Abstract
Social alignment in AI systems aims to ensure that these models behave according to established societal values. However, unlike humans, who derive consensus on value judgments through social interaction, current language models (LMs) are trained to rigidly replicate their training corpus in isolation, leading to subpar generalization in unfamiliar scenarios and vulnerability to adversarial attacks. This work presents a novel training paradigm that permits LMs to learn from simulated social interactions. In comparison to existing methodologies, our approach is considerably more scalable and efficient, demonstrating superior performance in alignment benchmarks and human evaluations. This paradigm shift in the training of LMs brings us a step closer to developing AI systems that can robustly and accurately reflect societal norms and values.
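The linked repository (collect_data.py, run_inference.py, and the stable_alignment package) implements the paper's training pipeline. Purely as an illustration of the core idea of learning from rated simulated interactions, here is a minimal PyTorch sketch of a rating-guided fine-tuning objective. This is not the authors' exact loss: the tensor shapes, the margin form, and the `margin_scale` hyperparameter are assumptions made for this example.

```python
# Illustrative sketch (not the paper's exact objective): each prompt comes
# with several candidate responses scored by peer agents in a simulated
# society. The highest-rated response is imitated, and lower-rated ones are
# pushed below it by a rating-scaled margin. Shapes and `margin_scale` are
# assumptions for the example.
import torch
import torch.nn.functional as F


def sequence_logprob(logits, labels, pad_id=-100):
    """Mean log-probability of `labels` under `logits`, ignoring padding."""
    logp = F.log_softmax(logits, dim=-1)                   # (N, T, V)
    mask = labels.ne(pad_id)
    safe = labels.clamp(min=0)                             # avoid -100 index
    tok = logp.gather(-1, safe.unsqueeze(-1)).squeeze(-1)  # (N, T)
    return (tok * mask).sum(-1) / mask.sum(-1).clamp(min=1)


def rating_guided_loss(logits, labels, ratings, margin_scale=0.1):
    """
    logits:  (B, C, T, V) -- C candidate responses per prompt
    labels:  (B, C, T)    -- target token ids, -100 on padding
    ratings: (B, C)       -- peer ratings from the simulated interactions
    """
    B, C, T, V = logits.shape
    lp = sequence_logprob(logits.view(B * C, T, V),
                          labels.view(B * C, T)).view(B, C)
    best = ratings.argmax(dim=-1, keepdim=True)            # highest-rated
    sft = -lp.gather(1, best).mean()                       # imitate the best
    # Contrastive term: each worse response should trail the best one by a
    # margin proportional to its rating gap.
    gap = (ratings.gather(1, best) - ratings) * margin_scale
    diff = lp.gather(1, best) - lp                         # (B, C)
    penalty = F.relu(gap - diff).mean()
    return sft + penalty
```

The imitation term alone would be ordinary supervised fine-tuning; the margin term is what uses the simulated society's ratings, penalizing the model only when a low-rated response is not sufficiently less likely than the best-rated one.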
Similar Papers
In the same crypt · Computation & Language
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding · R.I.P. 👻 Ghosted
Language Models are Few-Shot Learners · R.I.P. 👻 Ghosted
RoBERTa: A Robustly Optimized BERT Pretraining Approach · R.I.P. 👻 Ghosted
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension · R.I.P. 👻 Ghosted