Training Socially Aligned Language Models on Simulated Social Interactions

May 26, 2023 · Entered Twilight · 🏛 International Conference on Learning Representations

💤 TWILIGHT: Eternal Rest
Repo abandoned since publication

Repo contents: .gitattributes, .github, .gitignore, .pre-commit-config.yaml, LICENSE, MANIFEST.in, Makefile, README.md, assets, collect_data.py, requirements.txt, run_inference.py, setup.cfg, setup.py, stable_alignment, test

Authors: Ruibo Liu, Ruixin Yang, Chenyan Jia, Ge Zhang, Denny Zhou, Andrew M. Dai, Diyi Yang, Soroush Vosoughi
arXiv ID: 2305.16960
Category: cs.CL (Computation & Language)
Cross-listed: cs.AI, cs.CY, cs.HC
Citations: 91
Venue: International Conference on Learning Representations
Repository: https://github.com/agi-templar/Stable-Alignment (⭐ 354)
Last Checked: 1 month ago

Abstract
Social alignment in AI systems aims to ensure that these models behave according to established societal values. However, unlike humans, who derive consensus on value judgments through social interaction, current language models (LMs) are trained to rigidly replicate their training corpus in isolation, leading to subpar generalization in unfamiliar scenarios and vulnerability to adversarial attacks. This work presents a novel training paradigm that permits LMs to learn from simulated social interactions. In comparison to existing methodologies, our approach is considerably more scalable and efficient, demonstrating superior performance in alignment benchmarks and human evaluations. This paradigm shift in the training of LMs brings us a step closer to developing AI systems that can robustly and accurately reflect societal norms and values.

📜 Similar Papers

In the same crypt: Computation & Language

🌅 Old Age

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, ... (+6 more)

cs.CL · 🏛 NeurIPS · 📚 166.0K cites · 8 years ago