RLHF: A comprehensive Survey for Cultural, Multimodal and Low Latency Alignment Methods

November 06, 2025 · The Cartographer · 🏛 arXiv.org

📚 THE CARTOGRAPHER
Survey/review paper: maps the landscape rather than implementing a method.

"No code URL or promise found in abstract"
"Title-pattern auto-detect: RLHF: A comprehensive Survey for Cultural, Multimodal and Low Latency Alignment Methods"

Evidence collected by the PWNC Scanner

Authors: Raghav Sharma, Manan Mehta, Sai Tiger Raina
arXiv ID: 2511.03939
Category: cs.LG (Machine Learning)
Cross-listed: cs.AI, cs.CL
Citations: 1
Venue: arXiv.org
Last Checked: 4 days ago
Abstract
Reinforcement Learning from Human Feedback (RLHF) is the standard for aligning Large Language Models (LLMs), yet recent progress has moved beyond canonical text-based methods. This survey synthesizes the new frontier of alignment research by addressing critical gaps in multi-modal alignment, cultural fairness, and low-latency optimization. To systematically explore these domains, we first review foundational algorithms, including PPO, DPO, and GRPO, before presenting a detailed analysis of the latest innovations. By providing a comparative synthesis of these techniques and outlining open challenges, this work serves as an essential roadmap for researchers building more robust, efficient, and equitable AI systems.
