Ranking Unraveled: Recipes for LLM Rankings in Head-to-Head AI Combat

November 19, 2024 ยท Declared Dead ยท ๐Ÿ› Annual Meeting of the Association for Computational Linguistics

๐Ÿ‘ป CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Roland Daynauth, Christopher Clarke, Krisztian Flautner, Lingjia Tang, Jason Mars arXiv ID 2411.14483 Category cs.CL: Computation & Language Cross-listed cs.AI Citations 5 Venue Annual Meeting of the Association for Computational Linguistics Last Checked 4 months ago
Abstract
Deciding which large language model (LLM) to use is a complex challenge. Pairwise ranking has emerged as a new method for evaluating human preferences for LLMs. This approach entails humans evaluating pairs of model outputs based on a predefined criterion. By collecting these comparisons, a ranking can be constructed using methods such as Elo. However, applying these algorithms as constructed in the context of LLM evaluation introduces several challenges. In this paper, we explore the effectiveness of ranking systems for head-to-head comparisons of LLMs. We formally define a set of fundamental principles for effective ranking and conduct a series of extensive evaluations on the robustness of several ranking algorithms in the context of LLMs. Our analysis uncovers key insights into the factors that affect ranking accuracy and efficiency, offering guidelines for selecting the most appropriate methods based on specific evaluation contexts and resource constraints.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Computation & Language

๐ŸŒ… ๐ŸŒ… Old Age

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, ... (+6 more)

cs.CL ๐Ÿ› NeurIPS ๐Ÿ“š 166.0K cites 9 years ago

Died the same way โ€” ๐Ÿ‘ป Ghosted