A Workbench for Autograding Retrieve/Generate Systems

May 21, 2024 · Declared Dead · 🏛 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

📜 CAUSE OF DEATH: Death by README
Repo has only a README

Repo contents: README.md

Authors: Laura Dietz
arXiv ID: 2405.13177
Category: cs.IR (Information Retrieval)
Citations: 14
Venue: Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Repository: https://github.com/TREMA-UNH/autograding-workbench
Last Checked: 1 month ago
Abstract
This resource paper addresses the challenge of evaluating Information Retrieval (IR) systems in the era of autoregressive Large Language Models (LLMs). Traditional methods that rely on passage-level judgments are no longer effective, given the diversity of responses generated by LLM-based systems. We provide a workbench for exploring several alternative, LLM-based approaches to judging the relevance of a system's response:

1. Asking an LLM whether the response is relevant;
2. Asking the LLM which set of nuggets (i.e., relevant key facts) is covered in the response;
3. Asking the LLM to answer a set of exam questions with the response.

This workbench aims to facilitate the development of new, reusable test collections. Researchers can manually refine sets of nuggets and exam questions, observing their impact on system evaluation and leaderboard rankings. Resource available at https://github.com/TREMA-UNH/autograding-workbench
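Since the repository contains only a README, no reference implementation survives. As a rough orientation, here is a minimal sketch of what the three autograding strategies could look like in Python. Everything below is an assumption for illustration: the `Llm` callable interface, the prompt wordings, and `toy_llm` are hypothetical stand-ins, not the paper's actual prompts or the workbench's API.

```python
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt string to a completion.
Llm = Callable[[str], str]

def grade_relevance(llm: Llm, query: str, response: str) -> bool:
    """Approach 1: ask the LLM directly whether the response is relevant."""
    prompt = (f"Query: {query}\nResponse: {response}\n"
              "Is the response relevant to the query? Answer YES or NO.")
    return llm(prompt).strip().upper().startswith("YES")

def covered_nuggets(llm: Llm, nuggets: List[str], response: str) -> List[str]:
    """Approach 2: ask which relevant key facts (nuggets) the response covers."""
    covered = []
    for nugget in nuggets:
        prompt = (f"Response: {response}\nKey fact: {nugget}\n"
                  "Does the response convey this key fact? Answer YES or NO.")
        if llm(prompt).strip().upper().startswith("YES"):
            covered.append(nugget)
    return covered

def exam_score(llm: Llm, questions: List[str], response: str) -> float:
    """Approach 3: ask the LLM to answer exam questions using only the response."""
    answerable = 0
    for question in questions:
        prompt = (f"Using ONLY the text below, answer the question.\n"
                  f"Text: {response}\nQuestion: {question}\n"
                  "If the text does not contain the answer, reply UNANSWERABLE.")
        if "UNANSWERABLE" not in llm(prompt).upper():
            answerable += 1
    return answerable / len(questions) if questions else 0.0

# Toy stand-in LLM so the sketch runs end to end; swap in a real model client.
def toy_llm(prompt: str) -> str:
    return "YES" if "neural" in prompt.lower() else "NO"

if __name__ == "__main__":
    response = "Neural rankers rerank BM25 candidates."
    print(grade_relevance(toy_llm, "neural reranking", response))
    print(covered_nuggets(toy_llm, ["neural rankers rerank candidates"], response))
    print(exam_score(toy_llm, ["What do neural rankers do?"], response))
```

Per the abstract, nugget coverage and exam scores feed a leaderboard, so researchers could iterate on the nugget and question sets and watch how the system ranking shifts; the aggregation step is likewise unspecified in the surviving repository.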
Community shame: Not yet rated

📜 Similar Papers

In the same crypt – Information Retrieval

Died the same way – 📜 Death by README