Zero-Shot Dense Retrieval with Embeddings from Relevance Feedback

October 28, 2024 · Declared Dead · 🏛 arXiv.org

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Nour Jedidi, Yung-Sung Chuang, Leslie Shing, James Glass arXiv ID 2410.21242 Category cs.IR: Information Retrieval Cross-listed cs.AI, cs.CL, cs.LG Citations 5 Venue arXiv.org Last Checked 4 months ago

Abstract

Building effective dense retrieval systems remains difficult when relevance supervision is not available. Recent work has looked to overcome this challenge by using a Large Language Model (LLM) to generate hypothetical documents that can be used to find the closest real document. However, this approach relies solely on the LLM to have domain-specific knowledge relevant to the query, which may not be practical. Furthermore, generating hypothetical documents can be inefficient as it requires the LLM to generate a large number of tokens for each query. To address these challenges, we introduce Real Document Embeddings from Relevance Feedback (ReDE-RF). Inspired by relevance feedback, ReDE-RF proposes to re-frame hypothetical document generation as a relevance estimation task, using an LLM to select which documents should be used for nearest neighbor search. Through this re-framing, the LLM no longer needs domain-specific knowledge but only needs to judge what is relevant. Additionally, relevance estimation only requires the LLM to output a single token, thereby improving search latency. Our experiments show that ReDE-RF consistently surpasses state-of-the-art zero-shot dense retrieval methods across a wide range of low-resource retrieval datasets while also making significant improvements in latency per-query.