A Gold Standard Dataset for the Reviewer Assignment Problem
March 23, 2023 · Declared Dead · Trans. Mach. Learn. Res.
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Ivan Stelmakh, John Wieting, Sarina Xi, Graham Neubig, Nihar B. Shah
arXiv ID
2303.16750
Category
cs.IR: Information Retrieval
Cross-listed
cs.DL, cs.LG
Citations
20
Venue
Trans. Mach. Learn. Res.
Last Checked
4 months ago
Abstract
Many peer-review venues are using algorithms to assign submissions to reviewers. The crux of such automated approaches is the notion of the "similarity score" -- a numerical estimate of the expertise of a reviewer in reviewing a paper -- and many algorithms have been proposed to compute these scores. However, these algorithms have not been subjected to a principled comparison, making it difficult for stakeholders to choose the algorithm in an evidence-based manner. The key challenge in comparing existing algorithms and developing better algorithms is the lack of publicly available gold-standard data. We address this challenge by collecting a novel dataset of similarity scores that we release to the research community. Our dataset consists of 477 self-reported expertise scores provided by 58 researchers who evaluated their expertise in reviewing papers they have read previously. Using our dataset, we compare several widely used similarity algorithms and offer key insights. First, all algorithms exhibit significant error, with misranking rates between 12%-30% in easier cases and 36%-43% in harder ones. Second, most specialized algorithms are designed to work with titles and abstracts of papers, and in this regime the SPECTER2 algorithm performs best. Interestingly, classical TF-IDF matches SPECTER2 in accuracy when given access to full submission texts. In contrast, off-the-shelf LLMs lag behind specialized approaches.
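The abstract's two central notions, a similarity score and a misranking rate, can be illustrated with a minimal sketch. It assumes whitespace tokenization, a smoothed TF-IDF weighting, and a simple pairwise definition of misranking (the share of paper pairs that the predicted scores order differently from the self-reported expertise, with ties counted as half an error). The abstract does not give the paper's exact loss, so this definition is an assumption, and all function names and toy data below are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed IDF (an assumption) keeps every weight positive.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def misranking_rate(predicted, reported):
    """Share of paper pairs ordered differently by predicted and reported scores.

    Pairs with equal reported expertise are skipped; a tie in the
    predicted scores counts as half an error (an assumption).
    """
    errors = 0.0
    total = 0
    for i, j in combinations(range(len(reported)), 2):
        if reported[i] == reported[j]:
            continue
        total += 1
        agreement = (predicted[i] - predicted[j]) * (reported[i] - reported[j])
        if agreement < 0:
            errors += 1.0
        elif agreement == 0:
            errors += 0.5
    return errors / total if total else 0.0


# Hypothetical toy data: one reviewer profile and three candidate papers.
reviewer_profile = "peer review reviewer assignment similarity scores"
papers = [
    "reviewer assignment with similarity scores for peer review",
    "convolutional networks for medical image detection",
    "peer review bias analysis",
]
reported = [5, 1, 3]  # made-up self-reported expertise, one value per paper

vectors = tfidf_vectors([reviewer_profile] + papers)
predicted = [cosine(vectors[0], paper_vec) for paper_vec in vectors[1:]]
print(misranking_rate(predicted, reported))
```

A pairwise measure like this depends only on the ordering of the scores, so algorithms whose similarity scores live on different numerical scales can be compared directly.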
Similar Papers
In the same crypt: Information Retrieval
R.I.P.
👻
Ghosted
Old Age
Neural Graph Collaborative Filtering
R.I.P.
👻
Ghosted
DeepFM: A Factorization-Machine based Neural Network for CTR Prediction
R.I.P.
👻
Ghosted
BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer
R.I.P.
404 Not Found
Graph Neural Networks for Social Recommendation
R.I.P.
👻
Ghosted
Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding
Died the same way: 👻 Ghosted
R.I.P.
👻
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
👻
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
👻
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
👻
Ghosted