Using Score Distributions to Compare Statistical Significance Tests for Information Retrieval Evaluation
January 30, 2019 Β· Declared Dead Β· π J. Assoc. Inf. Sci. Technol.
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Javier Parapar, David E. Losada, Manuel A. Presedo-Quindimil, Alvaro Barreiro
arXiv ID
1901.10696
Category
cs.IR: Information Retrieval
Citations
23
Venue
J. Assoc. Inf. Sci. Technol.
Last Checked
4 months ago
Abstract
Statistical significance tests can provide evidence that the observed difference in performance between two methods is not due to chance. In Information Retrieval, some studies have examined the validity and suitability of such tests for comparing search systems. We argue here that current methods for assessing the reliability of statistical tests suffer from some methodological weaknesses, and we propose a novel way to study significance tests for retrieval evaluation. Using Score Distributions, we model the output of multiple search systems, produce simulated search results from such models, and compare them using various significance tests. A key strength of this approach is that we assess statistical tests under perfect knowledge about the truth or falseness of the null hypothesis. This new method for studying the power of significance tests in Information Retrieval evaluation is formal and innovative. Following this type of analysis, we found that both the sign test and Wilcoxon signed test have more power than the permutation test and the t-test. The sign test and Wilcoxon signed test also have a good behavior in terms of type I errors. The bootstrap test shows few type I errors, but it has less power than the other methods tested.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Information Retrieval
R.I.P.
π»
Ghosted
π
π
Old Age
Neural Graph Collaborative Filtering
R.I.P.
π»
Ghosted
DeepFM: A Factorization-Machine based Neural Network for CTR Prediction
R.I.P.
π»
Ghosted
BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer
R.I.P.
π
404 Not Found
Graph Neural Networks for Social Recommendation
R.I.P.
π»
Ghosted
Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted