The curious case of the test set AUROC

December 19, 2023 · Declared Dead · 🏛 Nature Machine Intelligence

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Michael Roberts, Alon Hazan, Sören Dittmer, James H. F. Rudd, Carola-Bibiane Schönlieb
arXiv ID: 2312.16188
Category: cs.LG (Machine Learning)
Cross-listed: stat.ME
Citations: 8
Venue: Nature Machine Intelligence
Last Checked: 4 months ago

Abstract
Whilst the size and complexity of ML models have rapidly and significantly increased over the past decade, the methods for assessing their performance have not kept pace. In particular, among the many potential performance metrics, the ML community stubbornly continues to use (a) the area under the receiver operating characteristic curve (AUROC) for a validation and test cohort (distinct from training data) or (b) the sensitivity and specificity for the test data at an optimal threshold determined from the validation ROC. However, we argue that considering scores derived from the test ROC curve alone gives only a narrow insight into how a model performs and its ability to generalise.
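
The two reporting conventions named in the abstract can be made concrete with a small sketch. It is only an illustration: the abstract does not say how the "optimal threshold" is chosen, so Youden's J statistic is used here as an assumed criterion, the function names (`auroc`, `sens_spec`, `youden_threshold`) are hypothetical, and the validation and test scores are made-up toy values, not results from the paper.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sens_spec(labels, scores, threshold):
    """Sensitivity and specificity when predicting positive for score >= threshold."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def youden_threshold(labels, scores):
    """Threshold maximising Youden's J = sensitivity + specificity - 1 (assumed criterion)."""
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        sens, spec = sens_spec(labels, scores, t)
        j = sens + spec - 1.0
        if j > best_j:  # strict '>' keeps the lowest threshold on ties
            best_t, best_j = t, j
    return best_t


# Toy, made-up scores purely for illustration (not data from the paper).
val_y = [0, 0, 0, 0, 1, 1, 1]
val_s = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.9]
test_y = [0, 0, 0, 1, 1, 1]
test_s = [0.2, 0.5, 0.8, 0.3, 0.6, 0.9]

# (a) AUROC on the validation and test cohorts.
val_auroc = auroc(val_y, val_s)
test_auroc = auroc(test_y, test_s)

# (b) Threshold chosen on validation only, then applied unchanged to test.
threshold = youden_threshold(val_y, val_s)
test_sens, test_spec = sens_spec(test_y, test_s, threshold)

print(f"validation AUROC = {val_auroc:.3f}, test AUROC = {test_auroc:.3f}")
print(f"threshold = {threshold}, test sensitivity = {test_sens:.3f}, "
      f"test specificity = {test_spec:.3f}")
```

In this toy case the validation and test numbers differ noticeably, which is the kind of information lost when only the test-set figure is reported.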
