ScoreStop: Gradient-based early stopping using functional score tests

June 01, 2026 · Grace Period · 🏛 the International Conference on Machine Learning 2026 Workshop on Hypothesis Testing

Authors: Oliver J. Hines, Christian L. Hines
arXiv ID: 2606.02740
Category: stat.ML, Machine Learning (Stat)
Cross-listed: cs.LG
Citations: 0
Venue: the International Conference on Machine Learning 2026 Workshop on Hypothesis Testing

Abstract

Gradient boosted decision trees require a stopping rule to avoid overfitting. The standard rule monitors a validation loss and stops when the loss fails to improve for a fixed patience period. However, the patience parameter has no interpretable scale, and validation losses can be noisy or only implicitly defined by a user-specified gradient. We propose ScoreStop, a gradient-based early-stopping rule that casts the stopping decision at each iteration as a test of the null hypothesis that the current predictor is the population risk minimizer. We use a functional score test computed on validation data, whose statistic is scale-invariant in the update direction and has a known asymptotic distribution under the null. Because our test uses gradients rather than loss values, the same construction applies to implicit losses such as LambdaRank and, via influence functions, to data-dependent losses such as Cox regression. In synthetic experiments and real-data benchmarks, we show that ScoreStop is competitive with loss-based methods.
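To make the stopping decision concrete, the following is a minimal sketch of one way a gradient-based score test could be computed on validation data. It is an illustration under stated assumptions, not the authors' statistic: the names `score_test_statistic` and `should_stop`, the studentized-mean form of the statistic, the one-sided normal reference distribution, and the level `alpha` are all hypothetical choices made here. The sketch assumes `gradients` holds the per-example loss gradients with respect to the current prediction and `direction` holds the proposed update (for example, the next tree's predictions) on the same validation examples.

```python
import math
from statistics import NormalDist, fmean, stdev


def score_test_statistic(gradients, direction):
    """Studentized mean of the products g_i * h_i over validation examples.

    Assumption (not from the paper): under the null that the current
    predictor cannot be improved along `direction`, the mean of g_i * h_i
    is zero and this statistic is approximately standard normal.
    """
    products = [g * h for g, h in zip(gradients, direction)]
    n = len(products)
    sd = stdev(products)  # sample standard deviation, needs n >= 2
    if sd == 0.0:
        raise ValueError("products have zero variance; statistic undefined")
    # Dividing by the standard deviation cancels any positive rescaling
    # of `direction`, so the statistic is scale-invariant in the update.
    return math.sqrt(n) * fmean(products) / sd


def should_stop(gradients, direction, alpha=0.05):
    """Stop when the test fails to reject the null at level `alpha`.

    A useful update direction has a clearly negative mean of g_i * h_i,
    so the one-sided p-value is the lower tail of the standard normal.
    """
    t = score_test_statistic(gradients, direction)
    p_value = NormalDist().cdf(t)
    return p_value > alpha


# Hypothetical validation gradients and a direction opposing them.
grads = [-1.0, -2.0, -1.5, -0.5]
step = [1.0, 2.0, 1.5, 0.5]
keep_going = not should_stop(grads, step)

# A direction uncorrelated with the gradients gives no evidence of progress.
noise_grads = [1.0, -1.0, 1.0, -1.0]
flat_step = [1.0, 1.0, 1.0, 1.0]
stop_now = should_stop(noise_grads, flat_step)
```

In this sketch, `alpha` plays the role that patience plays in the loss-based rule, but it is expressed on a probability scale. Only the gradients enter the computation, so no explicit loss value is required.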