On the Efficiency of ERM in Feature Learning
November 18, 2024 ยท Declared Dead ยท ๐ Neural Information Processing Systems
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Ayoub El Hanchi, Chris J. Maddison, Murat A. Erdogdu
arXiv ID
2411.12029
Category
stat.ML: Machine Learning (Stat)
Cross-listed
cs.LG,
math.ST
Citations
1
Venue
Neural Information Processing Systems
Last Checked
4 months ago
Abstract
Given a collection of feature maps indexed by a set $\mathcal{T}$, we study the performance of empirical risk minimization (ERM) on regression problems with square loss over the union of the linear classes induced by these feature maps. This setup aims at capturing the simplest instance of feature learning, where the model is expected to jointly learn from the data an appropriate feature map and a linear predictor. We start by studying the asymptotic quantiles of the excess risk of sequences of empirical risk minimizers. Remarkably, we show that when the set $\mathcal{T}$ is not too large and when there is a unique optimal feature map, these quantiles coincide, up to a factor of two, with those of the excess risk of the oracle procedure, which knows a priori this optimal feature map and deterministically outputs an empirical risk minimizer from the associated optimal linear class. We complement this asymptotic result with a non-asymptotic analysis that quantifies the decaying effect of the global complexity of the set $\mathcal{T}$ on the excess risk of ERM, and relates it to the size of the sublevel sets of the suboptimality of the feature maps. As an application of our results, we obtain new guarantees on the performance of the best subset selection procedure in sparse linear regression under general assumptions.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Machine Learning (Stat)
๐ฎ
๐ฎ
The Ethereal
๐ฎ
๐ฎ
The Ethereal
Layer Normalization
๐ฎ
๐ฎ
The Ethereal
Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
R.I.P.
๐ป
Ghosted
Variational Inference with Normalizing Flows
๐
๐
The Cartographer
Towards A Rigorous Science of Interpretable Machine Learning
R.I.P.
๐ป
Ghosted
Optimization Methods for Large-Scale Machine Learning
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
๐ป
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
๐ป
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
๐ป
Ghosted