How Much Data Are Augmentations Worth? An Investigation into Scaling Laws, Invariance, and Implicit Regularization
October 12, 2022 · Entered Twilight · International Conference on Learning Representations
Repo contents: .flake8, .gitignore, .pre-commit-config.yaml, LICENSE, README.md, config, dataaug, environment_minimal.yml, fig0_scaling_baselines.sh, fig1_scaling_repetitions.sh, fig2_all_augmentations.sh, fig3a_scaling_model_width.sh, fig3b_scaling_model_type.sh, fig4_scaling_invariant_archs.sh, pyproject.toml, setup.cfg, train_sgd_variant.py
Authors
Jonas Geiping, Micah Goldblum, Gowthami Somepalli, Ravid Shwartz-Ziv, Tom Goldstein, Andrew Gordon Wilson
arXiv ID
2210.06441
Category
cs.LG: Machine Learning
Cross-listed
cs.CV
Citations
53
Venue
International Conference on Learning Representations
Repository
https://github.com/JonasGeiping/dataaugs
⭐ 18
Last Checked
1 month ago
Abstract
Despite the clear performance benefits of data augmentations, little is known about why they are so effective. In this paper, we disentangle several key mechanisms through which data augmentations operate. Establishing an exchange rate between augmented and additional real data, we find that in out-of-distribution testing scenarios, augmentations which yield samples that are diverse but inconsistent with the data distribution can be even more valuable than additional training data. Moreover, we find that data augmentations which encourage invariances can be more valuable than invariance alone, especially on small and medium-sized training sets. Following this observation, we show that augmentations induce additional stochasticity during training, effectively flattening the loss landscape.
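To make the abstract's point about augmentation-induced stochasticity concrete, here is a minimal stdlib-only sketch of a flip-and-crop augmentation of the kind commonly used on CIFAR-style images. The function name and toy image are illustrative assumptions, not code from the paper's repository; the point is only that one underlying sample yields many distinct augmented views each time it is drawn, which is the extra randomness during training that the abstract refers to.

```python
import random

def random_augment(img, pad=1, rng=random):
    """Toy augmentation: random horizontal flip, then a random crop
    from a zero-padded copy. `img` is a list of rows (a tiny stand-in
    for an image tensor); this is an illustrative sketch, not the
    paper's actual pipeline."""
    h, w = len(img), len(img[0])
    if rng.random() < 0.5:                      # random horizontal flip
        img = [row[::-1] for row in img]
    blank = [0] * (w + 2 * pad)
    padded = [blank[:] for _ in range(pad)]     # zero-pad all four sides
    padded += [[0] * pad + list(row) + [0] * pad for row in img]
    padded += [blank[:] for _ in range(pad)]
    top = rng.randrange(2 * pad + 1)            # random crop offset
    left = rng.randrange(2 * pad + 1)
    return [r[left:left + w] for r in padded[top:top + h]]

# A single sample produces many distinct views across draws.
img = [[1, 2], [3, 4]]
views = {tuple(map(tuple, random_augment(img))) for _ in range(500)}
print(len(views) > 1)
```

Because a fresh view is sampled every time the image is seen, the effective dataset is larger and the gradient noise differs from epoch to epoch, which is the mechanism the paper connects to flatter minima.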
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
Similar Papers
In the same crypt – Machine Learning
XGBoost: A Scalable Tree Boosting System · R.I.P. 👻 Ghosted
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift · R.I.P. 👻 Ghosted
Semi-Supervised Classification with Graph Convolutional Networks · R.I.P. 👻 Ghosted
Proximal Policy Optimization Algorithms · R.I.P. 👻 Ghosted