New advances in universal approximation with neural networks of minimal width

November 13, 2024 · Declared Dead · 🏛 arXiv.org

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Dennis Rochau, Robin Chan, Hanno Gottschalk
arXiv ID: 2411.08735
Category: cs.NE (Neural & Evolutionary)
Cross-listed: math.FA
Citations: 3
Venue: arXiv.org
Last Checked: 4 months ago

Abstract

We prove several universal approximation results at minimal or near-minimal width for approximation of $L^p(\mathbb{R}^{d_x}, \mathbb{R}^{d_y})$ and $C^0(\mathbb{R}^{d_x}, \mathbb{R}^{d_y})$ on compact sets. Our approach uses a unified coding scheme that yields explicit constructions relying only on standard analytic tools. We show that feedforward neural networks with two leaky ReLU activations $\sigma_\alpha$, $\sigma_{-\alpha}$ achieve the optimal width $\max\{d_x, d_y\}$ for $L^p$ approximation, while a single leaky ReLU $\sigma_\alpha$ achieves width $\max\{2, d_x, d_y\}$, providing an alternative proof of the results of Cai et al. (2023). By generalizing to stepped leaky ReLU activations, we extend these results to uniform approximation of continuous functions while identifying sets of activation functions compatible with gradient-based training. Since our constructions pass through an intermediate dimension of one, they imply that autoencoders with a one-dimensional feature space are universal approximators. We further show that squashable activations combined with FLOOR achieve width $\max\{3, d_x, d_y\}$ for uniform approximation. We also establish a lower bound of $\max\{d_x, d_y\} + 1$ for networks when all activations are continuous and monotone and $d_y \leq 2d_x$. Moreover, we extend our results to invertible LU-decomposable networks, proving distributional universal approximation for LU-Net normalizing flows and providing a constructive proof of the classical theorem of Brenier and Gangbo on $L^p$ approximation by diffeomorphisms.
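
The width statements in the abstract can be tabulated with a short Python sketch. This is an illustration only, not code from the paper: all function names are hypothetical, and the definition of $\sigma_\alpha$ used here (identity for non-negative inputs, slope $\alpha$ otherwise) is the common leaky ReLU convention, assumed rather than taken from the paper.

```python
def leaky_relu(x: float, alpha: float) -> float:
    """Leaky ReLU under the common convention (an assumption, not the
    paper's stated definition): identity for x >= 0, slope alpha otherwise."""
    return x if x >= 0 else alpha * x


def width_lp_two_leaky_relus(d_x: int, d_y: int) -> int:
    """Width stated in the abstract for L^p approximation with the two
    activations sigma_alpha and sigma_{-alpha}: max{d_x, d_y}."""
    return max(d_x, d_y)


def width_lp_single_leaky_relu(d_x: int, d_y: int) -> int:
    """Width stated in the abstract for L^p approximation with a single
    leaky ReLU sigma_alpha: max{2, d_x, d_y}."""
    return max(2, d_x, d_y)


def width_uniform_squashable_floor(d_x: int, d_y: int) -> int:
    """Width stated in the abstract for uniform approximation with
    squashable activations combined with FLOOR: max{3, d_x, d_y}."""
    return max(3, d_x, d_y)


def lower_bound_continuous_monotone(d_x: int, d_y: int) -> int:
    """Lower bound stated in the abstract for networks whose activations are
    all continuous and monotone: max{d_x, d_y} + 1, under d_y <= 2 * d_x."""
    if d_y > 2 * d_x:
        # The abstract states the bound only for d_y <= 2 * d_x.
        raise ValueError("bound is only stated for d_y <= 2 * d_x")
    return max(d_x, d_y) + 1


if __name__ == "__main__":
    d_x, d_y = 1, 1
    print(width_lp_two_leaky_relus(d_x, d_y))         # 1
    print(width_lp_single_leaky_relu(d_x, d_y))       # 2
    print(width_uniform_squashable_floor(d_x, d_y))   # 3
    print(lower_bound_continuous_monotone(d_x, d_y))  # 2
```

For scalar functions ($d_x = d_y = 1$) the four stated widths differ (1, 2, 3, and a lower bound of 2), while for $d_x, d_y \geq 3$ the three achievable widths coincide at $\max\{d_x, d_y\}$.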