Near Optimal Sketching of Low-Rank Tensor Regression

September 20, 2017 · Declared Dead · 🏛 Neural Information Processing Systems

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Jarvis Haupt, Xingguo Li, David P. Woodruff
arXiv ID: 1709.07093
Category: cs.LG: Machine Learning
Cross-listed: cs.DS, stat.ML
Citations: 37
Venue: Neural Information Processing Systems
Last Checked: 3 months ago

Abstract

We study the least squares regression problem
\begin{align*} \min_{\Theta \in \mathcal{S}_{\odot D,R}} \|A\Theta - b\|_2, \end{align*}
where $\mathcal{S}_{\odot D,R}$ is the set of $\Theta$ for which $\Theta = \sum_{r=1}^{R} \theta_1^{(r)} \circ \cdots \circ \theta_D^{(r)}$ for vectors $\theta_d^{(r)} \in \mathbb{R}^{p_d}$ for all $r \in [R]$ and $d \in [D]$, and $\circ$ denotes the outer product of vectors. That is, $\Theta$ is a low-dimensional, low-rank tensor. This is motivated by the fact that the number of parameters in $\Theta$ is only $R \cdot \sum_{d=1}^{D} p_d$, which is significantly smaller than the $\prod_{d=1}^{D} p_d$ parameters in ordinary least squares regression. We consider the above CP decomposition model of tensors $\Theta$, as well as the Tucker decomposition. For both models we show how to apply data dimensionality reduction techniques based on sparse random projections $\Phi \in \mathbb{R}^{m \times n}$, with $m \ll n$, to reduce the problem to a much smaller problem $\min_{\Theta} \|\Phi A \Theta - \Phi b\|_2$, such that any near-optimal solution $\Theta'$ of the smaller problem is also a near-optimal solution of the original problem. We obtain significantly smaller dimension and sparsity in $\Phi$ than is possible for ordinary least squares regression, and we also provide a number of numerical simulations supporting our theory.
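As a rough illustration of the quantities in the abstract, the following NumPy sketch builds a CP-structured $\Theta$, compares the two parameter counts, and applies a sparse random projection to $A$ and $b$. It is a sketch under stated assumptions, not the authors' implementation: the projection used here is a CountSketch-style matrix (one random $\pm 1$ per column), which is one common sparse embedding and may differ from the construction analyzed in the paper. The helper names `cp_tensor` and `countsketch` and all dimensions are hypothetical. The code only compares residual norms for a fixed $\Theta$; it does not solve the constrained regression.

```python
import numpy as np


def cp_tensor(factors):
    """Sum of R rank-one outer products; factors[d] has shape (p_d, R)."""
    rank = factors[0].shape[1]
    shape = tuple(f.shape[0] for f in factors)
    theta = np.zeros(shape)
    for r in range(rank):
        term = factors[0][:, r]
        for f in factors[1:]:
            # Outer product with the r-th column of the next factor.
            term = np.multiply.outer(term, f[:, r])
        theta += term
    return theta


def countsketch(m, n, rng):
    """Assumed sparse projection: exactly one +/-1 entry per column."""
    rows = rng.integers(0, m, size=n)
    signs = rng.choice([-1.0, 1.0], size=n)
    phi = np.zeros((m, n))
    phi[rows, np.arange(n)] = signs
    return phi


rng = np.random.default_rng(0)

# Illustrative sizes only: D = 3 modes with p = (4, 5, 6), CP rank R = 2.
dims, R = (4, 5, 6), 2
factors = [rng.standard_normal((p, R)) for p in dims]
theta = cp_tensor(factors)

cp_params = R * sum(dims)          # R * sum_d p_d
full_params = int(np.prod(dims))   # prod_d p_d

# Synthetic regression data: n observations, vectorized tensor coefficients.
n, m = 2000, 400
A = rng.standard_normal((n, full_params))
b = A @ theta.ravel() + 0.01 * rng.standard_normal(n)

# Sketch the data once; the smaller problem uses (Phi A, Phi b).
phi = countsketch(m, n, rng)
A_sketch, b_sketch = phi @ A, phi @ b

# Residual norm of the same Theta in the original and sketched problems.
orig_cost = np.linalg.norm(A @ theta.ravel() - b)
sketch_cost = np.linalg.norm(A_sketch @ theta.ravel() - b_sketch)

print("CP parameters:", cp_params, "vs unconstrained:", full_params)
print("residual norm (original, sketched):", orig_cost, sketch_cost)
```

A dense array is used for `phi` only to keep the example short; in practice a sparse matrix type would be the natural choice, since each column has a single nonzero entry.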