Convergence of Graph Laplacian with kNN Self-tuned Kernels
November 03, 2020 · Declared Dead · Information and Inference: A Journal of the IMA
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Xiuyuan Cheng, Hau-Tieng Wu
arXiv ID
2011.01479
Category
math.ST
Cross-listed
cs.LG, stat.ML
Citations
27
Venue
Information and Inference: A Journal of the IMA
Last Checked
2 months ago
Abstract
The kernelized Gram matrix $W$ constructed from data points $\{x_i\}_{i=1}^N$ as $W_{ij}= k_0( \frac{ \| x_i - x_j \|^2}{\sigma^2} )$ is widely used in graph-based geometric data analysis and unsupervised learning. An important question is how to choose the kernel bandwidth $\sigma$, and a common practice called the self-tuned kernel adaptively sets a $\sigma_i$ at each point $x_i$ by the $k$-nearest neighbor (kNN) distance. When the $x_i$'s are sampled from a $d$-dimensional manifold embedded in a possibly high-dimensional space, unlike with fixed-bandwidth kernels, theoretical results on graph Laplacian convergence with self-tuned kernels have been incomplete. This paper proves the convergence of the graph Laplacian operator $L_N$ to the manifold (weighted-)Laplacian for a new family of kNN self-tuned kernels $W^{(\alpha)}_{ij} = k_0( \frac{ \| x_i - x_j \|^2}{ \epsilon \hat{\rho}(x_i) \hat{\rho}(x_j)})/\hat{\rho}(x_i)^\alpha \hat{\rho}(x_j)^\alpha$, where $\hat{\rho}$ is the bandwidth function estimated by kNN, and the limiting operator is also parametrized by $\alpha$. When $\alpha = 1$, the limiting operator is the weighted manifold Laplacian $\Delta_p$. Specifically, we prove the point-wise convergence of $L_N f$ and convergence of the graph Dirichlet form with rates. Our analysis is based on first establishing a $C^0$ consistency for $\hat{\rho}$ which bounds the relative estimation error $|\hat{\rho} - \bar{\rho}|/\bar{\rho}$ uniformly with high probability, where $\bar{\rho} = p^{-1/d}$ and $p$ is the data density function. Our theoretical results reveal the advantage of the self-tuned kernel over the fixed-bandwidth kernel via a smaller variance error in low-density regions. In the algorithm, no prior knowledge of $d$ or the data density is needed. The theoretical results are supported by numerical experiments on simulated data and hand-written digit image data.
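As a rough illustration of the construction described in the abstract (not the authors' code): the sketch below builds the self-tuned affinity $W^{(\alpha)}$ and a graph Laplacian from data. Several choices here are assumptions, namely taking $k_0(r) = e^{-r}$ (Gaussian kernel), using the raw kNN distance as $\hat{\rho}$, the random-walk normalization and $1/\epsilon$ scaling of $L_N$, and the parameter values $k$, $\alpha$, $\epsilon$.

```python
import numpy as np

def self_tuned_laplacian(X, k=16, alpha=1.0, eps=0.05):
    """Sketch of the kNN self-tuned kernel W^(alpha) from the abstract.

    Assumptions (not from the paper): k_0(r) = exp(-r), rho_hat is the raw
    kNN distance, and L is the random-walk Laplacian (I - D^{-1} W) / eps.
    X is an (N, D) array of points.
    """
    N = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

    # rho_hat(x_i): distance to the k-th nearest neighbor
    # (index 0 of each sorted row is the point itself).
    knn_sq = np.sort(sq_dists, axis=1)[:, k]
    rho_hat = np.sqrt(knn_sq)

    # Self-tuned affinity W^(alpha)_{ij}
    #   = k_0(||x_i - x_j||^2 / (eps * rho_hat_i * rho_hat_j))
    #     / (rho_hat_i^alpha * rho_hat_j^alpha).
    bandwidth = eps * np.outer(rho_hat, rho_hat)
    W = np.exp(-sq_dists / bandwidth)
    W /= np.outer(rho_hat ** alpha, rho_hat ** alpha)

    # Random-walk graph Laplacian; the exact normalization used in the
    # paper's L_N may differ, this is one common convention.
    D = W.sum(axis=1)
    L = (np.eye(N) - W / D[:, None]) / eps
    return W, L

if __name__ == "__main__":
    # Example: points on a circle (d = 1) embedded in R^3;
    # no knowledge of d or of the density is used by the construction.
    t = np.random.rand(500) * 2 * np.pi
    X = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)
    W, L = self_tuned_laplacian(X, k=16, alpha=1.0, eps=0.05)
    print(L.shape)
```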
Similar Papers
In the same crypt · math.ST
An introduction to Topological Data Analysis: fundamental and practical aspects for data scientists
Minimax Optimal Procedures for Locally Private Estimation
Optimal Best Arm Identification with Fixed Confidence
Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees
User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient
Died the same way · 👻 Ghosted
Language Models are Few-Shot Learners
PyTorch: An Imperative Style, High-Performance Deep Learning Library
XGBoost: A Scalable Tree Boosting System