Data-Dependent LSH for the Earth Mover's Distance
March 08, 2024 Β· Declared Dead Β· π Symposium on the Theory of Computing
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Rajesh Jayaram, Erik Waingarten, Tian Zhang
arXiv ID
2403.05041
Category
cs.DS: Data Structures & Algorithms
Citations
6
Venue
Symposium on the Theory of Computing
Last Checked
4 months ago
Abstract
We give new data-dependent locality sensitive hashing schemes (LSH) for the Earth Mover's Distance ($\mathsf{EMD}$), and as a result, improve the best approximation for nearest neighbor search under $\mathsf{EMD}$ by a quadratic factor. Here, the metric $\mathsf{EMD}_s(\mathbb{R}^d,\ell_p)$ consists of sets of $s$ vectors in $\mathbb{R}^d$, and for any two sets $x,y$ of $s$ vectors the distance $\mathsf{EMD}(x,y)$ is the minimum cost of a perfect matching between $x,y$, where the cost of matching two vectors is their $\ell_p$ distance. Previously, Andoni, Indyk, and Krauthgamer gave a (data-independent) locality-sensitive hashing scheme for $\mathsf{EMD}_s(\mathbb{R}^d,\ell_p)$ when $p \in [1,2]$ with approximation $O(\log^2 s)$. By being data-dependent, we improve the approximation to $\tilde{O}(\log s)$. Our main technical contribution is to show that for any distribution $ΞΌ$ supported on the metric $\mathsf{EMD}_s(\mathbb{R}^d, \ell_p)$, there exists a data-dependent LSH for dense regions of $ΞΌ$ which achieves approximation $\tilde{O}(\log s)$, and that the data-independent LSH actually achieves a $\tilde{O}(\log s)$-approximation outside of those dense regions. Finally, we show how to "glue" together these two hashing schemes without any additional loss in the approximation. Beyond nearest neighbor search, our data-dependent LSH also gives optimal (distributional) sketches for the Earth Mover's Distance. By known sketching lower bounds, this implies that our LSH is optimal (up to $\mathrm{poly}(\log \log s)$ factors) among those that collide close points with constant probability.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Data Structures & Algorithms
π
π
The Cartographer
R.I.P.
π»
Ghosted
Route Planning in Transportation Networks
R.I.P.
π»
Ghosted
Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration
R.I.P.
π»
Ghosted
Hierarchical Clustering: Objective Functions and Algorithms
R.I.P.
π»
Ghosted
Graph Isomorphism in Quasipolynomial Time
π
π
The Cartographer
Simulation optimization: A review of algorithms and applications
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted