The Ethereal
On the Complexity of Sorted Neighborhood
January 08, 2015 · The Ethereal · arXiv.org
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Mayank Kejriwal, Daniel P. Miranker
arXiv ID
1501.01696
Category
cs.CC: Computational Complexity
Cross-listed
cs.DS
Citations
3
Venue
arXiv.org
Last Checked
2 months ago
Abstract
Record linkage concerns identifying semantically equivalent records in databases. Blocking methods are employed to avoid the cost of full pairwise similarity comparisons on $n$ records. In a seminal work, Hernandez and Stolfo proposed the Sorted Neighborhood blocking method. Several empirical variants have been proposed in recent years. In this paper, we investigate the complexity of the Sorted Neighborhood procedure on which the variants are built. We show that achieving maximum performance on the Sorted Neighborhood procedure entails solving a sub-problem, which is shown to be NP-complete by reducing from the Travelling Salesman Problem. We also show that the sub-problem can occur in the traditional blocking method. Finally, we draw on recent developments concerning approximate Travelling Salesman solutions to define and analyze three approximation algorithms.
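For readers unfamiliar with the baseline procedure the abstract refers to, the following is a minimal sketch of Sorted Neighborhood blocking as it is commonly described: sort the records by a blocking key, then compare only records that fall within a sliding window of the sorted order. The function name `sorted_neighborhood_pairs`, the `key` function, and the sample records are illustrative assumptions and do not come from the paper; the sketch does not implement the NP-complete sub-problem or the approximation algorithms the abstract mentions.

```python
def sorted_neighborhood_pairs(records, key, window):
    """Return candidate pairs from a sliding window over key-sorted records.

    records: list of records (any type)
    key:     hypothetical blocking-key function, chosen here for illustration
    window:  window size w >= 2; each record is paired with the next w - 1
             records in sorted order
    """
    if window < 2:
        raise ValueError("window must be at least 2")

    # Step 1: sort records by their blocking key.
    ordered = sorted(records, key=key)

    # Step 2: slide a window over the sorted list and emit pairs inside it.
    pairs = []
    for i, left in enumerate(ordered):
        upper = min(i + window, len(ordered))
        for j in range(i + 1, upper):
            pairs.append((left, ordered[j]))
    return pairs


if __name__ == "__main__":
    # Toy records; the key is the first three characters (an assumption).
    people = ["smith john", "smyth john", "jones mary", "jonas mary", "adams ann"]
    for pair in sorted_neighborhood_pairs(people, key=lambda r: r[:3], window=2):
        print(pair)
```

With a window of size $w$, each record is compared with at most $w - 1$ successors, so the number of candidate pairs grows roughly as $n(w - 1)$ rather than the $n(n - 1)/2$ of a full pairwise comparison.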
Similar Papers
In the same crypt: Computational Complexity
An Exponential Separation Between Randomized and Deterministic Complexity in the LOCAL Model
The Parallelism Tradeoff: Limitations of Log-Precision Transformers
The Hardness of Approximation of Euclidean k-means
Slightly Superexponential Parameterized Problems