On the Complexity of Sorted Neighborhood

January 08, 2015 · The Ethereal · arXiv.org


Authors: Mayank Kejriwal, Daniel P. Miranker
arXiv ID: 1501.01696
Category: cs.CC (Computational Complexity), cross-listed in cs.DS
Citations: 3
Venue: arXiv.org
Abstract
Record linkage concerns identifying semantically equivalent records in databases. Blocking methods are employed to avoid the cost of full pairwise similarity comparisons on $n$ records. In a seminal work, Hernandez and Stolfo proposed the Sorted Neighborhood blocking method. Several empirical variants have been proposed in recent years. In this paper, we investigate the complexity of the Sorted Neighborhood procedure on which the variants are built. We show that achieving maximum performance on the Sorted Neighborhood procedure entails solving a sub-problem, which is shown to be NP-complete by reducing from the Travelling Salesman Problem. We also show that the sub-problem can occur in the traditional blocking method. Finally, we draw on recent developments concerning approximate Travelling Salesman solutions to define and analyze three approximation algorithms.
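The abstract does not restate the blocking procedure itself. Below is a minimal Python sketch of Sorted Neighborhood as it is commonly described after Hernandez and Stolfo: sort the records by a blocking key, slide a fixed-size window over the sorted list, and compare only records that fall in the same window. The function name `sorted_neighborhood`, the key function, and the window size are illustrative assumptions, not taken from the paper. The sketch does not implement the paper's NP-complete sub-problem or its approximation algorithms.

```python
from typing import Callable, Hashable, Sequence, Set, Tuple


def sorted_neighborhood(
    records: Sequence[str],
    key: Callable[[str], Hashable],
    window: int,
) -> Set[Tuple[int, int]]:
    """Hypothetical helper: return candidate pairs (i, j) with i < j.

    Records are sorted by a blocking key, and each record is paired with
    the records that follow it within a sliding window of size `window`.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    # Sort record indices by blocking key; Python's sort is stable,
    # so records with equal keys keep their input order.
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    pairs: Set[Tuple[int, int]] = set()
    for pos, i in enumerate(order):
        # Pair record i with the next (window - 1) records in sorted order.
        for j in order[pos + 1 : pos + window]:
            pairs.add((min(i, j), max(i, j)))
    return pairs
```

For example, `sorted_neighborhood(["smith john", "smyth jon", "doe jane", "doe janet", "brown bob"], key=lambda r: r[:3], window=3)` yields 7 candidate pairs instead of the 10 required by full pairwise comparison, and both likely duplicates, (0, 1) and (2, 3), are among them. For n records and a window w ≤ n, the sketch emits (w - 1)n - w(w - 1)/2 pairs, which is linear in n for fixed w. Which true matches land in a shared window depends entirely on the sort order.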