Towards Better Bounds for Finding Quasi-Identifiers
November 25, 2022 Β· Declared Dead Β· π ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Ryan Hildebrant, Quoc-Tung Le, Duy-Hoang Ta, Hoa T. Vu
arXiv ID
2211.13882
Category
cs.DS: Data Structures & Algorithms
Citations
2
Venue
ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems
Last Checked
3 months ago
Abstract
We revisit the problem of finding small $Ξ΅$-separation keys introduced by Motwani and Xu (2008). In this problem, the input is $m$-dimensional tuples $x_1,x_2,\ldots,x_n $. The goal is to find a small subset of coordinates that separates at least $(1-Ξ΅){n \choose 2}$ pairs of tuples. They provided a fast algorithm that runs on $Ξ(m/Ξ΅)$ tuples sampled uniformly at random. We show that the sample size can be improved to $Ξ(m/\sqrtΞ΅)$. Our algorithm also enjoys a faster running time. To obtain this result, we provide upper and lower bounds on the sample size to solve the following decision problem. Given a subset of coordinates $A$, reject if $A$ separates fewer than $(1-Ξ΅){n \choose 2}$ pairs, and accept if $A$ separates all pairs. The algorithm must be correct with probability at least $1-Ξ΄$ for all $A$. We show that for algorithms based on sampling: - $Ξ(m/\sqrtΞ΅)$ samples are sufficient and necessary so that $Ξ΄\leq e^{-m}$ and - $Ξ©(\sqrt{\frac{\log m}Ξ΅})$ samples are necessary so that $Ξ΄$ is a constant. Our analysis is based on a constrained version of the balls-into-bins problem. We believe our analysis may be of independent interest. We also study a related problem that asks for the following sketching algorithm: with given parameters $Ξ±,k$ and $Ξ΅$, the algorithm takes a subset of coordinates $A$ of size at most $k$ and returns an estimate of the number of unseparated pairs in $A$ up to a $(1\pmΞ΅)$ factor if it is at least $Ξ±{n \choose 2}$. We show that even for constant $Ξ±$ and success probability, such a sketching algorithm must use $Ξ©(mk \log Ξ΅^{-1})$ bits of space; on the other hand, uniform sampling yields a sketch of size $Ξ(\frac{mk \log m}{Ξ±Ξ΅^2})$ for this purpose.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Data Structures & Algorithms
π
π
The Cartographer
R.I.P.
π»
Ghosted
Route Planning in Transportation Networks
R.I.P.
π»
Ghosted
Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration
R.I.P.
π»
Ghosted
Hierarchical Clustering: Objective Functions and Algorithms
R.I.P.
π»
Ghosted
Graph Isomorphism in Quasipolynomial Time
π
π
The Cartographer
Simulation optimization: A review of algorithms and applications
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted