On the Privacy Properties of Variants on the Sparse Vector Technique

August 28, 2015 · Declared Dead · 🏛 arXiv.org

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Yan Chen, Ashwin Machanavajjhala arXiv ID 1508.07306 Category cs.DB: Databases Cross-listed cs.CR Citations 34 Venue arXiv.org Last Checked 2 months ago

Abstract

The sparse vector technique is a powerful differentially private primitive that allows an analyst to check whether queries in a stream are greater or lesser than a threshold. This technique has a unique property -- the algorithm works by adding noise with a finite variance to the queries and the threshold, and guarantees privacy that only degrades with (a) the maximum sensitivity of any one query in stream, and (b) the number of positive answers output by the algorithm. Recent work has developed variants of this algorithm, which we call {\em generalized private threshold testing}, and are claimed to have privacy guarantees that do not depend on the number of positive or negative answers output by the algorithm. These algorithms result in a significant improvement in utility over the sparse vector technique for a given privacy budget, and have found applications in frequent itemset mining, feature selection in machine learning and generating synthetic data. In this paper we critically analyze the privacy properties of generalized private threshold testing. We show that generalized private threshold testing does not satisfy ε-differential privacy for any finite ε. We identify a subtle error in the privacy analysis of this technique in prior work. Moreover, we show an adversary can use generalized private threshold testing to recover counts from the datasets (especially small counts) exactly with high accuracy, and thus can result in individuals being reidentified. We demonstrate our attacks empirically on real datasets.