Privacy Vulnerabilities in Marginals-based Synthetic Data

October 07, 2024 Β· Declared Dead Β· πŸ› 2025 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML)

πŸ‘» CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Steven Golob, Sikha Pentyala, Anuar Maratkhan, Martine De Cock arXiv ID 2410.05506 Category cs.CR: Cryptography & Security Cross-listed cs.LG Citations 11 Venue 2025 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML) Last Checked 4 months ago
Abstract
When acting as a privacy-enhancing technology, synthetic data generation (SDG) aims to maintain a resemblance to the real data while excluding personally-identifiable information. Many SDG algorithms provide robust differential privacy (DP) guarantees to this end. However, we show that the strongest class of SDG algorithms--those that preserve \textit{marginal probabilities}, or similar statistics, from the underlying data--leak information about individuals that can be recovered more efficiently than previously understood. We demonstrate this by presenting a novel membership inference attack, MAMA-MIA, and evaluate it against three seminal DP SDG algorithms: MST, PrivBayes, and Private-GSD. MAMA-MIA leverages knowledge of which SDG algorithm was used, allowing it to learn information about the hidden data more accurately, and orders-of-magnitude faster, than other leading attacks. We use MAMA-MIA to lend insight into existing SDG vulnerabilities. Our approach went on to win the first SNAKE (SaNitization Algorithm under attacK ... $\varepsilon$) competition.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

πŸ“œ Similar Papers

In the same crypt β€” Cryptography & Security

Died the same way β€” πŸ‘» Ghosted