Bias-Aware Sketches

October 25, 2016 · Declared Dead · 🏛 Proceedings of the VLDB Endowment

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Jiecao Chen, Qin Zhang
arXiv ID: 1610.07718
Category: cs.DS (Data Structures & Algorithms)
Citations: 23
Venue: Proceedings of the VLDB Endowment
Last Checked: 4 months ago

Abstract

Linear sketching algorithms are widely used for processing large-scale distributed and streaming datasets. Their popularity is largely due to the fact that linear sketches compose naturally in the distributed model and can be updated efficiently in the streaming model. The errors of linear sketches are typically expressed in terms of the sum of the coordinates of the input vector excluding the largest ones, that is, the mass on the tail of the vector. These algorithms therefore perform well only when the mass on the tail is small, which is not always the case: in many real-world datasets the coordinates of the input vector have a bias, which generates a large mass on the tail. In this paper we propose linear sketches that are bias-aware. We rigorously prove that they achieve strictly better error guarantees than the corresponding existing sketches, and demonstrate their practicality and superiority via an extensive experimental evaluation on both real and synthetic datasets.
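The two ideas in the abstract, linearity and tail mass under a bias, can be illustrated with a small toy. The sketch below is only an illustration under stated assumptions, not the authors' algorithm: it uses a generic single-row signed-bucket sketch (in the style of Count-Sketch), the helper names `make_hashes`, `sketch`, and `tail_mass` are hypothetical, and the bias is assumed to be known in advance, whereas a bias-aware sketch would have to estimate it.

```python
import random

def make_hashes(n, width, seed=0):
    """Hypothetical helper: a random bucket and a random sign per coordinate."""
    rng = random.Random(seed)
    bucket = [rng.randrange(width) for _ in range(n)]
    sign = [rng.choice((-1, 1)) for _ in range(n)]
    return bucket, sign

def sketch(x, bucket, sign, width):
    """Single-row signed-bucket sketch: a linear map from n to `width` counters."""
    s = [0] * width
    for i, v in enumerate(x):
        s[bucket[i]] += sign[i] * v
    return s

def tail_mass(x, k):
    """Sum of |x_i| over all coordinates except the k largest in magnitude."""
    mags = sorted((abs(v) for v in x), reverse=True)
    return sum(mags[k:])

n, width, k = 1000, 64, 5
bucket, sign = make_hashes(n, width)

# Linearity: sketching x and y separately and adding the sketches
# gives the same counters as sketching x + y directly.
rng = random.Random(1)
x = [rng.randint(0, 9) for _ in range(n)]
y = [rng.randint(0, 9) for _ in range(n)]
merged = [a + b for a, b in zip(sketch(x, bucket, sign, width),
                                sketch(y, bucket, sign, width))]
direct = sketch([a + b for a, b in zip(x, y)], bucket, sign, width)
linear_ok = merged == direct  # exact, since all arithmetic is on integers

# Bias: every coordinate sits at 100, and k coordinates carry an extra 1000.
bias = 100  # assumed known here; this is the simplification
z = [bias] * n
for i in range(k):
    z[i] += 1000

raw_tail = tail_mass(z, k)                            # 995 * 100 = 99500
debiased_tail = tail_mass([v - bias for v in z], k)   # 0
```

In this toy input the tail mass drops from 99500 to 0 once the common offset is subtracted, which is the kind of gap the abstract attributes to a bias in the coordinates. Because the sketch is linear, subtracting a constant vector can in principle be done on the sketch itself rather than on the raw data.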

📜 Similar Papers

In the same crypt — Data Structures & Algorithms

📚 📚 The Cartographer

Relief-Based Feature Selection: Introduction and Review

Ryan J. Urbanowicz, Melissa Meeker, ... (+3 more)

cs.DS 🏛 J.BI 📚 1.1K cites 8 years ago

R.I.P. 👻 Ghosted

Route Planning in Transportation Networks

Hannah Bast, Daniel Delling, ... (+6 more)

cs.DS 🏛 Algorithm Engineering 📚 759 cites 11 years ago

R.I.P. 👻 Ghosted

Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration

Jason Altschuler, Jonathan Weed, Philippe Rigollet

cs.DS 🏛 NeurIPS 📚 654 cites 9 years ago

R.I.P. 👻 Ghosted

Hierarchical Clustering: Objective Functions and Algorithms

Vincent Cohen-Addad, Varun Kanade, ... (+2 more)

cs.DS 🏛 SODA 📚 637 cites 9 years ago

R.I.P. 👻 Ghosted

Graph Isomorphism in Quasipolynomial Time

László Babai

cs.DS 🏛 STOC 📚 616 cites 10 years ago

📚 📚 The Cartographer

Simulation optimization: A review of algorithms and applications

Satyajith Amaran, Nikolaos V. Sahinidis, ... (+2 more)

cs.DS 🏛 4OR 📚 588 cites 8 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Federated Learning: Strategies for Improving Communication Efficiency

Jakub Konečný, H. Brendan McMahan, ... (+4 more)

cs.LG 🏛 arXiv 📚 5.2K cites 9 years ago

R.I.P. 👻 Ghosted

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi, Cliff Young, ... (+73 more)

cs.AR 🏛 ISCA 📚 5.1K cites 9 years ago

R.I.P. 👻 Ghosted

Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning

Hoo-Chang Shin, Holger R. Roth, ... (+7 more)

cs.CV 🏛 IEEE TMI 📚 4.9K cites 10 years ago

R.I.P. 👻 Ghosted

Explanation in Artificial Intelligence: Insights from the Social Sciences

Tim Miller

cs.AI 🏛 AI 📚 4.9K cites 9 years ago