QSketch: An Efficient Sketch for Weighted Cardinality Estimation in Streams

June 27, 2024 · Declared Dead · 🏛 Knowledge Discovery and Data Mining

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Yiyan Qi, Rundong Li, Pinghui Wang, Yufang Sun, Rui Xing arXiv ID 2406.19143 Category cs.DB: Databases Cross-listed cs.DS Citations 2 Venue Knowledge Discovery and Data Mining Last Checked 4 months ago

Abstract

Estimating cardinality, i.e., the number of distinct elements, of a data stream is a fundamental problem in areas like databases, computer networks, and information retrieval. This study delves into a broader scenario where each element carries a positive weight. Unlike traditional cardinality estimation, limited research exists on weighted cardinality, with current methods requiring substantial memory and computational resources, challenging for devices with limited capabilities and real-time applications like anomaly detection. To address these issues, we propose QSketch, a memory-efficient sketch method for estimating weighted cardinality in streams. QSketch uses a quantization technique to condense continuous variables into a compact set of integer variables, with each variable requiring only 8 bits, making it 8 times smaller than previous methods. Furthermore, we leverage dynamic properties during QSketch generation to significantly enhance estimation accuracy and achieve a lower time complexity of $O(1)$ for updating estimations upon encountering a new element. Experimental results on synthetic and real-world datasets show that QSketch is approximately 30\% more accurate and two orders of magnitude faster than the state-of-the-art, using only $1/8$ of the memory.