Finding Favourite Tuples on Data Streams with Provably Few Comparisons
July 06, 2023 Β· Declared Dead Β· π Knowledge Discovery and Data Mining
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Guangyi Zhang, Nikolaj Tatti, Aristides Gionis
arXiv ID
2307.02946
Category
cs.DB: Databases
Cross-listed
cs.DS
Citations
3
Venue
Knowledge Discovery and Data Mining
Last Checked
4 months ago
Abstract
One of the most fundamental tasks in data science is to assist a user with unknown preferences in finding high-utility tuples within a large database. To accurately elicit the unknown user preferences, a widely-adopted way is by asking the user to compare pairs of tuples. In this paper, we study the problem of identifying one or more high-utility tuples by adaptively receiving user input on a minimum number of pairwise comparisons. We devise a single-pass streaming algorithm, which processes each tuple in the stream at most once, while ensuring that the memory size and the number of requested comparisons are in the worst case logarithmic in $n$, where $n$ is the number of all tuples. An important variant of the problem, which can help to reduce human error in comparisons, is to allow users to declare ties when confronted with pairs of tuples of nearly equal utility. We show that the theoretical guarantees of our method can be maintained for this important problem variant. In addition, we show how to enhance existing pruning techniques in the literature by leveraging powerful tools from mathematical programming. Finally, we systematically evaluate all proposed algorithms over both synthetic and real-life datasets, examine their scalability, and demonstrate their superior performance over existing methods.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Databases
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
Untangling Blockchain: A Data Processing View of Blockchain Systems
R.I.P.
π»
Ghosted
Converting Static Image Datasets to Spiking Neuromorphic Datasets Using Saccades
R.I.P.
π»
Ghosted
BLOCKBENCH: A Framework for Analyzing Private Blockchains
R.I.P.
π»
Ghosted
Data Synthesis based on Generative Adversarial Networks
R.I.P.
π»
Ghosted
HoloClean: Holistic Data Repairs with Probabilistic Inference
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted