Visualization-Aware Sampling for Very Large Databases

October 13, 2015 ยท Declared Dead ยท ๐Ÿ› IEEE International Conference on Data Engineering

๐Ÿ‘ป CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Yongjoo Park, Michael Cafarella, Barzan Mozafari arXiv ID 1510.03921 Category cs.DB: Databases Citations 129 Venue IEEE International Conference on Data Engineering Last Checked 2 months ago
Abstract
Interactive visualizations are crucial in ad hoc data exploration and analysis. However, with the growing number of massive datasets, generating visualizations in interactive timescales is increasingly challenging. One approach for improving the speed of the visualization tool is via data reduction in order to reduce the computational overhead, but at a potential cost in visualization accuracy. Common data reduction techniques, such as uniform and stratified sampling, do not exploit the fact that the sampled tuples will be transformed into a visualization for human consumption. We propose a visualization-aware sampling (VAS) that guarantees high quality visualizations with a small subset of the entire dataset. We validate our method when applied to scatter and map plots for three common visualization goals: regression, density estimation, and clustering. The key to our sampling method's success is in choosing tuples which minimize a visualization-inspired loss function. Our user study confirms that optimizing this loss function correlates strongly with user success in using the resulting visualizations. We also show the NP-hardness of our optimization problem and propose an efficient approximation algorithm. Our experiments show that, compared to previous methods, (i) using the same sample size, VAS improves user's success by up to 35% in various visualization tasks, and (ii) VAS can achieve a required visualization quality up to 400 times faster.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Databases

R.I.P. ๐Ÿ‘ป Ghosted

Datasheets for Datasets

Timnit Gebru, Jamie Morgenstern, ... (+5 more)

cs.DB ๐Ÿ› CACM ๐Ÿ“š 2.6K cites 8 years ago

Died the same way โ€” ๐Ÿ‘ป Ghosted