Determinantal Point Processes for Coresets
March 23, 2018 ยท Declared Dead ยท ๐ Journal of machine learning research
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Nicolas Tremblay, Simon Barthelmรฉ, Pierre-Olivier Amblard
arXiv ID
1803.08700
Category
stat.ML: Machine Learning (Stat)
Cross-listed
cs.DS,
cs.LG
Citations
35
Venue
Journal of machine learning research
Last Checked
4 months ago
Abstract
When faced with a data set too large to be processed all at once, an obvious solution is to retain only part of it. In practice this takes a wide variety of different forms, and among them "coresets" are especially appealing. A coreset is a (small) weighted sample of the original data that comes with the following guarantee: a cost function can be evaluated on the smaller set instead of the larger one, with low relative error. For some classes of problems, and via a careful choice of sampling distribution (based on the so-called "sensitivity" metric), iid random sampling has turned to be one of the most successful methods for building coresets efficiently. However, independent samples are sometimes overly redundant, and one could hope that enforcing diversity would lead to better performance. The difficulty lies in proving coreset properties in non-iid samples. We show that the coreset property holds for samples formed with determinantal point processes (DPP). DPPs are interesting because they are a rare example of repulsive point processes with tractable theoretical properties, enabling us to prove general coreset theorems. We apply our results to both the k-means and the linear regression problems, and give extensive empirical evidence that the small additional computational cost of DPP sampling comes with superior performance over its iid counterpart. Of independent interest, we also provide analytical formulas for the sensitivity in the linear regression and 1-means cases.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Machine Learning (Stat)
๐ฎ
๐ฎ
The Ethereal
๐ฎ
๐ฎ
The Ethereal
Layer Normalization
๐ฎ
๐ฎ
The Ethereal
Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
R.I.P.
๐ป
Ghosted
Variational Inference with Normalizing Flows
๐
๐
The Cartographer
Towards A Rigorous Science of Interpretable Machine Learning
R.I.P.
๐ป
Ghosted
Optimization Methods for Large-Scale Machine Learning
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
๐ป
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
๐ป
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
๐ป
Ghosted