Extrapolating the profile of a finite population

May 21, 2020 · Declared Dead · 🏛 Annual Conference Computational Learning Theory

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Soham Jana, Yury Polyanskiy, Yihong Wu
arXiv ID: 2005.10561
Category: math.ST
Cross-listed: cs.IT, stat.ML
Citations: 6
Venue: Annual Conference Computational Learning Theory
Last Checked: 2 months ago

Abstract

We study a prototypical problem in empirical Bayes. Namely, consider a population consisting of $k$ individuals each belonging to one of $k$ types (some types can be empty). Without any structural restrictions, it is impossible to learn the composition of the full population having observed only a small (random) subsample of size $m = o(k)$. Nevertheless, we show that in the sublinear regime of $m = \omega(k/\log k)$, it is possible to consistently estimate in total variation the \emph{profile} of the population, defined as the empirical distribution of the sizes of each type, which determines many symmetric properties of the population. We also prove that in the linear regime of $m = ck$ for any constant $c$ the optimal rate is $\Theta(1/\log k)$. Our estimator is based on Wolfowitz's minimum distance method, which entails solving a linear program (LP) of size $k$. We show that there is a single infinite-dimensional LP whose value simultaneously characterizes the risk of the minimum distance estimator and certifies its minimax optimality. The sharp convergence rate is obtained by evaluating this LP using complex-analytic techniques.
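
To make the abstract's definition of the profile concrete, here is a minimal Python sketch that computes the empirical distribution of type sizes and the total variation distance between two profiles. The function names `profile` and `total_variation`, the toy population, and the choice of uniform sampling without replacement are illustrative assumptions, not taken from the paper. The "naive" profile of the subsample is shown only for comparison; it is not the minimum distance estimator the abstract describes.

```python
from collections import Counter
import random


def profile(labels, num_types):
    """Return {size: fraction of the num_types types with exactly that many members}."""
    type_sizes = Counter(labels)                 # type -> number of individuals
    size_counts = Counter(type_sizes.values())   # size -> number of types of that size
    empty = num_types - len(type_sizes)          # types with no individuals
    if empty > 0:
        size_counts[0] += empty
    return {s: c / num_types for s, c in sorted(size_counts.items())}


def total_variation(p, q):
    """Total variation distance between two distributions given as dicts."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in support)


# Toy population (hypothetical): k = 8 individuals, k = 8 possible types,
# of which types 4..7 are empty.
population = [0, 0, 0, 1, 1, 2, 3, 3]
k = len(population)

true_profile = profile(population, k)
print(true_profile)  # {0: 0.5, 1: 0.125, 2: 0.25, 3: 0.125}

# Assumption: the subsample is drawn uniformly without replacement.
rng = random.Random(0)
subsample = rng.sample(population, 4)

# Naive plug-in: the profile of the subsample itself, which is generally biased
# because unseen types look empty and seen types look smaller than they are.
naive_profile = profile(subsample, k)
print(total_variation(true_profile, naive_profile))
```

The gap between the naive subsample profile and the true profile is what an estimator has to correct for when only $m = o(k)$ individuals are observed.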


📜 Similar Papers

In the same crypt — math.ST

R.I.P. 👻 Ghosted

Nonparametric regression using deep neural networks with ReLU activation function

Johannes Schmidt-Hieber

math.ST 🏛 Annals of Statistics 📚 949 cites 8 years ago

R.I.P. 👻 Ghosted

An introduction to Topological Data Analysis: fundamental and practical aspects for data scientists

Frédéric Chazal, Bertrand Michel

math.ST 🏛 AI 📚 727 cites 8 years ago

R.I.P. 👻 Ghosted

Minimax Optimal Procedures for Locally Private Estimation

John Duchi, Martin Wainwright, Michael Jordan

math.ST 🏛 arXiv 📚 481 cites 10 years ago

R.I.P. 👻 Ghosted

Optimal Best Arm Identification with Fixed Confidence

Aurélien Garivier, Emilie Kaufmann

math.ST 🏛 COLT 📚 384 cites 10 years ago

R.I.P. 👻 Ghosted

Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees

Yudong Chen, Martin J. Wainwright

math.ST 🏛 arXiv 📚 329 cites 10 years ago

R.I.P. 👻 Ghosted

User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient

Arnak S. Dalalyan, Avetik G. Karagulyan

math.ST 🏛 Stochastic Processes and their Applications 📚 319 cites 8 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Language Models are Few-Shot Learners

Tom B. Brown, Benjamin Mann, ... (+29 more)

cs.CL 🏛 NeurIPS 📚 54.2K cites 5 years ago

R.I.P. 👻 Ghosted

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, Sam Gross, ... (+19 more)

cs.LG 🏛 NeurIPS 📚 49.7K cites 6 years ago

R.I.P. 👻 Ghosted

XGBoost: A Scalable Tree Boosting System

Tianqi Chen, Carlos Guestrin

cs.LG 🏛 KDD 📚 49.2K cites 10 years ago

R.I.P. 👻 Ghosted

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, Christian Szegedy

cs.LG 🏛 ICML 📚 46.0K cites 11 years ago