๐
๐
The Cartographer
Coupling without Communication and Drafter-Invariant Speculative Decoding
August 15, 2024 ยท Entered Twilight ยท ๐ International Symposium on Information Theory
Repo contents: README.md, generate_model_distribution.py, img, plot_coupling_bounds.py
Authors
Majid Daliri, Christopher Musco, Ananda Theertha Suresh
arXiv ID
2408.07978
Category
cs.DS: Data Structures & Algorithms
Cross-listed
cs.CL,
cs.LG
Citations
2
Venue
International Symposium on Information Theory
Repository
https://github.com/majid-daliri/DISD
โญ 2
Last Checked
3 months ago
Abstract
Suppose Alice has a distribution $P$ and Bob has a distribution $Q$. Alice wants to draw a sample $a\sim P$ and Bob a sample $b \sim Q$ such that $a = b$ with as high of probability as possible. It is well-known that, by sampling from an optimal coupling between the distributions, Alice and Bob can achieve $\Pr[a = b] = 1 - D_{TV}(P,Q)$, where $D_{TV}(P,Q)$ is the total variation distance between $P$ and $Q$. What if Alice and Bob must solve this same problem \emph{without communicating at all?} Perhaps surprisingly, with access to public randomness, they can still achieve $\Pr[a = b] \geq \frac{1 - D_{TV}(P,Q)}{1 + D_{TV}(P,Q)} \geq 1-2D_{TV}(P,Q)$ using a simple protocol based on the Weighted MinHash algorithm. This bound was shown to be optimal in the worst-case by [Bavarian et al., 2020]. In this work, we revisit the communication-free coupling problem. We provide a simpler proof of the optimality result from [Bavarian et al., 2020]. We show that, while the worst-case success probability of Weighted MinHash cannot be improved, an equally simple protocol based on Gumbel sampling offers a Pareto improvement: for every pair of distributions $P, Q$, Gumbel sampling achieves an equal or higher value of $\Pr[a = b]$ than Weighted MinHash. Importantly, this improvement translates to practice. We demonstrate an application of communication-free coupling to \emph{speculative decoding}, a recent method for accelerating autoregressive large language models [Leviathan, Kalman, Matias, ICML 2023]. We show that communication-free protocols can be used to contruct \emph{\CSD{}} schemes, which have the desirable property that their output is fixed given a fixed random seed, regardless of what drafter is used for speculation. In experiments on a language generation task, Gumbel sampling outperforms Weighted MinHash. Code is available at https://github.com/majid-daliri/DISD.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Data Structures & Algorithms
R.I.P.
๐ป
Ghosted
Route Planning in Transportation Networks
R.I.P.
๐ป
Ghosted
Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration
R.I.P.
๐ป
Ghosted
Hierarchical Clustering: Objective Functions and Algorithms
R.I.P.
๐ป
Ghosted
Graph Isomorphism in Quasipolynomial Time
๐
๐
The Cartographer