Breaking AR's Sampling Bottleneck: Provable Acceleration via Diffusion Language Models

May 27, 2025 · Declared Dead · NeurIPS 2025

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Gen Li, Changxiao Cai
arXiv ID: 2505.21400
Category: cs.LG (Machine Learning)
Cross-listed: cs.IT, math.ST, stat.ML
Citations: 9
Venue: NeurIPS 2025
Last checked: 4 months ago

Abstract

Diffusion models have emerged as a powerful paradigm for modern generative modeling and show strong potential for large language models (LLMs). Unlike conventional autoregressive (AR) models, which generate tokens sequentially, diffusion models allow parallel sampling, offering a promising path to faster generation and removing the left-to-right generation constraint. Despite their empirical success, the theoretical understanding of diffusion language models remains underdeveloped. In this work, we develop convergence guarantees for diffusion language models from an information-theoretic perspective. Our analysis shows that the sampling error, measured by the Kullback-Leibler (KL) divergence, decays inversely with the number of iterations $T$ and scales linearly with the mutual information between tokens in the target text sequence. Crucially, our theory covers the regime $T<L$, where $L$ is the text sequence length. This justifies that high-quality samples can be generated with fewer than $L$ iterations, thereby breaking the fundamental sampling bottleneck of $L$ steps required by AR models. We further establish matching upper and lower bounds, up to a constant factor, which show that our convergence analysis is tight. These results offer new theoretical insight into the practical effectiveness of diffusion language models.
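The link between parallel sampling error and mutual information can be seen in a toy setting. The Python sketch below is an illustration under stated assumptions, not the paper's sampler or its bound. `unmask_schedule` is a hypothetical uniform schedule that reveals $L$ masked positions in $T \le L$ rounds, and `parallel_step_kl` measures the KL error incurred when two tokens are revealed in the same round, each drawn from its own marginal. For two tokens that error is exactly $I(X_1; X_2)$, which mirrors the abstract's claim that the error grows with inter-token dependence. The paper's actual bound and constants are not reproduced here.

```python
import math
from collections import defaultdict


def unmask_schedule(L, T):
    """Split token positions 0..L-1 into T rounds of near-equal size.

    Hypothetical uniform schedule, assumed for illustration only; the
    paper's sampler may choose which positions to reveal differently.
    """
    if not 1 <= T <= L:
        raise ValueError("need 1 <= T <= L")
    base, extra = divmod(L, T)
    rounds, start = [], 0
    for t in range(T):
        size = base + (1 if t < extra else 0)  # spread the remainder over early rounds
        rounds.append(list(range(start, start + size)))
        start += size
    return rounds


def marginals(joint):
    """Marginal distributions of both tokens from a joint over (x1, x2) pairs."""
    p1, p2 = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        p1[a] += p
        p2[b] += p
    return dict(p1), dict(p2)


def entropy(dist):
    """Shannon entropy in nats, skipping zero-probability outcomes."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def parallel_step_kl(joint):
    """KL(joint || product of marginals) in nats.

    This is the error of revealing both tokens in one parallel round,
    each sampled independently from its marginal.
    """
    p1, p2 = marginals(joint)
    return sum(p * math.log(p / (p1[a] * p2[b]))
               for (a, b), p in joint.items() if p > 0)


def mutual_information(joint):
    """I(X1; X2) = H(X1) + H(X2) - H(X1, X2), computed from entropies."""
    p1, p2 = marginals(joint)
    return entropy(p1) + entropy(p2) - entropy(joint)


if __name__ == "__main__":
    print(unmask_schedule(10, 4))  # 10 positions revealed in 4 rounds
    independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
    correlated = {(0, 0): 0.5, (1, 1): 0.5}
    for name, joint in [("independent", independent), ("correlated", correlated)]:
        print(name, parallel_step_kl(joint), mutual_information(joint))
```

With `T = L` every round reveals a single position, which recovers the $L$-step cost of AR decoding. In the toy, revealing two independent tokens together costs nothing, while revealing two perfectly correlated binary tokens together costs $\log 2$ nats, their mutual information.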