Exploring Facets of Language Generation in the Limit
November 22, 2024 Β· Declared Dead Β· π Annual Conference Computational Learning Theory
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Moses Charikar, Chirag Pabbaraju
arXiv ID
2411.15364
Category
cs.DS: Data Structures & Algorithms
Cross-listed
cs.AI,
cs.CL,
cs.LG
Citations
17
Venue
Annual Conference Computational Learning Theory
Last Checked
3 months ago
Abstract
The recent work of Kleinberg & Mullainathan [KM24] provides a concrete model for language generation in the limit: given a sequence of examples from an unknown target language, the goal is to generate new examples from the target language such that no incorrect examples are generated beyond some point. In sharp contrast to strong negative results for the closely related problem of language identification, they establish positive results for language generation in the limit for all countable collections of languages. Follow-up work by Raman & Tewari [RT24] studies bounds on the number of distinct inputs required by an algorithm before correct language generation is achieved -- namely, whether this is a constant for all languages in the collection (uniform generation) or a language-dependent constant (non-uniform generation). We show that every countable language collection has a generator which has the stronger property of non-uniform generation in the limit. However, while the generation algorithm of [KM24] can be implemented using membership queries, we show that any algorithm cannot non-uniformly generate even for collections of just two languages, using only membership queries. We also formalize the tension between validity and breadth in the generation algorithm of [KM24] by introducing a definition of exhaustive generation, and show a strong negative result for exhaustive generation. Our result shows that a tradeoff between validity and breadth is inherent for generation in the limit. We also provide a precise characterization of the language collections for which exhaustive generation is possible. Finally, inspired by algorithms that can choose to obtain feedback, we consider a model of uniform generation with feedback, completely characterizing language collections for which such uniform generation with feedback is possible in terms of a complexity measure of the collection.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Data Structures & Algorithms
π
π
The Cartographer
R.I.P.
π»
Ghosted
Route Planning in Transportation Networks
R.I.P.
π»
Ghosted
Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration
R.I.P.
π»
Ghosted
Hierarchical Clustering: Objective Functions and Algorithms
R.I.P.
π»
Ghosted
Graph Isomorphism in Quasipolynomial Time
π
π
The Cartographer
Simulation optimization: A review of algorithms and applications
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted