R.I.P.
π»
Ghosted
Beyond the Bellman Fixed Point: Geometry and Fast Policy Identification in Value Iteration
April 19, 2026 Β· Grace Period Β· + Add venue
Authors
Donghwan Lee
arXiv ID
2604.17457
Category
math.OC: Optimization & Control
Cross-listed
cs.AI,
eess.SY
Citations
0
Abstract
Dynamic programming is one of the most fundamental methodologies for solving Markov decision problems. Among its many variants, Q-value iteration (Q-VI) is particularly important due to its conceptual simplicity and its classical contraction-based convergence guarantee. Despite the central role of this contraction property, it does not fully reveal the geometric structure of the Q-VI trajectory. In particular, when one is interested not only in the final limit $Q^*$ but also in when the induced greedy policy becomes effectively optimal, the standard contraction argument provides only a coarse characterization. To formalize this notion, we denote by $\mathcal X^*$ the set of $Q$-functions whose corresponding tie-broken greedy policies are optimal, referred to as the practically optimal solution set (POS). In this paper, we revisit discounted Q-VI through the lens of switching system theory and derive new geometric insights into its behavior. In particular, we show that although Q-VI does not reach $Q^*$ in finite time in general, it identifies the optimal action class in finite time. Furthermore, we prove that the distance from the iterate to a particular subset of $\mathcal X^*$ decays exponentially at a rate governed by the joint spectral radius (JSR) of a restricted switching family. This rate can be strictly faster than the standard $Ξ³$ rate when the restricted JSR is strictly smaller than $Ξ³$, while the convergence of the entire $Q$-function to $Q^*$ can still be dominated by the slower $Ξ³$ mode, where $Ξ³$ denotes the discount factor. These results reveal a two-stage geometric behavior of Q-VI: a fast convergence toward $\mathcal X_1$, followed by a slower convergence toward $Q^*$ in general.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Optimization & Control
R.I.P.
π»
Ghosted
Local SGD Converges Fast and Communicates Little
R.I.P.
π»
Ghosted
On Lazy Training in Differentiable Programming
π
π
The Cartographer
A Review on Bilevel Optimization: From Classical to Evolutionary Approaches and Applications
R.I.P.
π»
Ghosted
Learned Primal-dual Reconstruction
R.I.P.
π»
Ghosted