Variance-Reduced Off-Policy TDC Learning: Non-Asymptotic Convergence Analysis

October 26, 2020 ยท Declared Dead ยท ๐Ÿ› Neural Information Processing Systems

๐Ÿ‘ป CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Shaocong Ma, Yi Zhou, Shaofeng Zou arXiv ID 2010.13272 Category cs.LG: Machine Learning Cross-listed math.OC Citations 18 Venue Neural Information Processing Systems Last Checked 4 months ago
Abstract
Variance reduction techniques have been successfully applied to temporal-difference (TD) learning and help to improve the sample complexity in policy evaluation. However, the existing work applied variance reduction to either the less popular one time-scale TD algorithm or the two time-scale GTD algorithm but with a finite number of i.i.d.\ samples, and both algorithms apply to only the on-policy setting. In this work, we develop a variance reduction scheme for the two time-scale TDC algorithm in the off-policy setting and analyze its non-asymptotic convergence rate over both i.i.d.\ and Markovian samples. In the i.i.d.\ setting, our algorithm {matches the best-known lower bound $\tilde{O}(ฮต^{-1}$).} In the Markovian setting, our algorithm achieves the state-of-the-art sample complexity $O(ฮต^{-1} \log ฮต^{-1})$ that is near-optimal. Experiments demonstrate that the proposed variance-reduced TDC achieves a smaller asymptotic convergence error than both the conventional TDC and the variance-reduced TD.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Machine Learning

Died the same way โ€” ๐Ÿ‘ป Ghosted