Scale-free adaptive planning for deterministic dynamics & discounted rewards

April 20, 2026 · Grace Period · ICML 2019

โณ Grace Period
This paper is less than 90 days old. We give authors time to release their code before passing judgment.
Authors: Peter L. Bartlett, Victor Gabillon, Jennifer Healey, Michal Valko
arXiv ID: 2604.18312
Category: cs.LG (Machine Learning)
Citations: 0
Venue: ICML 2019
Abstract
We address the problem of planning in an environment with deterministic dynamics, stochastic rewards, and discounted returns. Neither the optimal value function nor the range of the rewards is known in advance. We propose Platypoos, a simple scale-free planning algorithm that adapts to the unknown scale and smoothness of the reward function. We provide a sample complexity analysis for Platypoos that improves upon prior work and holds simultaneously over a broad range of discount factors and reward scales, without the algorithm needing to know either. We also establish a matching lower bound showing that our analysis is optimal up to constants.
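The abstract does not spell out how Platypoos works, so the sketch below only illustrates the problem setting under stated assumptions. It uses a hypothetical toy environment with deterministic transitions (`step`) and noisy rewards (`sample_reward`), and scores each open-loop action sequence by a Monte Carlo estimate of its discounted return $\sum_t \gamma^t r_t$. The `scale` parameter is a hypothetical stand-in for the unknown reward scale: multiplying every reward by a positive constant leaves the best plan unchanged, which is the kind of invariance a scale-free planner aims to exploit without being told the scale. The search is brute force over all $2^H$ plans, so treat it as an intuition-building baseline, not as the paper's algorithm.

```python
import random
from itertools import product
from typing import Sequence, Tuple

State = int
Action = int


def step(state: State, action: Action) -> State:
    # Hypothetical deterministic dynamics: the next state is a fixed function of (state, action).
    return 2 * state + action


def sample_reward(state: State, action: Action, rng: random.Random, scale: float) -> float:
    # Hypothetical stochastic reward: a state-independent mean plus Gaussian noise,
    # all multiplied by a reward scale that a scale-free planner would not know.
    mean = 1.0 if action == 1 else 0.5
    return scale * (mean + rng.gauss(0.0, 0.1))


def discounted_return(actions: Sequence[Action], gamma: float,
                      rng: random.Random, scale: float) -> float:
    # Roll out one open-loop action sequence and accumulate gamma^t * r_t.
    state, total = 0, 0.0
    for t, action in enumerate(actions):
        total += gamma ** t * sample_reward(state, action, rng, scale)
        state = step(state, action)
    return total


def estimate_value(actions: Sequence[Action], gamma: float, n_samples: int,
                   rng: random.Random, scale: float) -> float:
    # Monte Carlo estimate: average discounted return over independent rollouts.
    total = sum(discounted_return(actions, gamma, rng, scale) for _ in range(n_samples))
    return total / n_samples


def best_open_loop_plan(horizon: int, gamma: float, n_samples: int,
                        scale: float = 1.0, seed: int = 0) -> Tuple[Action, ...]:
    # Brute-force baseline (not Platypoos): evaluate every action sequence of
    # length `horizon` over actions {0, 1} and return the best estimated one.
    rng = random.Random(seed)
    plans = list(product([0, 1], repeat=horizon))
    return max(plans, key=lambda plan: estimate_value(plan, gamma, n_samples, rng, scale))
```

For example, `best_open_loop_plan(horizon=3, gamma=0.9, n_samples=50)` returns `(1, 1, 1)`, and passing `scale=1000.0` returns the same plan: with a fixed seed the noise draws are identical, so every estimated value is simply multiplied by the scale and the ranking of plans does not change.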