WARPD: World model Assisted Reactive Policy Diffusion

October 17, 2024 · Declared Dead · 🏛 NeurIPS 2025

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Shashank Hegde, Satyajeet Das, Gautam Salhotra, Gaurav S. Sukhatme
arXiv ID: 2410.14040
Category: cs.LG (Machine Learning)
Cross-listed: cs.AI, cs.RO
Citations: 1
Venue: NeurIPS 2025
Last Checked: 4 months ago

Abstract

With the increasing availability of open-source robotic data, imitation learning has become a promising approach for both manipulation and locomotion. Diffusion models are now widely used to train large, generalized policies that predict controls or trajectories, leveraging their ability to model multimodal action distributions. However, this generality comes at the cost of larger model sizes and slower inference, an acute limitation for robotic tasks requiring high control frequencies. Moreover, Diffusion Policy (DP), a popular trajectory-generation approach, suffers from a trade-off between performance and action horizon: fewer diffusion queries lead to larger trajectory chunks, which in turn accumulate tracking errors. To overcome these challenges, we introduce WARPD (World model Assisted Reactive Policy Diffusion), a method that generates closed-loop policies (weights for neural policies) directly, instead of open-loop trajectories. By learning behavioral distributions in parameter space rather than trajectory space, WARPD offers two major advantages: (1) extended action horizons with robustness to perturbations, while maintaining high task performance, and (2) significantly reduced inference costs. Empirically, WARPD outperforms DP in long-horizon and perturbed environments, and achieves multitask performance on par with DP while requiring only ~1/45th of the inference-time FLOPs per step.
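The trade-off between action horizon and tracking error described in the abstract can be illustrated with a toy sketch. This is not the authors' method or code: the 1-D integrator, the fixed feedback gain, the perturbation, and the `rollout` function are all hypothetical stand-ins, and the "generator query" here is a trivial placeholder for a diffusion model call. The sketch only contrasts executing a pre-generated action chunk open-loop with executing a generated state-feedback policy closed-loop over the same horizon.

```python
def rollout(horizon, chunk, closed_loop, push_at=5, push=1.0, gain=0.5):
    """Drive a toy system x[t+1] = x[t] + u[t] toward 0.

    Every `chunk` steps the generator is queried once (a placeholder for
    one diffusion call). Open-loop mode generates a chunk of actions from
    the current state and replays them blindly. Closed-loop mode generates
    policy parameters (here a single gain) and reacts to the state at
    every step. Returns (final absolute error, number of generator queries).
    """
    x = 2.0          # hypothetical initial state
    queries = 0
    plan = []
    for t in range(horizon):
        if t % chunk == 0:
            queries += 1
            if not closed_loop:
                # Plan a chunk of actions by simulating the nominal dynamics.
                x_pred = x
                plan = []
                for _ in range(chunk):
                    u_planned = -gain * x_pred
                    plan.append(u_planned)
                    x_pred += u_planned
        if closed_loop:
            u = -gain * x            # policy observes the true state
        else:
            u = plan[t % chunk]      # replay the planned action
        x += u
        if t == push_at:
            x += push                # external perturbation
    return abs(x), queries


if __name__ == "__main__":
    for label, chunk, closed in [
        ("open-loop, one 16-step chunk", 16, False),
        ("open-loop, four 4-step chunks", 4, False),
        ("closed-loop, one 16-step query", 16, True),
    ]:
        err, q = rollout(horizon=16, chunk=chunk, closed_loop=closed)
        print(f"{label}: final error {err:.4f}, generator queries {q}")
```

In this toy setting, the open-loop run with a single long chunk never corrects the push, shorter chunks recover only by spending more generator queries, and the closed-loop run recovers with a single query. It says nothing about the actual FLOP savings reported in the abstract.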