Diffusion models have risen as a promising approach to data-driven planning, and have demonstrated impressive performance in robotic control, reinforcement learning, and video planning. Given an effective planner, an important question to consider is replanning -- when existing plans should be regenerated due to action execution errors and external environment changes. Direct plan execution, without replanning, is problematic as errors from individual actions rapidly accumulate and environments are partially observable and stochastic. At the same time, replanning at every timestep incurs a substantial computational cost and may prevent successful task execution, as differing generated plans prevent consistent progress toward any particular goal. In this paper, we explore how to replan effectively with diffusion models. We propose a principled approach to determine when to replan, based on the diffusion model's estimated likelihood of existing generated plans. We further present an approach to replan existing trajectories so that new plans follow the same goal state as the original trajectory, efficiently bootstrapping off previously generated plans. We illustrate how the combination of our proposed additions significantly improves the performance of diffusion planners, leading to 38% gains over past diffusion planning approaches on Maze2D, and further enables the handling of stochastic and long-horizon robotic control tasks.
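As a rough illustration of the when-to-replan decision described above, the sketch below scores an existing plan with a one-step denoising-error proxy for the diffusion model's likelihood and triggers replanning when the score falls below a threshold. This is a minimal sketch under assumptions: the names `denoiser` (an epsilon-prediction network), `plan_likelihood_proxy`, `t_probe`, and `threshold` are illustrative stand-ins, not the paper's exact likelihood estimator.

```python
import torch

def plan_likelihood_proxy(denoiser, plan, t_probe, alphas_cumprod):
    # Noise the plan to a probe timestep and measure the one-step denoising error;
    # a low error is used here as a proxy for a high model likelihood of the plan.
    a_bar = alphas_cumprod[t_probe]
    eps = torch.randn_like(plan)
    x_t = a_bar.sqrt() * plan + (1.0 - a_bar).sqrt() * eps
    t = torch.full((plan.shape[0],), t_probe, dtype=torch.long)
    eps_hat = denoiser(x_t, t)                      # assumed epsilon-prediction network
    return -torch.mean((eps - eps_hat) ** 2)        # negative error as a likelihood proxy

def should_replan(denoiser, plan, alphas_cumprod, t_probe=10, threshold=-0.05):
    # Trigger replanning only when the current plan looks unlikely under the model.
    score = plan_likelihood_proxy(denoiser, plan, t_probe, alphas_cumprod)
    return score.item() < threshold

# Toy usage with a stand-in denoiser that always predicts zero noise.
T = 100
betas = torch.linspace(1e-4, 2e-2, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
plan = torch.randn(1, 64, 4)                        # (batch, horizon, state-action dim)
print(should_replan(lambda x_t, t: torch.zeros_like(x_t), plan, alphas_cumprod))
```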
Overview of our replanning approach, RDM. The left panel shows the likelihood curve of the sampled trajectory as environment steps increase. Two steps, t and k, have low likelihoods. Step t corresponds to Replan from scratch (middle panel), where RDM regenerates a completely new trajectory from Gaussian noise, conditioned only on the current state. Step k corresponds to Replan with future (right panel), where RDM shifts the trajectory so that the current state becomes the first state and repeats the last state to fill the remaining horizon. RDM then injects a few timesteps of noise into the trajectory and denoises it.
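To make the two replanning modes in this figure concrete, the following is a minimal PyTorch sketch. The helpers `sample_fn`, `q_sample`, and `denoise_from`, the conditioning convention `cond={0: current_state}`, and the noise level `k` are hypothetical stand-ins for a diffusion planner's own sampling utilities, not RDM's actual implementation.

```python
import torch

@torch.no_grad()
def replan_from_scratch(sample_fn, current_state, horizon):
    # "Replan from scratch": draw a fresh trajectory from pure Gaussian noise,
    # conditioned only on the current state as the first state of the plan.
    x_T = torch.randn(1, horizon, current_state.shape[-1])
    return sample_fn(x_T, cond={0: current_state})        # assumed reverse-diffusion sampler

@torch.no_grad()
def replan_with_future(denoise_from, q_sample, old_plan, current_state, exec_step, k=20):
    # "Replan with future": reuse the unexecuted tail of the old plan. Shift the
    # plan so the current state comes first, repeat the last state to refill the
    # horizon, inject k forward-diffusion steps of noise, then denoise from there.
    future = old_plan[:, exec_step:]                       # keep the not-yet-executed part
    pad = old_plan[:, -1:].repeat(1, exec_step, 1)         # repeat final state to fill the rest
    shifted = torch.cat([future, pad], dim=1)
    shifted[:, 0] = current_state                          # anchor at the true current state
    x_k = q_sample(shifted, k)                             # partial forward noising
    return denoise_from(x_k, k, cond={0: current_state})   # resume reverse diffusion at level k
```

Warm-starting from the old plan in this way is what allows the new plan to keep the same goal state while avoiding a full reverse-diffusion pass from pure noise.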
Visualization on Maze2D. (a) illustrates the likelihood curve of the sampled trajectory as environment steps increase, with a noticeable drop when the agent collides with the wall. (b) presents the sampled trajectory at the start of the task. (c) shows the actual trajectory of Diffuser, which follows a fixed-interval replanning strategy. The agent does not recognize the flawed part of the sampled trajectory, which leads it to a less favorable state and wastes environment steps. (d) shows the actual trajectory of RDM. In contrast to (c), the agent successfully detects collisions and replans a feasible trajectory.
We visualize the execution of Decision Diffuser and RDM in RLBench. RDM successfully detects failures in the initial plan and efficiently replans during execution.
Panel labels: Decision Diffuser | RDM | Decision Diffuser | RDM