
Commit 8a812e4

Parth38 and sayakpaul authored

Update value_guided_sampling.py (#6027)

* Update value_guided_sampling.py

  Changed the scheduler step function, as the predict_epsilon parameter is no longer present in the latest DDPMScheduler.

* Update value_guided_sampling.md

  Updated a link to a working notebook.

---------

Co-authored-by: Sayak Paul <[email protected]>
1 parent bf92e74 commit 8a812e4

File tree

2 files changed: +2 −2 lines changed

docs/source/en/api/pipelines/value_guided_sampling.md
src/diffusers/experimental/rl/value_guided_sampling.py


docs/source/en/api/pipelines/value_guided_sampling.md

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ The abstract from the paper is:

 *Model-based reinforcement learning methods often use learning only for the purpose of estimating an approximate dynamics model, offloading the rest of the decision-making work to classical trajectory optimizers. While conceptually simple, this combination has a number of empirical shortcomings, suggesting that learned models may not be well-suited to standard trajectory optimization. In this paper, we consider what it would look like to fold as much of the trajectory optimization pipeline as possible into the modeling problem, such that sampling from the model and planning with it become nearly identical. The core of our technical approach lies in a diffusion probabilistic model that plans by iteratively denoising trajectories. We show how classifier-guided sampling and image inpainting can be reinterpreted as coherent planning strategies, explore the unusual and useful properties of diffusion-based planning methods, and demonstrate the effectiveness of our framework in control settings that emphasize long-horizon decision-making and test-time flexibility.*

-You can find additional information about the model on the [project page](https://diffusion-planning.github.io/), the [original codebase](https://github.com/jannerm/diffuser), or try it out in a demo [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/reinforcement_learning_with_diffusers.ipynb).
+You can find additional information about the model on the [project page](https://diffusion-planning.github.io/), the [original codebase](https://github.com/jannerm/diffuser), or try it out in a demo [notebook](https://colab.research.google.com/drive/1rXm8CX4ZdN5qivjJ2lhwhkOmt_m0CvU0#scrollTo=6HXJvhyqcITc&uniqifier=1).

 The script to run the model is available [here](https://github.com/huggingface/diffusers/tree/main/examples/reinforcement_learning).

src/diffusers/experimental/rl/value_guided_sampling.py

Lines changed: 1 addition & 1 deletion
@@ -113,7 +113,7 @@ def run_diffusion(self, x, conditions, n_guide_steps, scale):
             prev_x = self.unet(x.permute(0, 2, 1), timesteps).sample.permute(0, 2, 1)

             # TODO: verify deprecation of this kwarg
-            x = self.scheduler.step(prev_x, i, x, predict_epsilon=False)["prev_sample"]
+            x = self.scheduler.step(prev_x, i, x)["prev_sample"]

             # apply conditions to the trajectory (set the initial state)
             x = self.reset_x0(x, conditions, self.action_dim)
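
The removed kwarg corresponds to configuration that now lives on the scheduler itself. Below is a minimal sketch of the updated call pattern, assuming the behaviour previously requested with predict_epsilon=False is now selected by constructing DDPMScheduler with prediction_type="sample"; the standalone scheduler instance and the tensor shapes are illustrative only, not taken from the pipeline.

# Minimal sketch (assumption): the behaviour formerly toggled with
# predict_epsilon=False is now configured on the scheduler via
# prediction_type="sample", so step() no longer needs the extra kwarg.
import torch
from diffusers import DDPMScheduler

scheduler = DDPMScheduler(num_train_timesteps=1000, prediction_type="sample")
scheduler.set_timesteps(20)

x = torch.randn(1, 32, 14)            # illustrative noisy trajectory batch
model_output = torch.randn_like(x)    # stands in for the UNet prediction

t = scheduler.timesteps[0]            # a single denoising timestep
x = scheduler.step(model_output, t, x)["prev_sample"]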
