
Commit 4d13a63

sigmas and zero snr
1 parent 30835b2 commit 4d13a63


2 files changed

+120 -47 lines changed

docs/source/en/using-diffusers/image_quality.md

Lines changed: 1 addition & 45 deletions
@@ -12,54 +12,10 @@ specific language governing permissions and limitations under the License.
 
 # Controlling image quality
 
-The components of a diffusion model, like the UNet and scheduler, can be optimized to improve the quality of generated images leading to better image lighting and details. These techniques are especially useful if you don't have the resources to simply use a larger model for inference. You can enable these techniques during inference without any additional training.
+The components of a diffusion model, like the UNet and scheduler, can be optimized to improve the quality of generated images leading to better details. These techniques are especially useful if you don't have the resources to simply use a larger model for inference. You can enable these techniques during inference without any additional training.
 
 This guide will show you how to turn these techniques on in your pipeline and how to configure them to improve the quality of your generated images.
 
-## Lighting
-
-The Stable Diffusion models aren't very good at generating images that are very bright or dark because the scheduler doesn't start sampling from the last timestep and it doesn't enforce a zero signal-to-noise ratio (SNR). The [Common Diffusion Noise Schedules and Sample Steps are Flawed](https://hf.co/papers/2305.08891) paper fixes these issues which are now available in some Diffusers schedulers.
-
-> [!TIP]
-> For inference, you need a model that has been trained with *v_prediction*. To train your own model with *v_prediction*, add the following flag to the [train_text_to_image.py](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py) or [train_text_to_image_lora.py](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_lora.py) scripts.
->
-> ```bash
-> --prediction_type="v_prediction"
-> ```
-
-For example, load the [ptx0/pseudo-journey-v2](https://hf.co/ptx0/pseudo-journey-v2) checkpoint which was trained with `v_prediction` and the [`DDIMScheduler`]. Now you should configure the following parameters in the [`DDIMScheduler`].
-
-* `rescale_betas_zero_snr=True` to rescale the noise schedule to zero SNR
-* `timestep_spacing="trailing"` to start sampling from the last timestep
-
-Set `guidance_rescale` in the pipeline to prevent over-exposure. A lower value increases brightness but some of the details may appear washed out.
-
-```py
-from diffusers import DiffusionPipeline, DDIMScheduler
-
-pipeline = DiffusionPipeline.from_pretrained("ptx0/pseudo-journey-v2", use_safetensors=True)
-
-pipeline.scheduler = DDIMScheduler.from_config(
-    pipeline.scheduler.config, rescale_betas_zero_snr=True, timestep_spacing="trailing"
-)
-pipeline.to("cuda")
-prompt = "cinematic photo of a snowy mountain at night with the northern lights aurora borealis overhead, 35mm photograph, film, professional, 4k, highly detailed"
-generator = torch.Generator(device="cpu").manual_seed(23)
-image = pipeline(prompt, guidance_rescale=0.7, generator=generator).images[0]
-image
-```
-
-<div class="flex gap-4">
-  <div>
-    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/no-zero-snr.png"/>
-    <figcaption class="mt-2 text-center text-sm text-gray-500">default Stable Diffusion v2-1 image</figcaption>
-  </div>
-  <div>
-    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/zero-snr.png"/>
-    <figcaption class="mt-2 text-center text-sm text-gray-500">image with zero SNR and trailing timestep spacing enabled</figcaption>
-  </div>
-</div>
-
 ## Details
 
 [FreeU](https://hf.co/papers/2309.11497) improves image details by rebalancing the UNet's backbone and skip connection weights. The skip connections can cause the model to overlook some of the backbone semantics which may lead to unnatural image details in the generated image. This technique does not require any additional training and can be applied on the fly during inference for tasks like image-to-image and text-to-video.
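
The diff context ends at the opening of the FreeU section, so the enablement code itself isn't shown in this commit. As a minimal sketch, assuming the standard `enable_freeu`/`disable_freeu` pipeline methods; the scale values below are illustrative starting points, not values taken from this commit:

```py
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# s1/s2 damp the skip-connection features, b1/b2 amplify the backbone features,
# matching the rebalancing described above; good values vary by model
pipeline.enable_freeu(s1=0.9, s2=0.2, b1=1.3, b2=1.4)

image = pipeline("A cinematic shot of a rabbit wearing a jacket").images[0]

# FreeU has no trained weights, so it can be switched off again on the fly
pipeline.disable_freeu()
```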

docs/source/en/using-diffusers/scheduler_features.md

Lines changed: 119 additions & 2 deletions
@@ -23,7 +23,7 @@ This guide will demonstrate how to use these features to improve inference quali
 
 The timestep or noise schedule determines the amount of noise at each sampling step. The scheduler uses this to generate an image with the corresponding amount of noise at each step. The timestep schedule is generated from the scheduler's default configuration, but you can customize the scheduler to use new and optimized sampling schedules that aren't in Diffusers yet.
 
-For example, [Align Your Steps (AYS)](https://research.nvidia.com/labs/toronto-ai/AlignYourSteps/) is a method for optimizing a sampling schedule to generate a high-quality image in as little as 10 steps. This optimal schedule for 10 steps was calculated to be:
+For example, [Align Your Steps (AYS)](https://research.nvidia.com/labs/toronto-ai/AlignYourSteps/) is a method for optimizing a sampling schedule to generate a high-quality image in as little as 10 steps. The optimal [10-step schedule](https://github.com/huggingface/diffusers/blob/a7bf77fc284810483f1e60afe34d1d27ad91ce2e/src/diffusers/schedulers/scheduling_utils.py#L51) for Stable Diffusion XL is:
 
 ```py
 from diffusers.schedulers import AysSchedules
@@ -41,7 +41,7 @@ pipeline = StableDiffusionXLPipeline.from_pretrained(
     torch_dtype=torch.float16,
     variant="fp16",
 ).to("cuda")
-pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, algorithm_type="sde-dpmsolver++")
+pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config, algorithm_type="sde-dpmsolver++")
 
 prompt = "A cinematic shot of a cute little rabbit wearing a jacket and doing a thumbs up"
 generator = torch.Generator(device="cpu").manual_seed(2487854446)
@@ -70,4 +70,122 @@ image = pipeline(
 
 ## Sigmas
 
+The `sigmas` parameter is the amount of noise added at each timestep according to the timestep schedule. Like the `timesteps` parameter, you can customize the `sigmas` parameter to control how much noise is added at each step. When you use custom `sigmas`, the `timesteps` are calculated from them and the default scheduler configuration is ignored.
+
+For example, you can manually pass the [sigmas](https://github.com/huggingface/diffusers/blob/6529ee67ec02fcf58d2fd9242164ea002b351d75/src/diffusers/schedulers/scheduling_utils.py#L55) for the 10-step AYS schedule from before directly to the pipeline.
+
+```py
+import torch
+
+from diffusers import DiffusionPipeline, EulerDiscreteScheduler
+
+model_id = "stabilityai/stable-diffusion-xl-base-1.0"
+pipeline = DiffusionPipeline.from_pretrained(
+    model_id,
+    torch_dtype=torch.float16,
+    variant="fp16",
+).to("cuda")
+pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)
+
+sigmas = [14.615, 6.315, 3.771, 2.181, 1.342, 0.862, 0.555, 0.380, 0.234, 0.113, 0.0]
+prompt = "anthropomorphic capybara wearing a suit and working with a computer"
+generator = torch.Generator(device='cuda').manual_seed(123)
+image = pipeline(
+    prompt=prompt,
+    num_inference_steps=10,
+    sigmas=sigmas,
+    generator=generator
+).images[0]
+```
+
+When you take a look at the scheduler's `timesteps` parameter, you'll see that it is the same as the AYS timestep schedule because the timestep schedule is calculated from the `sigmas`.
+
+```py
+print(f"timesteps: {pipeline.scheduler.timesteps}")
+"timesteps: tensor([999., 845., 730., 587., 443., 310., 193., 116., 53., 13.], device='cuda:0')"
+```
+
+### Karras sigmas
+
+> [!TIP]
+> Refer to the scheduler API [overview](../api/schedulers/overview) for a list of schedulers that support Karras sigmas.
+
+Karras schedulers use the timestep schedule and sigmas from the [Elucidating the Design Space of Diffusion-Based Generative Models](https://hf.co/papers/2206.00364) paper. This scheduler variant applies a smaller amount of noise per step as it approaches the end of the sampling process compared to other schedulers, and it can increase the level of detail in the generated image.
+
+Enable Karras sigmas by setting `use_karras_sigmas=True` in the scheduler.
+
+```py
+import torch
+from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler
+
+pipeline = StableDiffusionXLPipeline.from_pretrained(
+    "SG161222/RealVisXL_V4.0",
+    torch_dtype=torch.float16,
+    variant="fp16",
+).to("cuda")
+pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config, algorithm_type="sde-dpmsolver++", use_karras_sigmas=True)
+
+prompt = "A cinematic shot of a cute little rabbit wearing a jacket and doing a thumbs up"
+generator = torch.Generator(device="cpu").manual_seed(2487854446)
+image = pipeline(
+    prompt=prompt,
+    negative_prompt="",
+    generator=generator,
+).images[0]
+```
+
+<div class="flex gap-4">
+  <div>
+    <img class="rounded-xl" src="https://huggingface.co/datasets/stevhliu/testing-images/resolve/main/karras_sigmas_true.png"/>
+    <figcaption class="mt-2 text-center text-sm text-gray-500">Karras sigmas enabled</figcaption>
+  </div>
+  <div>
+    <img class="rounded-xl" src="https://huggingface.co/datasets/stevhliu/testing-images/resolve/main/karras_sigmas_false.png"/>
+    <figcaption class="mt-2 text-center text-sm text-gray-500">Karras sigmas disabled</figcaption>
+  </div>
+</div>
+
 ## Rescale noise schedule
+
+In the [Common Diffusion Noise Schedules and Sample Steps are Flawed](https://hf.co/papers/2305.08891) paper, the authors discovered that common noise schedules allowed some signal to leak into the last timestep. This signal leakage at inference can cause models to only generate images with medium brightness. By enforcing a zero signal-to-noise ratio (SNR) for the timestep schedule and sampling from the last timestep, the model can be improved to generate very bright or dark images.
+
+> [!TIP]
+> For inference, you need a model that has been trained with *v_prediction*. To train your own model with *v_prediction*, add the following flag to the [train_text_to_image.py](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py) or [train_text_to_image_lora.py](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_lora.py) scripts.
+>
+> ```bash
+> --prediction_type="v_prediction"
+> ```
+
+For example, load the [ptx0/pseudo-journey-v2](https://hf.co/ptx0/pseudo-journey-v2) checkpoint which was trained with `v_prediction` and the [`DDIMScheduler`]. Configure the following parameters in the [`DDIMScheduler`]:
+
+* `rescale_betas_zero_snr=True` to rescale the noise schedule to zero SNR
+* `timestep_spacing="trailing"` to start sampling from the last timestep
+
+Set `guidance_rescale` in the pipeline to prevent over-exposure. A lower value increases brightness but some of the details may appear washed out.
+
+```py
+import torch
+from diffusers import DiffusionPipeline, DDIMScheduler
+
+pipeline = DiffusionPipeline.from_pretrained("ptx0/pseudo-journey-v2", use_safetensors=True)
+
+pipeline.scheduler = DDIMScheduler.from_config(
+    pipeline.scheduler.config, rescale_betas_zero_snr=True, timestep_spacing="trailing"
+)
+pipeline.to("cuda")
+prompt = "cinematic photo of a snowy mountain at night with the northern lights aurora borealis overhead, 35mm photograph, film, professional, 4k, highly detailed"
+generator = torch.Generator(device="cpu").manual_seed(23)
+image = pipeline(prompt, guidance_rescale=0.7, generator=generator).images[0]
+image
+```
+
+<div class="flex gap-4">
+  <div>
+    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/no-zero-snr.png"/>
+    <figcaption class="mt-2 text-center text-sm text-gray-500">default Stable Diffusion v2-1 image</figcaption>
+  </div>
+  <div>
+    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/zero-snr.png"/>
+    <figcaption class="mt-2 text-center text-sm text-gray-500">image with zero SNR and trailing timestep spacing enabled</figcaption>
+  </div>
+</div>
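
As a quick sanity check of the sigmas-to-timesteps relationship documented above, the scheduler can be driven directly without loading the full pipeline. This is a minimal sketch, assuming a Diffusers version where `EulerDiscreteScheduler.set_timesteps` accepts custom `sigmas` and where `AysSchedules` contains a `"StableDiffusionXLTimesteps"` entry, as the linked `scheduling_utils.py` suggests:

```py
from diffusers import EulerDiscreteScheduler
from diffusers.schedulers import AysSchedules

# Load only the scheduler config from the SDXL repo
scheduler = EulerDiscreteScheduler.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="scheduler"
)

# The 10-step AYS sigmas from the docs above; the final 0.0 is the terminal sigma
sigmas = [14.615, 6.315, 3.771, 2.181, 1.342, 0.862, 0.555, 0.380, 0.234, 0.113, 0.0]
scheduler.set_timesteps(sigmas=sigmas)

# The derived timesteps should match the published AYS schedule for SDXL
print(scheduler.timesteps)                         # expected: tensor([999., 845., ..., 13.])
print(AysSchedules["StableDiffusionXLTimesteps"])  # reference list shipped with Diffusers
```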
