Commit 63a61bd

Merge branch 'main' into feat/ci-benchmarking

2 parents b7eb3fb + 79a7ab9 commit 63a61bd

File tree

78 files changed: +4751 −627 lines


.github/workflows/pr_test_fetcher.yml

Lines changed: 1 addition & 7 deletions
@@ -1,12 +1,6 @@
 name: Fast tests for PRs - Test Fetcher
 
-on:
-  pull_request:
-    branches:
-      - main
-  push:
-    branches:
-      - ci-*
+on: workflow_dispatch
 
 env:
   DIFFUSERS_IS_CI: yes

.github/workflows/pr_tests.yml

Lines changed: 1 addition & 0 deletions
@@ -113,6 +113,7 @@ jobs:
       - name: Run example PyTorch CPU tests
         if: ${{ matrix.config.framework == 'pytorch_examples' }}
         run: |
+          python -m pip install peft
           python -m pytest -n 2 --max-worker-restart=0 --dist=loadfile \
             --make-reports=tests_${{ matrix.config.report }} \
             examples

docs/source/en/_toctree.yml

Lines changed: 4 additions & 0 deletions
@@ -264,6 +264,10 @@
       title: ControlNet
     - local: api/pipelines/controlnet_sdxl
       title: ControlNet with Stable Diffusion XL
+    - local: api/pipelines/controlnetxs
+      title: ControlNet-XS
+    - local: api/pipelines/controlnetxs_sdxl
+      title: ControlNet-XS with Stable Diffusion XL
     - local: api/pipelines/cycle_diffusion
       title: Cycle Diffusion
     - local: api/pipelines/dance_diffusion

docs/source/en/api/attnprocessor.md

Lines changed: 3 additions & 0 deletions
@@ -20,6 +20,9 @@ An attention processor is a class for applying different types of attention mech
 ## AttnProcessor2_0
 [[autodoc]] models.attention_processor.AttnProcessor2_0
 
+## FusedAttnProcessor2_0
+[[autodoc]] models.attention_processor.FusedAttnProcessor2_0
+
 ## LoRAAttnProcessor
 [[autodoc]] models.attention_processor.LoRAAttnProcessor
 

docs/source/en/api/pipelines/controlnetxs.md

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+<!--Copyright 2023 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# ControlNet-XS
+
+ControlNet-XS was introduced in [ControlNet-XS](https://vislearn.github.io/ControlNet-XS/) by Denis Zavadski and Carsten Rother. It is based on the observation that the control model in the [original ControlNet](https://huggingface.co/papers/2302.05543) can be made much smaller and still produce good results.
+
+Like the original ControlNet model, you can provide an additional control image to condition and control Stable Diffusion generation. For example, if you provide a depth map, the ControlNet model generates an image that'll preserve the spatial information from the depth map. It is a more flexible and accurate way to control the image generation process.
+
+ControlNet-XS generates images with comparable quality to a regular ControlNet, but it is 20-25% faster ([see benchmark](https://github.com/UmerHA/controlnet-xs-benchmark/blob/main/Speed%20Benchmark.ipynb) with StableDiffusion-XL) and uses ~45% less memory.
+
+Here's the overview from the [project page](https://vislearn.github.io/ControlNet-XS/):
+
+*With increasing computing capabilities, current model architectures appear to follow the trend of simply upscaling all components without validating the necessity for doing so. In this project we investigate the size and architectural design of ControlNet [Zhang et al., 2023] for controlling the image generation process with stable diffusion-based models. We show that a new architecture with as little as 1% of the parameters of the base model achieves state-of-the art results, considerably better than ControlNet in terms of FID score. Hence we call it ControlNet-XS. We provide the code for controlling StableDiffusion-XL [Podell et al., 2023] (Model B, 48M Parameters) and StableDiffusion 2.1 [Rombach et al. 2022] (Model B, 14M Parameters), all under openrail license.*
+
+This model was contributed by [UmerHA](https://twitter.com/UmerHAdil). ❤️
+
+<Tip>
+
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
+
+</Tip>
+
+## StableDiffusionControlNetXSPipeline
+[[autodoc]] StableDiffusionControlNetXSPipeline
+    - all
+    - __call__
+
+## StableDiffusionPipelineOutput
+[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput
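To give a feel for the API documented in the new page, here is a minimal canny-conditioned text-to-image sketch. The checkpoint name is a placeholder, and loading the control model through a `ControlNetXSModel` class is an assumption based on how regular ControlNets are wired into their pipelines; check the `StableDiffusionControlNetXSPipeline` reference for the exact signature:

```py
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetXSModel, StableDiffusionControlNetXSPipeline
from diffusers.utils import load_image

# Build a canny edge map to use as the control image.
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
)
edges = cv2.Canny(np.array(image), 100, 200)
canny_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

# Placeholder repo id -- substitute a real ControlNet-XS canny checkpoint.
controlnet = ControlNetXSModel.from_pretrained("your-namespace/controlnet-xs-sd2.1-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetXSPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

image = pipe("aerial view of a futuristic city", image=canny_image, num_inference_steps=30).images[0]
image.save("controlnetxs_canny.png")
```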

docs/source/en/api/pipelines/controlnetxs_sdxl.md

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+<!--Copyright 2023 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# ControlNet-XS with Stable Diffusion XL
+
+ControlNet-XS was introduced in [ControlNet-XS](https://vislearn.github.io/ControlNet-XS/) by Denis Zavadski and Carsten Rother. It is based on the observation that the control model in the [original ControlNet](https://huggingface.co/papers/2302.05543) can be made much smaller and still produce good results.
+
+Like the original ControlNet model, you can provide an additional control image to condition and control Stable Diffusion generation. For example, if you provide a depth map, the ControlNet model generates an image that'll preserve the spatial information from the depth map. It is a more flexible and accurate way to control the image generation process.
+
+ControlNet-XS generates images with comparable quality to a regular ControlNet, but it is 20-25% faster ([see benchmark](https://github.com/UmerHA/controlnet-xs-benchmark/blob/main/Speed%20Benchmark.ipynb)) and uses ~45% less memory.
+
+Here's the overview from the [project page](https://vislearn.github.io/ControlNet-XS/):
+
+*With increasing computing capabilities, current model architectures appear to follow the trend of simply upscaling all components without validating the necessity for doing so. In this project we investigate the size and architectural design of ControlNet [Zhang et al., 2023] for controlling the image generation process with stable diffusion-based models. We show that a new architecture with as little as 1% of the parameters of the base model achieves state-of-the art results, considerably better than ControlNet in terms of FID score. Hence we call it ControlNet-XS. We provide the code for controlling StableDiffusion-XL [Podell et al., 2023] (Model B, 48M Parameters) and StableDiffusion 2.1 [Rombach et al. 2022] (Model B, 14M Parameters), all under openrail license.*
+
+This model was contributed by [UmerHA](https://twitter.com/UmerHAdil). ❤️
+
+<Tip warning={true}>
+
+🧪 Many of the SDXL ControlNet checkpoints are experimental, and there is a lot of room for improvement. Feel free to open an [Issue](https://github.com/huggingface/diffusers/issues/new/choose) and leave us feedback on how we can improve!
+
+</Tip>
+
+<Tip>
+
+Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-components-across-pipelines) section to learn how to efficiently load the same components into multiple pipelines.
+
+</Tip>
+
+## StableDiffusionXLControlNetXSPipeline
+[[autodoc]] StableDiffusionXLControlNetXSPipeline
+    - all
+    - __call__
+
+## StableDiffusionPipelineOutput
+[[autodoc]] pipelines.stable_diffusion.StableDiffusionPipelineOutput
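As a rough sketch, usage mirrors the Stable Diffusion ControlNet-XS pipeline above, swapping in the SDXL base model and an SDXL ControlNet-XS checkpoint. The repo id below is a placeholder and the `ControlNetXSModel` loading path is again an assumption; consult the `StableDiffusionXLControlNetXSPipeline` reference for the exact API:

```py
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetXSModel, StableDiffusionXLControlNetXSPipeline
from diffusers.utils import load_image

# Prepare a canny edge map as the control image.
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
)
canny_image = Image.fromarray(np.stack([cv2.Canny(np.array(image), 100, 200)] * 3, axis=-1))

# Placeholder repo id -- substitute a real SDXL ControlNet-XS canny checkpoint.
controlnet = ControlNetXSModel.from_pretrained("your-namespace/controlnet-xs-sdxl-canny", torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetXSPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

image = pipe("aerial view of a futuristic city", image=canny_image, num_inference_steps=30).images[0]
image.save("controlnetxs_sdxl_canny.png")
```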

docs/source/en/api/pipelines/overview.md

Lines changed: 3 additions & 0 deletions
@@ -40,6 +40,8 @@ The table below lists all the pipelines currently available in 🤗 Diffusers an
 | [Consistency Models](consistency_models) | unconditional image generation |
 | [ControlNet](controlnet) | text2image, image2image, inpainting |
 | [ControlNet with Stable Diffusion XL](controlnet_sdxl) | text2image |
+| [ControlNet-XS](controlnetxs) | text2image |
+| [ControlNet-XS with Stable Diffusion XL](controlnetxs_sdxl) | text2image |
 | [Cycle Diffusion](cycle_diffusion) | image2image |
 | [Dance Diffusion](dance_diffusion) | unconditional audio generation |
 | [DDIM](ddim) | unconditional image generation |
@@ -71,6 +73,7 @@ The table below lists all the pipelines currently available in 🤗 Diffusers an
 | [Stable Diffusion](stable_diffusion/overview) | text2image, image2image, depth2image, inpainting, image variation, latent upscaler, super-resolution |
 | [Stable Diffusion Model Editing](model_editing) | model editing |
 | [Stable Diffusion XL](stable_diffusion/stable_diffusion_xl) | text2image, image2image, inpainting |
+| [Stable Diffusion XL Turbo](stable_diffusion/sdxl_turbo) | text2image, image2image, inpainting |
 | [Stable unCLIP](stable_unclip) | text2image, image variation |
 | [Stochastic Karras VE](stochastic_karras_ve) | unconditional image generation |
 | [T2I-Adapter](stable_diffusion/adapter) | text2image |

docs/source/en/api/pipelines/stable_diffusion/sdxl_turbo.md

Lines changed: 2 additions & 20 deletions
@@ -20,34 +20,16 @@ The abstract from the paper is:
 
 ## Tips
 
-- SDXL Turbo uses the exact same architecture as [SDXL](./stable_diffusion_xl).
+- SDXL Turbo uses the exact same architecture as [SDXL](./stable_diffusion_xl), which means it also has the same API. Please refer to the [SDXL](./stable_diffusion_xl) API reference for more details.
 - SDXL Turbo should disable guidance scale by setting `guidance_scale=0.0`
 - SDXL Turbo should use `timestep_spacing='trailing'` for the scheduler and use between 1 and 4 steps.
 - SDXL Turbo has been trained to generate images of size 512x512.
 - SDXL Turbo is open-access, but not open-source meaning that one might have to buy a model license in order to use it for commercial applications. Make sure to read the [official model card](https://huggingface.co/stabilityai/sdxl-turbo) to learn more.
 
 <Tip>
 
-To learn how to use SDXL Turbo for various tasks, how to optimize performance, and other usage examples, take a look at the [Stable Diffusion XL](../../../using-diffusers/sdxl_turbo) guide.
+To learn how to use SDXL Turbo for various tasks, how to optimize performance, and other usage examples, take a look at the [SDXL Turbo](../../../using-diffusers/sdxl_turbo) guide.
 
 Check out the [Stability AI](https://huggingface.co/stabilityai) Hub organization for the official base and refiner model checkpoints!
 
 </Tip>
-
-## StableDiffusionXLPipeline
-
-[[autodoc]] StableDiffusionXLPipeline
-    - all
-    - __call__
-
-## StableDiffusionXLImg2ImgPipeline
-
-[[autodoc]] StableDiffusionXLImg2ImgPipeline
-    - all
-    - __call__
-
-## StableDiffusionXLInpaintPipeline
-
-[[autodoc]] StableDiffusionXLInpaintPipeline
-    - all
-    - __call__
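Put together, those tips make generation a very short call. A minimal sketch, assuming the `stabilityai/sdxl-turbo` checkpoint ships a scheduler already configured with `timestep_spacing='trailing'` (verify against the model card if in doubt):

```py
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# 1-4 steps, guidance disabled, 512x512 output per the tips above.
image = pipe(
    "A cinematic shot of a baby racoon wearing an intricate Italian priest robe.",
    num_inference_steps=1,
    guidance_scale=0.0,
    height=512,
    width=512,
).images[0]
image.save("sdxl_turbo.png")
```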

docs/source/en/using-diffusers/push_to_hub.md

Lines changed: 1 addition & 7 deletions
@@ -174,10 +174,4 @@ Set `private=True` in the [`~diffusers.utils.PushToHubMixin.push_to_hub`] functi
 controlnet.push_to_hub("my-controlnet-model-private", private=True)
 ```
 
-Private repositories are only visible to you, and other users won't be able to clone the repository and your repository won't appear in search results. Even if a user has the URL to your private repository, they'll receive a `404 - Sorry, we can't find the page you are looking for.`
-
-To load a model, scheduler, or pipeline from private or gated repositories, set `use_auth_token=True`:
-
-```py
-model = ControlNetModel.from_pretrained("your-namespace/my-controlnet-model-private", use_auth_token=True)
-```
+Private repositories are only visible to you, and other users won't be able to clone the repository and your repository won't appear in search results. Even if a user has the URL to your private repository, they'll receive a `404 - Sorry, we can't find the page you are looking for`. You must be [logged in](https://huggingface.co/docs/huggingface_hub/quick-start#login) to load a model from a private repository.
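A quick sketch of what "logged in" means in code, reusing the private repo name from the removed example (replace the token placeholder with your own access token; `huggingface-cli login` in a shell works as well):

```py
from huggingface_hub import login
from diffusers import ControlNetModel

# Authenticate once per environment so from_pretrained can reach the private repo.
login(token="hf_...")  # placeholder token

model = ControlNetModel.from_pretrained("your-namespace/my-controlnet-model-private")
```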

examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py

Lines changed: 47 additions & 20 deletions
@@ -133,7 +133,7 @@ def save_model_card(
         diffusers_imports_pivotal = """from huggingface_hub import hf_hub_download
 from safetensors.torch import load_file
         """
-        diffusers_example_pivotal = f"""embedding_path = hf_hub_download(repo_id="{repo_id}", filename="embeddings.safetensors", repo_type="model")
+        diffusers_example_pivotal = f"""embedding_path = hf_hub_download(repo_id='{repo_id}', filename="embeddings.safetensors", repo_type="model")
 state_dict = load_file(embedding_path)
 pipeline.load_textual_inversion(state_dict["clip_l"], token=["<s0>", "<s1>"], text_encoder=pipe.text_encoder, tokenizer=pipe.tokenizer)
 pipeline.load_textual_inversion(state_dict["clip_g"], token=["<s0>", "<s1>"], text_encoder=pipe.text_encoder_2, tokenizer=pipe.tokenizer_2)
@@ -145,8 +145,7 @@ def save_model_card(
 to trigger concept `{key}` → use `{tokens}` in your prompt \n
 """
 
-    yaml = f"""
----
+    yaml = f"""---
 tags:
 - stable-diffusion-xl
 - stable-diffusion-xl-diffusers
@@ -159,7 +158,7 @@ def save_model_card(
 instance_prompt: {instance_prompt}
 license: openrail++
 ---
-    """
+"""
 
     model_card = f"""
 # SDXL LoRA DreamBooth - {repo_id}
@@ -170,14 +169,6 @@ def save_model_card(
 
 ### These are {repo_id} LoRA adaption weights for {base_model}.
 
-The weights were trained using [DreamBooth](https://dreambooth.github.io/).
-
-LoRA for the text encoder was enabled: {train_text_encoder}.
-
-Pivotal tuning was enabled: {train_text_encoder_ti}.
-
-Special VAE used for training: {vae_path}.
-
 ## Trigger words
 
 {trigger_str}
@@ -196,11 +187,24 @@ def save_model_card(
 
 For more details, including weighting, merging and fusing LoRAs, check the [documentation on loading LoRAs in diffusers](https://huggingface.co/docs/diffusers/main/en/using-diffusers/loading_adapters)
 
-## Download model (use it with UIs such as AUTO1111, Comfy, SD.Next, Invoke)
+## Download model
+
+### Use it with UIs such as AUTOMATIC1111, Comfy UI, SD.Next, Invoke
+
+- Download the LoRA *.safetensors [here](/{repo_id}/blob/main/pytorch_lora_weights.safetensors). Rename it and place it on your Lora folder.
+- Download the text embeddings *.safetensors [here](/{repo_id}/blob/main/embeddings.safetensors). Rename it and place it on it on your embeddings folder.
+
+All [Files & versions](/{repo_id}/tree/main).
 
-Weights for this model are available in Safetensors format.
+## Details
 
-[Download]({repo_id}/tree/main) them in the Files & versions tab.
+The weights were trained using [🧨 diffusers Advanced Dreambooth Training Script](https://github.com/huggingface/diffusers/blob/main/examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py).
+
+LoRA for the text encoder was enabled. {train_text_encoder}.
+
+Pivotal tuning was enabled: {train_text_encoder_ti}.
+
+Special VAE used for training: {vae_path}.
 
 """
     with open(os.path.join(repo_folder, "README.md"), "w") as f:
@@ -667,6 +671,12 @@ def parse_args(input_args=None):
         default=4,
         help=("The dimension of the LoRA update matrices."),
     )
+    parser.add_argument(
+        "--cache_latents",
+        action="store_true",
+        default=False,
+        help="Cache the VAE latents",
+    )
 
     if input_args is not None:
         args = parser.parse_args(input_args)
@@ -1170,6 +1180,7 @@ def main(args):
         revision=args.revision,
         variant=args.variant,
     )
+    vae_scaling_factor = vae.config.scaling_factor
     unet = UNet2DConditionModel.from_pretrained(
         args.pretrained_model_name_or_path, subfolder="unet", revision=args.revision, variant=args.variant
     )
@@ -1600,6 +1611,20 @@ def compute_text_embeddings(prompt, text_encoders, tokenizers):
             args.validation_prompt = args.validation_prompt.replace(token_abs, "".join(token_replacement))
         print("validation prompt:", args.validation_prompt)
 
+    if args.cache_latents:
+        latents_cache = []
+        for batch in tqdm(train_dataloader, desc="Caching latents"):
+            with torch.no_grad():
+                batch["pixel_values"] = batch["pixel_values"].to(
+                    accelerator.device, non_blocking=True, dtype=torch.float32
+                )
+                latents_cache.append(vae.encode(batch["pixel_values"]).latent_dist)
+
+        if args.validation_prompt is None:
+            del vae
+            if torch.cuda.is_available():
+                torch.cuda.empty_cache()
+
     # Scheduler and math around the number of training steps.
     overrode_max_train_steps = False
     num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps)
@@ -1715,9 +1740,7 @@ def compute_text_embeddings(prompt, text_encoders, tokenizers):
         unet.train()
         for step, batch in enumerate(train_dataloader):
             with accelerator.accumulate(unet):
-                pixel_values = batch["pixel_values"].to(dtype=vae.dtype)
                 prompts = batch["prompts"]
-                # print(prompts)
                 # encode batch prompts when custom prompts are provided for each image -
                 if train_dataset.custom_instance_prompts:
                     if freeze_text_encoder:
@@ -1729,9 +1752,13 @@ def compute_text_embeddings(prompt, text_encoders, tokenizers):
                         tokens_one = tokenize_prompt(tokenizer_one, prompts, add_special_tokens)
                         tokens_two = tokenize_prompt(tokenizer_two, prompts, add_special_tokens)
 
-                # Convert images to latent space
-                model_input = vae.encode(pixel_values).latent_dist.sample()
-                model_input = model_input * vae.config.scaling_factor
+                if args.cache_latents:
+                    model_input = latents_cache[step].sample()
+                else:
+                    pixel_values = batch["pixel_values"].to(dtype=vae.dtype)
+                    model_input = vae.encode(pixel_values).latent_dist.sample()
+
+                model_input = model_input * vae_scaling_factor
                 if args.pretrained_vae_model_name_or_path is None:
                     model_input = model_input.to(weight_dtype)
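The `--cache_latents` path added above encodes every training image through the VAE once before training and reuses the cached latent distribution at each step, so the VAE encoder (and, when no validation prompt is set, the VAE itself) can be dropped from the training loop. A standalone sketch of the same idea, with a dummy dataloader standing in for the script's real one:

```py
import torch
from diffusers import AutoencoderKL

device = "cuda" if torch.cuda.is_available() else "cpu"
vae = AutoencoderKL.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="vae").to(device)
vae.requires_grad_(False)

# Stand-in for a real dataloader yielding normalized pixel batches in [-1, 1].
train_dataloader = [{"pixel_values": torch.randn(1, 3, 1024, 1024)}]

# One pass over the data: store the latent distributions instead of raw pixels.
latents_cache = []
with torch.no_grad():
    for batch in train_dataloader:
        pixel_values = batch["pixel_values"].to(device, dtype=torch.float32)
        latents_cache.append(vae.encode(pixel_values).latent_dist)

# Training loop: sample from the cached distribution and apply the scaling factor,
# so the VAE encoder never has to run during training.
scaling_factor = vae.config.scaling_factor
for step, dist in enumerate(latents_cache):
    model_input = dist.sample() * scaling_factor
    # ... add noise, run the UNet, compute the loss, etc.
```

Caching the distribution rather than a single sample keeps a little stochasticity: each epoch still draws a fresh latent sample from the stored posterior.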

examples/community/README.md

Lines changed: 0 additions & 4 deletions
@@ -512,7 +512,6 @@ device = torch.device('cpu' if not has_cuda else 'cuda')
 pipe = DiffusionPipeline.from_pretrained(
     "CompVis/stable-diffusion-v1-4",
     safety_checker=None,
-    use_auth_token=True,
     custom_pipeline="imagic_stable_diffusion",
     scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", clip_sample=False, set_alpha_to_one=False)
 ).to(device)
@@ -552,7 +551,6 @@ device = th.device('cpu' if not has_cuda else 'cuda')
 
 pipe = DiffusionPipeline.from_pretrained(
     "CompVis/stable-diffusion-v1-4",
-    use_auth_token=True,
     custom_pipeline="seed_resize_stable_diffusion"
 ).to(device)
 
@@ -588,7 +586,6 @@ generator = th.Generator("cuda").manual_seed(0)
 
 pipe = DiffusionPipeline.from_pretrained(
     "CompVis/stable-diffusion-v1-4",
-    use_auth_token=True,
     custom_pipeline="/home/mark/open_source/diffusers/examples/community/"
 ).to(device)
 
@@ -607,7 +604,6 @@ image.save('./seed_resize/seed_resize_{w}_{h}_image.png'.format(w=width, h=heigh
 
 pipe_compare = DiffusionPipeline.from_pretrained(
     "CompVis/stable-diffusion-v1-4",
-    use_auth_token=True,
     custom_pipeline="/home/mark/open_source/diffusers/examples/community/"
 ).to(device)
 