Commit 7c1bb9a

Merge branch 'main' into postprocessing-refactor

2 parents: acf0d60 + 7e6886f


42 files changed (+1757 / -117 lines)

docs/source/en/api/loaders.mdx

Lines changed: 4 additions & 0 deletions
@@ -36,3 +36,7 @@ API to load such adapter neural networks via the [`loaders.py` module](https://g
 ### LoraLoaderMixin

 [[autodoc]] loaders.LoraLoaderMixin
+
+### FromCkptMixin
+
+[[autodoc]] loaders.FromCkptMixin
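
The newly documented `FromCkptMixin` lets supported pipelines be instantiated directly from a single original Stable Diffusion checkpoint file. A minimal sketch, assuming a locally downloaded checkpoint (the path below is a placeholder, not part of this commit):

```python
import torch
from diffusers import StableDiffusionPipeline

# "./v1-5-pruned.ckpt" is a placeholder path to an original Stable Diffusion checkpoint file.
pipe = StableDiffusionPipeline.from_ckpt("./v1-5-pruned.ckpt", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

image = pipe("an astronaut riding a horse on the moon").images[0]
```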

docs/source/en/api/pipelines/stable_diffusion/controlnet.mdx

Lines changed: 1 addition & 0 deletions
@@ -308,6 +308,7 @@ All checkpoints can be found under the authors' namespace [lllyasviel](https://h
 - disable_vae_slicing
 - enable_xformers_memory_efficient_attention
 - disable_xformers_memory_efficient_attention
+- load_textual_inversion

 ## FlaxStableDiffusionControlNetPipeline
 [[autodoc]] FlaxStableDiffusionControlNetPipeline
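
The `load_textual_inversion` method being added to this pipeline's documentation comes from the loaders module. A minimal sketch of typical usage, shown here on the base text-to-image pipeline; the concept repository and token are illustrative examples:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# "sd-concepts-library/cat-toy" is an example textual inversion concept repository.
pipe.load_textual_inversion("sd-concepts-library/cat-toy")

# The learned token (here "<cat-toy>") can then be used directly in prompts.
image = pipe("a <cat-toy> sitting on a park bench").images[0]
```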

docs/source/en/api/pipelines/stable_diffusion/depth2img.mdx

Lines changed: 4 additions & 1 deletion
@@ -30,4 +30,7 @@ Available Checkpoints are:
 - enable_attention_slicing
 - disable_attention_slicing
 - enable_xformers_memory_efficient_attention
-- disable_xformers_memory_efficient_attention
+- disable_xformers_memory_efficient_attention
+- load_textual_inversion
+- load_lora_weights
+- save_lora_weights

docs/source/en/api/pipelines/stable_diffusion/img2img.mdx

Lines changed: 5 additions & 1 deletion
@@ -30,7 +30,11 @@ proposed by Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan
 - disable_attention_slicing
 - enable_xformers_memory_efficient_attention
 - disable_xformers_memory_efficient_attention
+- load_textual_inversion
+- from_ckpt
+- load_lora_weights
+- save_lora_weights

 [[autodoc]] FlaxStableDiffusionImg2ImgPipeline
 - all
-- __call__
+- __call__

docs/source/en/api/pipelines/stable_diffusion/inpaint.mdx

Lines changed: 4 additions & 1 deletion
@@ -31,7 +31,10 @@ Available checkpoints are:
 - disable_attention_slicing
 - enable_xformers_memory_efficient_attention
 - disable_xformers_memory_efficient_attention
+- load_textual_inversion
+- load_lora_weights
+- save_lora_weights

 [[autodoc]] FlaxStableDiffusionInpaintPipeline
 - all
-- __call__
+- __call__

docs/source/en/api/pipelines/stable_diffusion/pix2pix.mdx

Lines changed: 3 additions & 0 deletions
@@ -68,3 +68,6 @@ images[0].save("snowy_mountains.png")
 [[autodoc]] StableDiffusionInstructPix2PixPipeline
 - __call__
 - all
+- load_textual_inversion
+- load_lora_weights
+- save_lora_weights

docs/source/en/api/pipelines/stable_diffusion/text2img.mdx

Lines changed: 4 additions & 0 deletions
@@ -39,6 +39,10 @@ Available Checkpoints are:
 - disable_xformers_memory_efficient_attention
 - enable_vae_tiling
 - disable_vae_tiling
+- load_textual_inversion
+- from_ckpt
+- load_lora_weights
+- save_lora_weights

 [[autodoc]] FlaxStableDiffusionPipeline
 - all
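
The `load_lora_weights` and `save_lora_weights` entries documented above come from `LoraLoaderMixin`. A minimal sketch of loading LoRA attention weights into the text-to-image pipeline; the LoRA repository id is an illustrative example:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# "sayakpaul/sd-model-finetuned-lora-t4" is an example repository containing LoRA weights.
pipe.load_lora_weights("sayakpaul/sd-model-finetuned-lora-t4")

image = pipe("A pokemon with blue eyes").images[0]
```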

docs/source/en/training/controlnet.mdx

Lines changed: 23 additions & 0 deletions
@@ -113,6 +113,29 @@ accelerate launch train_controlnet.py \
 --gradient_accumulation_steps=4
 ```

+## Training with multiple GPUs
+
+`accelerate` allows for seamless multi-GPU training. Follow the instructions [here](https://huggingface.co/docs/accelerate/basic_tutorials/launch)
+for running distributed training with `accelerate`. Here is an example command:
+
+```bash
+export MODEL_DIR="runwayml/stable-diffusion-v1-5"
+export OUTPUT_DIR="path to save model"
+
+accelerate launch --mixed_precision="fp16" --multi_gpu train_controlnet.py \
+ --pretrained_model_name_or_path=$MODEL_DIR \
+ --output_dir=$OUTPUT_DIR \
+ --dataset_name=fusing/fill50k \
+ --resolution=512 \
+ --learning_rate=1e-5 \
+ --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
+ --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
+ --train_batch_size=4 \
+ --mixed_precision="fp16" \
+ --tracker_project_name="controlnet-demo" \
+ --report_to=wandb
+```
+
 ## Example results

 #### After 300 steps with batch size 8

docs/source/en/training/instructpix2pix.mdx

Lines changed: 21 additions & 0 deletions
@@ -126,6 +126,27 @@ accelerate launch --mixed_precision="fp16" train_instruct_pix2pix.py \

 ***Note: In the original paper, the authors observed that even when the model is trained with an image resolution of 256x256, it generalizes well to bigger resolutions such as 512x512. This is likely because of the larger dataset they used during training.***

+## Training with multiple GPUs
+
+`accelerate` allows for seamless multi-GPU training. Follow the instructions [here](https://huggingface.co/docs/accelerate/basic_tutorials/launch)
+for running distributed training with `accelerate`. Here is an example command:
+
+```bash
+accelerate launch --mixed_precision="fp16" --multi_gpu train_instruct_pix2pix.py \
+ --pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5 \
+ --dataset_name=sayakpaul/instructpix2pix-1000-samples \
+ --use_ema \
+ --enable_xformers_memory_efficient_attention \
+ --resolution=512 --random_flip \
+ --train_batch_size=4 --gradient_accumulation_steps=4 --gradient_checkpointing \
+ --max_train_steps=15000 \
+ --checkpointing_steps=5000 --checkpoints_total_limit=1 \
+ --learning_rate=5e-05 --lr_warmup_steps=0 \
+ --conditioning_dropout_prob=0.05 \
+ --mixed_precision=fp16 \
+ --seed=42
+```
+
 ## Inference

 Once training is complete, we can perform inference:

docs/source/en/training/text2image.mdx

Lines changed: 25 additions & 0 deletions
@@ -106,6 +106,31 @@ accelerate launch train_text_to_image.py \
 --lr_scheduler="constant" --lr_warmup_steps=0 \
 --output_dir=${OUTPUT_DIR}
 ```
+
+#### Training with multiple GPUs
+
+`accelerate` allows for seamless multi-GPU training. Follow the instructions [here](https://huggingface.co/docs/accelerate/basic_tutorials/launch)
+for running distributed training with `accelerate`. Here is an example command:
+
+```bash
+export MODEL_NAME="CompVis/stable-diffusion-v1-4"
+export dataset_name="lambdalabs/pokemon-blip-captions"
+
+accelerate launch --mixed_precision="fp16" --multi_gpu train_text_to_image.py \
+ --pretrained_model_name_or_path=$MODEL_NAME \
+ --dataset_name=$dataset_name \
+ --use_ema \
+ --resolution=512 --center_crop --random_flip \
+ --train_batch_size=1 \
+ --gradient_accumulation_steps=4 \
+ --gradient_checkpointing \
+ --max_train_steps=15000 \
+ --learning_rate=1e-05 \
+ --max_grad_norm=1 \
+ --lr_scheduler="constant" --lr_warmup_steps=0 \
+ --output_dir="sd-pokemon-model"
+```
+
 </pt>
 <jax>
 With Flax, it's possible to train a Stable Diffusion model faster on TPUs and GPUs thanks to [@duongna211](https://github.com/duongna21). This is very efficient on TPU hardware but works great on GPUs too. The Flax training script doesn't support features like gradient checkpointing or gradient accumulation yet, so you'll need a GPU with at least 30GB of memory or a TPU v3.

docs/source/en/training/unconditional_training.mdx

Lines changed: 20 additions & 0 deletions
@@ -122,6 +122,26 @@ accelerate launch train_unconditional.py \
 <img src="https://user-images.githubusercontent.com/26864830/180248200-928953b4-db38-48db-b0c6-8b740fe6786f.png"/>
 </div>

+### Training with multiple GPUs
+
+`accelerate` allows for seamless multi-GPU training. Follow the instructions [here](https://huggingface.co/docs/accelerate/basic_tutorials/launch)
+for running distributed training with `accelerate`. Here is an example command:
+
+```bash
+accelerate launch --mixed_precision="fp16" --multi_gpu train_unconditional.py \
+ --dataset_name="huggan/pokemon" \
+ --resolution=64 --center_crop --random_flip \
+ --output_dir="ddpm-ema-pokemon-64" \
+ --train_batch_size=16 \
+ --num_epochs=100 \
+ --gradient_accumulation_steps=1 \
+ --use_ema \
+ --learning_rate=1e-4 \
+ --lr_warmup_steps=500 \
+ --mixed_precision="fp16" \
+ --logger="wandb"
+```
+
 ## Finetuning with your own data

 There are two ways to finetune a model on your own dataset:

examples/community/README.md

Lines changed: 32 additions & 1 deletion
@@ -31,7 +31,7 @@ MagicMix | Diffusion Pipeline for semantic mixing of an image and a text prompt
 | UnCLIP Image Interpolation Pipeline | Diffusion Pipeline that allows passing two images/image_embeddings and produces images while interpolating between their image-embeddings | [UnCLIP Image Interpolation Pipeline](#unclip-image-interpolation-pipeline) | - | [Naga Sai Abhinay Devarinti](https://github.com/Abhinay1997/) |
 | DDIM Noise Comparative Analysis Pipeline | Investigating how the diffusion models learn visual concepts from each noise level (which is a contribution of [P2 weighting (CVPR 2022)](https://arxiv.org/abs/2204.00227)) | [DDIM Noise Comparative Analysis Pipeline](#ddim-noise-comparative-analysis-pipeline) | - |[Aengus (Duc-Anh)](https://github.com/aengusng8) |
 | CLIP Guided Img2Img Stable Diffusion Pipeline | Doing CLIP guidance for image to image generation with Stable Diffusion | [CLIP Guided Img2Img Stable Diffusion](#clip-guided-img2img-stable-diffusion) | - | [Nipun Jindal](https://github.com/nipunjindal/) |
-
+| TensorRT Stable Diffusion Pipeline | Accelerates the Stable Diffusion Text2Image Pipeline using TensorRT | [TensorRT Stable Diffusion Pipeline](#tensorrt-text2image-stable-diffusion-pipeline) | - |[Asfiya Baig](https://github.com/asfiyab-nvidia) |


 To load a custom pipeline you just need to pass the `custom_pipeline` argument to `DiffusionPipeline`, as one of the files in `diffusers/examples/community`. Feel free to send a PR with your own pipelines, we will merge them quickly.

@@ -1130,3 +1130,34 @@ Init Image
 Output Image

 ![img2img_clip_guidance](https://huggingface.co/datasets/njindal/images/resolve/main/clip_guided_img2img.jpg)
+
+### TensorRT Text2Image Stable Diffusion Pipeline
+
+The TensorRT Pipeline can be used to accelerate the Text2Image Stable Diffusion Inference run.
+
+NOTE: The ONNX conversions and TensorRT engine build may take up to 30 minutes.
+
+```python
+import torch
+from diffusers import DDIMScheduler
+from diffusers.pipelines.stable_diffusion import StableDiffusionPipeline
+
+# Use the DDIMScheduler scheduler here instead
+scheduler = DDIMScheduler.from_pretrained("stabilityai/stable-diffusion-2-1",
+                                          subfolder="scheduler")
+
+pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1",
+                                               custom_pipeline="stable_diffusion_tensorrt_txt2img",
+                                               revision='fp16',
+                                               torch_dtype=torch.float16,
+                                               scheduler=scheduler,)
+
+# re-use cached folder to save ONNX models and TensorRT Engines
+pipe.set_cached_folder("stabilityai/stable-diffusion-2-1", revision='fp16',)
+
+pipe = pipe.to("cuda")
+
+prompt = "a beautiful photograph of Mt. Fuji during cherry blossom"
+image = pipe(prompt).images[0]
+image.save('tensorrt_mt_fuji.png')
+```
