Commit 7c1bb9a

Merge branch 'main' into postprocessing-refactor

2 parents: acf0d60 + 7e6886f


42 files changed (+1757 / -117 lines)

docs/source/en/api/loaders.mdx

Lines changed: 4 additions & 0 deletions
@@ -36,3 +36,7 @@ API to load such adapter neural networks via the [`loaders.py` module](https://g
 ### LoraLoaderMixin

 [[autodoc]] loaders.LoraLoaderMixin
+
+### FromCkptMixin
+
+[[autodoc]] loaders.FromCkptMixin
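
The newly documented `FromCkptMixin` lets supported pipelines be instantiated directly from a single original Stable Diffusion checkpoint file. A minimal sketch, assuming a locally downloaded checkpoint (the path below is a placeholder, not part of this commit):

```python
import torch
from diffusers import StableDiffusionPipeline

# "./v1-5-pruned.ckpt" is a placeholder path to an original Stable Diffusion checkpoint file.
pipe = StableDiffusionPipeline.from_ckpt("./v1-5-pruned.ckpt", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

image = pipe("an astronaut riding a horse on the moon").images[0]
```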

docs/source/en/api/pipelines/stable_diffusion/controlnet.mdx

Lines changed: 1 addition & 0 deletions
@@ -308,6 +308,7 @@ All checkpoints can be found under the authors' namespace [lllyasviel](https://h
 - disable_vae_slicing
 - enable_xformers_memory_efficient_attention
 - disable_xformers_memory_efficient_attention
+- load_textual_inversion

 ## FlaxStableDiffusionControlNetPipeline
 [[autodoc]] FlaxStableDiffusionControlNetPipeline
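
The `load_textual_inversion` method being added to this pipeline's documentation comes from the loaders module. A minimal sketch of typical usage, shown here on the base text-to-image pipeline; the concept repository and token are illustrative examples:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# "sd-concepts-library/cat-toy" is an example textual inversion concept repository.
pipe.load_textual_inversion("sd-concepts-library/cat-toy")

# The learned token (here "<cat-toy>") can then be used directly in prompts.
image = pipe("a <cat-toy> sitting on a park bench").images[0]
```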

docs/source/en/api/pipelines/stable_diffusion/depth2img.mdx

Lines changed: 4 additions & 1 deletion
@@ -30,4 +30,7 @@ Available Checkpoints are:
 - enable_attention_slicing
 - disable_attention_slicing
 - enable_xformers_memory_efficient_attention
-- disable_xformers_memory_efficient_attention
+- disable_xformers_memory_efficient_attention
+- load_textual_inversion
+- load_lora_weights
+- save_lora_weights

docs/source/en/api/pipelines/stable_diffusion/img2img.mdx

Lines changed: 5 additions & 1 deletion
@@ -30,7 +30,11 @@ proposed by Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan
 - disable_attention_slicing
 - enable_xformers_memory_efficient_attention
 - disable_xformers_memory_efficient_attention
+- load_textual_inversion
+- from_ckpt
+- load_lora_weights
+- save_lora_weights

 [[autodoc]] FlaxStableDiffusionImg2ImgPipeline
 - all
-- __call__
+- __call__

docs/source/en/api/pipelines/stable_diffusion/inpaint.mdx

Lines changed: 4 additions & 1 deletion
@@ -31,7 +31,10 @@ Available checkpoints are:
 - disable_attention_slicing
 - enable_xformers_memory_efficient_attention
 - disable_xformers_memory_efficient_attention
+- load_textual_inversion
+- load_lora_weights
+- save_lora_weights

 [[autodoc]] FlaxStableDiffusionInpaintPipeline
 - all
-- __call__
+- __call__

docs/source/en/api/pipelines/stable_diffusion/pix2pix.mdx

Lines changed: 3 additions & 0 deletions
@@ -68,3 +68,6 @@ images[0].save("snowy_mountains.png")
 [[autodoc]] StableDiffusionInstructPix2PixPipeline
 - __call__
 - all
+- load_textual_inversion
+- load_lora_weights
+- save_lora_weights

docs/source/en/api/pipelines/stable_diffusion/text2img.mdx

Lines changed: 4 additions & 0 deletions
@@ -39,6 +39,10 @@ Available Checkpoints are:
 - disable_xformers_memory_efficient_attention
 - enable_vae_tiling
 - disable_vae_tiling
+- load_textual_inversion
+- from_ckpt
+- load_lora_weights
+- save_lora_weights

 [[autodoc]] FlaxStableDiffusionPipeline
 - all
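
The `load_lora_weights` and `save_lora_weights` entries documented above come from `LoraLoaderMixin`. A minimal sketch of loading LoRA attention weights into the text-to-image pipeline; the LoRA repository id is an illustrative example:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# "sayakpaul/sd-model-finetuned-lora-t4" is an example repository containing LoRA weights.
pipe.load_lora_weights("sayakpaul/sd-model-finetuned-lora-t4")

image = pipe("A pokemon with blue eyes").images[0]
```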

docs/source/en/training/controlnet.mdx

Lines changed: 23 additions & 0 deletions
@@ -113,6 +113,29 @@ accelerate launch train_controlnet.py \
 --gradient_accumulation_steps=4
 ```

+## Training with multiple GPUs
+
+`accelerate` allows for seamless multi-GPU training. Follow the instructions [here](https://huggingface.co/docs/accelerate/basic_tutorials/launch)
+for running distributed training with `accelerate`. Here is an example command:
+
+```bash
+export MODEL_DIR="runwayml/stable-diffusion-v1-5"
+export OUTPUT_DIR="path to save model"
+
+accelerate launch --mixed_precision="fp16" --multi_gpu train_controlnet.py \
+ --pretrained_model_name_or_path=$MODEL_DIR \
+ --output_dir=$OUTPUT_DIR \
+ --dataset_name=fusing/fill50k \
+ --resolution=512 \
+ --learning_rate=1e-5 \
+ --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
+ --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
+ --train_batch_size=4 \
+ --mixed_precision="fp16" \
+ --tracker_project_name="controlnet-demo" \
+ --report_to=wandb
+```
+
 ## Example results

 #### After 300 steps with batch size 8

docs/source/en/training/instructpix2pix.mdx

Lines changed: 21 additions & 0 deletions
@@ -126,6 +126,27 @@ accelerate launch --mixed_precision="fp16" train_instruct_pix2pix.py \

 ***Note: In the original paper, the authors observed that even when the model is trained with an image resolution of 256x256, it generalizes well to bigger resolutions such as 512x512. This is likely because of the larger dataset they used during training.***

+## Training with multiple GPUs
+
+`accelerate` allows for seamless multi-GPU training. Follow the instructions [here](https://huggingface.co/docs/accelerate/basic_tutorials/launch)
+for running distributed training with `accelerate`. Here is an example command:
+
+```bash
+accelerate launch --mixed_precision="fp16" --multi_gpu train_instruct_pix2pix.py \
+ --pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5 \
+ --dataset_name=sayakpaul/instructpix2pix-1000-samples \
+ --use_ema \
+ --enable_xformers_memory_efficient_attention \
+ --resolution=512 --random_flip \
+ --train_batch_size=4 --gradient_accumulation_steps=4 --gradient_checkpointing \
+ --max_train_steps=15000 \
+ --checkpointing_steps=5000 --checkpoints_total_limit=1 \
+ --learning_rate=5e-05 --lr_warmup_steps=0 \
+ --conditioning_dropout_prob=0.05 \
+ --mixed_precision=fp16 \
+ --seed=42
+```
+
 ## Inference

 Once training is complete, we can perform inference:

docs/source/en/training/text2image.mdx

Lines changed: 25 additions & 0 deletions
@@ -106,6 +106,31 @@ accelerate launch train_text_to_image.py \
 --lr_scheduler="constant" --lr_warmup_steps=0 \
 --output_dir=${OUTPUT_DIR}
 ```
+
+#### Training with multiple GPUs
+
+`accelerate` allows for seamless multi-GPU training. Follow the instructions [here](https://huggingface.co/docs/accelerate/basic_tutorials/launch)
+for running distributed training with `accelerate`. Here is an example command:
+
+```bash
+export MODEL_NAME="CompVis/stable-diffusion-v1-4"
+export dataset_name="lambdalabs/pokemon-blip-captions"
+
+accelerate launch --mixed_precision="fp16" --multi_gpu train_text_to_image.py \
+ --pretrained_model_name_or_path=$MODEL_NAME \
+ --dataset_name=$dataset_name \
+ --use_ema \
+ --resolution=512 --center_crop --random_flip \
+ --train_batch_size=1 \
+ --gradient_accumulation_steps=4 \
+ --gradient_checkpointing \
+ --max_train_steps=15000 \
+ --learning_rate=1e-05 \
+ --max_grad_norm=1 \
+ --lr_scheduler="constant" --lr_warmup_steps=0 \
+ --output_dir="sd-pokemon-model"
+```
+
 </pt>
 <jax>
 With Flax, it's possible to train a Stable Diffusion model faster on TPUs and GPUs thanks to [@duongna211](https://github.com/duongna21). This is very efficient on TPU hardware but works great on GPUs too. The Flax training script doesn't support features like gradient checkpointing or gradient accumulation yet, so you'll need a GPU with at least 30GB of memory or a TPU v3.

docs/source/en/training/unconditional_training.mdx

Lines changed: 20 additions & 0 deletions
@@ -122,6 +122,26 @@ accelerate launch train_unconditional.py \
 <img src="https://user-images.githubusercontent.com/26864830/180248200-928953b4-db38-48db-b0c6-8b740fe6786f.png"/>
 </div>

+### Training with multiple GPUs
+
+`accelerate` allows for seamless multi-GPU training. Follow the instructions [here](https://huggingface.co/docs/accelerate/basic_tutorials/launch)
+for running distributed training with `accelerate`. Here is an example command:
+
+```bash
+accelerate launch --mixed_precision="fp16" --multi_gpu train_unconditional.py \
+ --dataset_name="huggan/pokemon" \
+ --resolution=64 --center_crop --random_flip \
+ --output_dir="ddpm-ema-pokemon-64" \
+ --train_batch_size=16 \
+ --num_epochs=100 \
+ --gradient_accumulation_steps=1 \
+ --use_ema \
+ --learning_rate=1e-4 \
+ --lr_warmup_steps=500 \
+ --mixed_precision="fp16" \
+ --logger="wandb"
+```
+
 ## Finetuning with your own data

 There are two ways to finetune a model on your own dataset:

examples/community/README.md

Lines changed: 32 additions & 1 deletion
@@ -31,7 +31,7 @@ MagicMix | Diffusion Pipeline for semantic mixing of an image and a text prompt
 | UnCLIP Image Interpolation Pipeline | Diffusion Pipeline that allows passing two images/image_embeddings and produces images while interpolating between their image-embeddings | [UnCLIP Image Interpolation Pipeline](#unclip-image-interpolation-pipeline) | - | [Naga Sai Abhinay Devarinti](https://github.com/Abhinay1997/) |
 | DDIM Noise Comparative Analysis Pipeline | Investigating how the diffusion models learn visual concepts from each noise level (which is a contribution of [P2 weighting (CVPR 2022)](https://arxiv.org/abs/2204.00227)) | [DDIM Noise Comparative Analysis Pipeline](#ddim-noise-comparative-analysis-pipeline) | - |[Aengus (Duc-Anh)](https://github.com/aengusng8) |
 | CLIP Guided Img2Img Stable Diffusion Pipeline | Doing CLIP guidance for image to image generation with Stable Diffusion | [CLIP Guided Img2Img Stable Diffusion](#clip-guided-img2img-stable-diffusion) | - | [Nipun Jindal](https://github.com/nipunjindal/) |
-
+| TensorRT Stable Diffusion Pipeline | Accelerates the Stable Diffusion Text2Image Pipeline using TensorRT | [TensorRT Stable Diffusion Pipeline](#tensorrt-text2image-stable-diffusion-pipeline) | - |[Asfiya Baig](https://github.com/asfiyab-nvidia) |


 To load a custom pipeline you just need to pass the `custom_pipeline` argument to `DiffusionPipeline`, as one of the files in `diffusers/examples/community`. Feel free to send a PR with your own pipelines, we will merge them quickly.

@@ -1130,3 +1130,34 @@ Init Image
 Output Image

 ![img2img_clip_guidance](https://huggingface.co/datasets/njindal/images/resolve/main/clip_guided_img2img.jpg)
+
+### TensorRT Text2Image Stable Diffusion Pipeline
+
+The TensorRT Pipeline can be used to accelerate the Text2Image Stable Diffusion Inference run.
+
+NOTE: The ONNX conversions and TensorRT engine build may take up to 30 minutes.
+
+```python
+import torch
+from diffusers import DDIMScheduler
+from diffusers.pipelines.stable_diffusion import StableDiffusionPipeline
+
+# Use the DDIMScheduler scheduler here instead
+scheduler = DDIMScheduler.from_pretrained("stabilityai/stable-diffusion-2-1",
+                                          subfolder="scheduler")
+
+pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1",
+                                               custom_pipeline="stable_diffusion_tensorrt_txt2img",
+                                               revision='fp16',
+                                               torch_dtype=torch.float16,
+                                               scheduler=scheduler,)
+
+# re-use cached folder to save ONNX models and TensorRT Engines
+pipe.set_cached_folder("stabilityai/stable-diffusion-2-1", revision='fp16',)
+
+pipe = pipe.to("cuda")
+
+prompt = "a beautiful photograph of Mt. Fuji during cherry blossom"
+image = pipe(prompt).images[0]
+image.save('tensorrt_mt_fuji.png')
+```
