Describe the bug
Hi!
I trained on my face with 15 images using the DreamBooth scripts from these two repos:
https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py (diffusers edition)
https://github.com/ShivamShrirao/diffusers/blob/main/examples/dreambooth/train_dreambooth.py (ShivamShrirao edition)
When I train with the ShivamShrirao edition, using the same base model, the same config, and the same max steps at 256 resolution, I get a training speed of about 2.25 it/s, about 12 GB of VRAM, and a total time of about 9 minutes. But when I use the diffusers edition of the DreamBooth script with everything else identical, I get 1.65 it/s, 14 GB of VRAM, and 17 minutes!
Why? What is causing the difference? Everything else is the same!
Training progress in the diffusers edition:
Training progress in the ShivamShrirao edition:
NOTE:"I also change the scheduler of ShivamShrirao edition to the same like diffusers edition and the problem was not solved!"
Another question I have: what is the main difference between StableDiffusionPipeline and DiffusionPipeline?
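From what I understand (please correct me if this is wrong), DiffusionPipeline is the generic loader and StableDiffusionPipeline is one concrete pipeline class, e.g.:

from diffusers import DiffusionPipeline, StableDiffusionPipeline

# DiffusionPipeline.from_pretrained() reads the checkpoint's model_index.json
# and instantiates whatever concrete pipeline class is registered there.
pipe_auto = DiffusionPipeline.from_pretrained("XpucT/Deliberate")
print(type(pipe_auto))  # StableDiffusionPipeline for a Stable Diffusion checkpoint

# StableDiffusionPipeline is that concrete class; loading it directly only
# works for checkpoints that really are Stable Diffusion pipelines.
pipe_sd = StableDiffusionPipeline.from_pretrained("XpucT/Deliberate")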
@patrickvonplaten @sayakpaul @williamberman
Reproduction
(Diffusers Edition)
!accelerate launch train_dreambooth_diffusers.py \
  --pretrained_model_name_or_path='XpucT/Deliberate' \
  --pretrained_vae_name_or_path="stabilityai/sd-vae-ft-mse" \
  --output_dir="./trained-models/diffuser-dreambooth/" \
  --instance_prompt="fd2556641d354203bf609cf9104d3d32" \
  --class_prompt="photo of a man" \
  --instance_data_dir="../ai-avatar-generator/datasets/instance-images/1fdfdf359622486c94e742e3e1a975cb/croped/" \
  --class_data_dir="./datasets/class-man/" \
  --enable_xformers_memory_efficient_attention \
  --with_prior_preservation \
  --prior_loss_weight=0.1 \
  --seed=7813 \
  --train_text_encoder \
  --resolution=256 \
  --train_batch_size=1 \
  --mixed_precision="fp16" \
  --use_8bit_adam \
  --gradient_accumulation_steps=1 \
  --learning_rate=1e-7 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=120 \
  --num_class_images=180 \
  --max_train_steps=1200
(ShivamShrirao edition)
!accelerate launch scripts/train_dreambooth.py \
  --pretrained_model_name_or_path='XpucT/Deliberate' \
  --pretrained_vae_name_or_path="stabilityai/sd-vae-ft-mse" \
  --output_dir="./trained-models/normal-dreambooth/" \
  --instance_prompt="fd2556641d354203bf609cf9104d3d32" \
  --class_prompt="photo of a man" \
  --instance_data_dir="../ai-avatar-generator/datasets/instance-images/20ef9f463e2a45069fd70f9368590ce4/" \
  --class_data_dir="./datasets/class-man/" \
  --with_prior_preservation \
  --prior_loss_weight=0.1 \
  --seed=7813 \
  --train_text_encoder \
  --resolution=256 \
  --train_batch_size=1 \
  --mixed_precision="fp16" \
  --use_8bit_adam \
  --gradient_accumulation_steps=1 \
  --learning_rate=1e-7 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=120 \
  --num_class_images=180 \
  --max_train_steps=1200
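The it/s figures above are from the progress bar; for an independent check, step throughput and peak VRAM can be measured with a small sketch like the one below (the matmul is just a stand-in for one real training step):

import time
import torch

def measure(step_fn, num_steps):
    # Reset the allocator's peak counter so max_memory_allocated() reflects
    # only this run.
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn()
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    print(f"{num_steps / elapsed:.2f} it/s")
    # This is PyTorch-allocated memory only; nvidia-smi can report more
    # (CUDA context, allocator cache, other processes).
    print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")

# Dummy "training step": a half-precision matmul in place of a real step.
x = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
measure(lambda: x @ x, num_steps=100)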
Logs
(Diffusers Edition)
/opt/conda/envs/hf/lib/python3.11/site-packages/accelerate/accelerator.py:258: FutureWarning: `logging_dir` is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use `project_dir` instead.
warnings.warn(
05/16/2023 16:26:46 - INFO - __main__ - Distributed environment: DistributedType.NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: fp16
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'clip_sample_range', 'sample_max_value', 'dynamic_thresholding_ratio', 'thresholding', 'variance_type'} was not found in config. Values will be initialized to default values.
{'projection_class_embeddings_input_dim', 'addition_embed_type_num_heads', 'time_embedding_act_fn', 'time_cond_proj_dim', 'resnet_out_scale_factor', 'time_embedding_dim', 'cross_attention_norm', 'time_embedding_type', 'timestep_post_act', 'conv_in_kernel', 'mid_block_only_cross_attention', 'addition_embed_type', 'conv_out_kernel', 'class_embeddings_concat', 'resnet_skip_time_act', 'encoder_hid_dim'} was not found in config. Values will be initialized to default values.
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /opt/conda/envs/hf/lib/python3.11/site-packages/bitsandbytes/libbitsandbytes_cuda113.so
/opt/conda/envs/hf/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /opt/conda/envs/hf did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
/opt/conda/envs/hf/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//matplotlib_inline.backend_inline'), PosixPath('module')}
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/opt/conda/envs/hf/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0'), PosixPath('/usr/local/cuda/lib64/libcudart.so')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /opt/conda/envs/hf/lib/python3.11/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
05/16/2023 16:27:14 - INFO - __main__ - ***** Running training *****
05/16/2023 16:27:14 - INFO - __main__ - Num examples = 180
05/16/2023 16:27:14 - INFO - __main__ - Num batches each epoch = 180
05/16/2023 16:27:14 - INFO - __main__ - Num Epochs = 7
05/16/2023 16:27:14 - INFO - __main__ - Instantaneous batch size per device = 1
05/16/2023 16:27:14 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 1
05/16/2023 16:27:14 - INFO - __main__ - Gradient Accumulation steps = 1
05/16/2023 16:27:14 - INFO - __main__ - Total optimization steps = 1200
Steps: 42%|████▌ | 500/1200 [05:45<07:11, 1.62it/s, loss=0.0434, lr=1e-7]05/16/2023 16:32:59 - INFO - accelerate.accelerator - Saving current state to ./trained-models/diffuser-dreambooth/checkpoint-500
Configuration saved in ./trained-models/diffuser-dreambooth/checkpoint-500/unet/config.json
Model weights saved in ./trained-models/diffuser-dreambooth/checkpoint-500/unet/diffusion_pytorch_model.bin
05/16/2023 16:33:38 - INFO - accelerate.checkpointing - Optimizer state saved in ./trained-models/diffuser-dreambooth/checkpoint-500/optimizer.bin
05/16/2023 16:33:38 - INFO - accelerate.checkpointing - Scheduler state saved in ./trained-models/diffuser-dreambooth/checkpoint-500/scheduler.bin
05/16/2023 16:33:38 - INFO - accelerate.checkpointing - Gradient scaler state saved in ./trained-models/diffuser-dreambooth/checkpoint-500/scaler.pt
05/16/2023 16:33:38 - INFO - accelerate.checkpointing - Random states saved in ./trained-models/diffuser-dreambooth/checkpoint-500/random_states_0.pkl
05/16/2023 16:33:38 - INFO - __main__ - Saved state to ./trained-models/diffuser-dreambooth/checkpoint-500
Steps: 83%|████████▎ | 1000/1200 [11:46<02:10, 1.53it/s, loss=0.0251, lr=1e-7]05/16/2023 16:39:00 - INFO - accelerate.accelerator - Saving current state to ./trained-models/diffuser-dreambooth/checkpoint-1000
Configuration saved in ./trained-models/diffuser-dreambooth/checkpoint-1000/unet/config.json
Model weights saved in ./trained-models/diffuser-dreambooth/checkpoint-1000/unet/diffusion_pytorch_model.bin
05/16/2023 16:39:42 - INFO - accelerate.checkpointing - Optimizer state saved in ./trained-models/diffuser-dreambooth/checkpoint-1000/optimizer.bin
05/16/2023 16:39:42 - INFO - accelerate.checkpointing - Scheduler state saved in ./trained-models/diffuser-dreambooth/checkpoint-1000/scheduler.bin
05/16/2023 16:39:42 - INFO - accelerate.checkpointing - Gradient scaler state saved in ./trained-models/diffuser-dreambooth/checkpoint-1000/scaler.pt
05/16/2023 16:39:42 - INFO - accelerate.checkpointing - Random states saved in ./trained-models/diffuser-dreambooth/checkpoint-1000/random_states_0.pkl
05/16/2023 16:39:42 - INFO - __main__ - Saved state to ./trained-models/diffuser-dreambooth/checkpoint-1000
Steps: 100%|██████████| 1200/1200 [14:37<00:00, 1.54it/s, loss=0.0528, lr=1e-7]{'scaling_factor'} was not found in config. Values will be initialized to default values.
/opt/conda/envs/hf/lib/python3.11/site-packages/safetensors/torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
with safe_open(filename, framework="pt", device=device) as f:
/opt/conda/envs/hf/lib/python3.11/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
/opt/conda/envs/hf/lib/python3.11/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
storage = cls(wrap_storage=untyped_storage)
safety_checker/model.safetensors not found
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.
/opt/conda/envs/hf/lib/python3.11/site-packages/transformers/models/clip/feature_extraction_clip.py:28: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.
warnings.warn(
Configuration saved in ./trained-models/diffuser-dreambooth/vae/config.json
Model weights saved in ./trained-models/diffuser-dreambooth/vae/diffusion_pytorch_model.bin
Configuration saved in ./trained-models/diffuser-dreambooth/unet/config.json
Model weights saved in ./trained-models/diffuser-dreambooth/unet/diffusion_pytorch_model.bin
Configuration saved in ./trained-models/diffuser-dreambooth/scheduler/scheduler_config.json
Configuration saved in ./trained-models/diffuser-dreambooth/model_index.json
Steps: 100%|██████████| 1200/1200 [15:34<00:00, 1.28it/s, loss=0.0528, lr=1e-7]
-----------------------------------------------------------------------------------------------
(ShivamShrirao Edition)
/opt/conda/envs/hf/lib/python3.11/site-packages/accelerate/accelerator.py:258: FutureWarning: `logging_dir` is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use `project_dir` instead.
warnings.warn(
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /opt/conda/envs/hf/lib/python3.11/site-packages/bitsandbytes/libbitsandbytes_cuda113.so
/opt/conda/envs/hf/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /opt/conda/envs/hf did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
/opt/conda/envs/hf/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//matplotlib_inline.backend_inline'), PosixPath('module')}
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/opt/conda/envs/hf/lib/python3.11/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /opt/conda/envs/hf/lib/python3.11/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
/opt/conda/envs/hf/lib/python3.11/site-packages/diffusers/configuration_utils.py:215: FutureWarning: It is deprecated to pass a pretrained model name or path to `from_config`.If you were trying to load a scheduler, please use <class 'diffusers.schedulers.scheduling_ddpm.DDPMScheduler'>.from_pretrained(...) instead. Otherwise, please make sure to pass a configuration dictionary instead. This functionality will be removed in v1.0.0.
deprecate("config-passed-as-path", "1.0.0", deprecation_message, standard_warn=False)
Caching latents: 100%|████████████████████████| 180/180 [00:24<00:00, 7.39it/s]
05/17/2023 07:34:55 - INFO - __main__ - ***** Running training *****
05/17/2023 07:34:55 - INFO - __main__ - Num examples = 180
05/17/2023 07:34:55 - INFO - __main__ - Num batches each epoch = 180
05/17/2023 07:34:55 - INFO - __main__ - Num Epochs = 7
05/17/2023 07:34:55 - INFO - __main__ - Instantaneous batch size per device = 1
05/17/2023 07:34:55 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 1
05/17/2023 07:34:55 - INFO - __main__ - Gradient Accumulation steps = 1
05/17/2023 07:34:55 - INFO - __main__ - Total optimization steps = 1200
Steps: 100%|███████████| 1200/1200 [09:01<00:00, 2.25it/s, loss=0.158, lr=1e-7]/opt/conda/envs/hf/lib/python3.11/site-packages/safetensors/torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
with safe_open(filename, framework="pt", device=device) as f:
/opt/conda/envs/hf/lib/python3.11/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
/opt/conda/envs/hf/lib/python3.11/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
storage = cls(wrap_storage=untyped_storage)
/opt/conda/envs/hf/lib/python3.11/site-packages/transformers/models/clip/feature_extraction_clip.py:28: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.
warnings.warn(
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
[*] Weights saved at ./trained-models/normal-dreambooth/1200
Steps: 100%|███████████| 1200/1200 [09:34<00:00, 2.09it/s, loss=0.158, lr=1e-7]
System Info
GPU: NVIDIA Tesla T4 (16G VRAM)
OS: Debian GNU/Linux 10 (buster) x86_64
Host: Google Compute Engine
Shell: bash 5.0.3
CPU: Intel Xeon (2) @ 2.199GHz
Memory: 6388MiB / 12010MiB