Description
Describe the bug
Hi, I'm trying to fine-tune stabilityai/stable-diffusion-3-medium-diffusers using the official diffusers script. Training ran normally except that the loss did not decrease. I wanted to add a validation prompt to check whether everything was working, so I stopped the training process with Ctrl-C and added the --validation_prompt and --validation_epochs arguments. However, when I tried to resume training I got the error below. I tried resuming from other checkpoints and removing those two arguments, but nothing worked.
Reproduction
Here are the checkpoint links: https://drive.google.com/drive/folders/16RbJa_W4H7aQiGf7QhTXVEJV53LPuS8n?usp=sharing , https://drive.google.com/drive/folders/1zT3LmB7SNtavHP3tgbodTgE13VT0cYvb?usp=sharing
The training command is:

accelerate launch train_dreambooth_lora_sd3.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-3-medium-diffusers" \
  --output_dir=sd3-lora \
  --instance_data_dir="sample-imgs" \
  --instance_prompt="xxx" \
  --resolution=900 \
  --train_batch_size=1 \
  --train_text_encoder \
  --gradient_accumulation_steps=16 \
  --optimizer="adamw" \
  --learning_rate=1e-6 \
  --text_encoder_lr=1e-6 \
  --lr_scheduler="cosine" \
  --lr_warmup_steps=500 \
  --max_train_steps=4000 \
  --rank=32 \
  --seed="42" \
  --gradient_checkpointing \
  --resume_from_checkpoint latest \
  --center_crop \
  --report_to="wandb" \
  --checkpointing_steps 20 \
  --checkpoints_total_limit 3 \
  --validation_prompt="xxx" \
  --validation_epochs=1
Logs
07/24/2024 08:16:14 - INFO - __main__ - ***** Running training *****
07/24/2024 08:16:14 - INFO - __main__ - Num examples = 93
07/24/2024 08:16:14 - INFO - __main__ - Num batches each epoch = 93
07/24/2024 08:16:14 - INFO - __main__ - Num Epochs = 667
07/24/2024 08:16:14 - INFO - __main__ - Instantaneous batch size per device = 1
07/24/2024 08:16:14 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 16
07/24/2024 08:16:14 - INFO - __main__ - Gradient Accumulation steps = 16
07/24/2024 08:16:14 - INFO - __main__ - Total optimization steps = 4000
Resuming from checkpoint checkpoint-1100
07/24/2024 08:16:14 - INFO - accelerate.accelerator - Loading states from sd3-lora/checkpoint-1100
Traceback (most recent call last):
File "/workspace/sd3-fine-tune/train_dreambooth_lora_sd3.py", line 1876, in <module>
main(args)
File "/workspace/sd3-fine-tune/train_dreambooth_lora_sd3.py", line 1576, in main
accelerator.load_state(os.path.join(args.output_dir, path))
File "/opt/conda/lib/python3.10/site-packages/accelerate/accelerator.py", line 3131, in load_state
hook(models, input_dir)
File "/workspace/sd3-fine-tune/train_dreambooth_lora_sd3.py", line 1291, in load_model_hook
_set_state_dict_into_text_encoder(lora_state_dict, prefix="text_encoder.", text_encoder=text_encoder_one_)
File "/root/diffusers/src/diffusers/training_utils.py", line 221, in _set_state_dict_into_text_encoder
set_peft_model_state_dict(text_encoder, text_encoder_state_dict, adapter_name="default")
File "/opt/conda/lib/python3.10/site-packages/peft/utils/save_and_load.py", line 353, in set_peft_model_state_dict
load_result = model.load_state_dict(peft_model_state_dict, strict=False)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2153, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for CLIPTextModelWithProjection:
size mismatch for text_model.encoder.layers.0.self_attn.k_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.0.self_attn.k_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.0.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.0.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.0.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.0.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.0.self_attn.out_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.0.self_attn.out_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.1.self_attn.k_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.1.self_attn.k_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.1.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.1.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.1.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.1.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.1.self_attn.out_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.1.self_attn.out_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.2.self_attn.k_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.2.self_attn.k_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.2.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.2.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.2.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.2.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.2.self_attn.out_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.2.self_attn.out_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.3.self_attn.k_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.3.self_attn.k_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.3.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.3.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.3.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.3.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.3.self_attn.out_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.3.self_attn.out_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.4.self_attn.k_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.4.self_attn.k_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.4.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.4.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.4.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.4.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.4.self_attn.out_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.4.self_attn.out_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.5.self_attn.k_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.5.self_attn.k_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.5.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.5.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.5.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.5.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.5.self_attn.out_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.5.self_attn.out_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.6.self_attn.k_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.6.self_attn.k_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.6.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.6.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.6.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.6.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.6.self_attn.out_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.6.self_attn.out_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.7.self_attn.k_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.7.self_attn.k_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.7.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.7.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.7.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.7.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.7.self_attn.out_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.7.self_attn.out_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.8.self_attn.k_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.8.self_attn.k_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.8.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.8.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.8.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.8.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.8.self_attn.out_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.8.self_attn.out_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.9.self_attn.k_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.9.self_attn.k_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.9.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.9.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.9.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.9.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.9.self_attn.out_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.9.self_attn.out_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.10.self_attn.k_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.10.self_attn.k_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.10.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.10.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.10.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.10.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.10.self_attn.out_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.10.self_attn.out_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.11.self_attn.k_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.11.self_attn.k_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.11.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.11.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.11.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.11.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
size mismatch for text_model.encoder.layers.11.self_attn.out_proj.lora_A.default.weight: copying a param with shape torch.Size([32, 1280]) from checkpoint, the shape in current model is torch.Size([32, 768]).
size mismatch for text_model.encoder.layers.11.self_attn.out_proj.lora_B.default.weight: copying a param with shape torch.Size([1280, 32]) from checkpoint, the shape in current model is torch.Size([768, 32]).
Traceback (most recent call last):
File "/opt/conda/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
args.func(args)
File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1106, in launch_command
simple_launcher(args)
File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 704, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', 'train_dreambooth_lora_sd3.py', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-3-medium-diffusers', '--output_dir=coconut-lora', '--instance_data_dir=sample-imgs', '--instance_prompt=xxx', '--resolution=900', '--train_batch_size=1', '--train_text_encoder', '--gradient_accumulation_steps=16', '--optimizer=adamw', '--learning_rate=1e-6', '--text_encoder_lr=1e-6', '--lr_scheduler=cosine', '--lr_warmup_steps=500', '--max_train_steps=4000', '--rank=32', '--seed=42', '--gradient_checkpointing', '--resume_from_checkpoint', 'latest', '--center_crop', '--report_to=wandb', '--checkpointing_steps', '20', '--checkpoints_total_limit', '3', '--validation_prompt=xxx', '--validation_epochs=1']' returned non-zero exit status 1.
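For reference, the size mismatches above (1280 in the checkpoint vs. 768 in the current model) suggest that the LoRA weights stored under the text_encoder. prefix in the checkpoint have the hidden size of SD3's second CLIP text encoder (1280) rather than the first one (768). Below is a minimal sketch to check this by inspecting the saved shapes directly; the checkpoint path, the file name pytorch_lora_weights.safetensors, and the text_encoder_2. prefix are assumptions about how the script saves its state, not taken from the log.

# Minimal sketch (not part of the training script): inspect which hidden size the
# text-encoder LoRA weights in a saved checkpoint have.
# The path and file name below are assumptions -- adjust them to the actual folder.
from safetensors.torch import load_file

state_dict = load_file("sd3-lora/checkpoint-1100/pytorch_lora_weights.safetensors")

for prefix in ("text_encoder.", "text_encoder_2."):
    lora_a_keys = [k for k in state_dict if k.startswith(prefix) and "lora_A" in k]
    if lora_a_keys:
        # lora_A weights have shape [rank, hidden_size]; SD3's first CLIP text
        # encoder uses hidden_size 768 and the second one 1280.
        print(prefix, state_dict[lora_a_keys[0]].shape)

If the shapes printed for the text_encoder. prefix come out as [32, 1280], the checkpoint stored the second encoder's adapters under the first encoder's prefix, which would explain the size mismatch when resuming.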
System Info
🤗 Diffusers version: 0.30.0.dev0
Platform: Linux-6.5.0-25-generic-x86_64-with-glibc2.35
Running on a notebook?: No
Running on Google Colab?: No
Python version: 3.10.13
PyTorch version (GPU?): 2.2.1 (True)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Huggingface_hub version: 0.24.0
Transformers version: 4.42.4
Accelerate version: 0.32.1
PEFT version: 0.11.1
Bitsandbytes version: not installed
Safetensors version: 0.4.3
xFormers version: not installed
Accelerator: NVIDIA L40s, 49152 MiB VRAM
Using GPU in script?: NVIDIA L40s, 49152 MiB VRAM
Using distributed or parallel set-up in script?: No