Support for resume_from_checkpoint in examples/dreambooth/train_dreambooth_lora.py #3346

Closed
@abhijitpal1247

Describe the bug

train_dreambooth_lora.py still calls accelerator.load_state(os.path.join(args.output_dir, path)) when resuming from a checkpoint, but the script does not save the accelerator state. Resuming therefore fails with the following error:
FileNotFoundError: [Errno 2] No such file or directory: '/home/ec2-user/ssl/Jupyter/exp/models/lora/checkpoint-250/pytorch_model.bin'
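
Until the script saves the state it later tries to load, one way to avoid the crash is to check that a checkpoint directory actually contains the expected file before resuming. The following is a minimal sketch of that guard, not the fix adopted in diffusers. The helper name find_resumable_checkpoint is hypothetical, and the default file name pytorch_model.bin is assumed from the error message above; other Accelerate versions may write different files.

```python
import os
from typing import Optional


def find_resumable_checkpoint(
    output_dir: str, state_file: str = "pytorch_model.bin"
) -> Optional[str]:
    """Return the newest checkpoint-<step> directory containing state_file.

    Hypothetical helper: returns None when no checkpoint directory holds
    the expected file, so the caller can start training from scratch
    instead of failing with FileNotFoundError.
    """
    if not os.path.isdir(output_dir):
        return None

    # Collect (step, directory name) pairs for names like "checkpoint-250".
    candidates = []
    for name in os.listdir(output_dir):
        prefix, _, step = name.partition("-")
        if prefix == "checkpoint" and step.isdigit():
            candidates.append((int(step), name))

    # Walk from the highest step down and return the first usable directory.
    for _, name in sorted(candidates, reverse=True):
        path = os.path.join(output_dir, name)
        if os.path.isfile(os.path.join(path, state_file)):
            return path
    return None
```

In the training script, the result could gate the existing call: run accelerator.load_state(path) only when the helper returns a path, and otherwise log that no resumable state was found and begin a new run.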

Reproduction

Resuming from a checkpoint with train_dreambooth_lora.py reproduces the error:
train_dreambooth_lora.py --resume_from_checkpoint path

Logs

FileNotFoundError: [Errno 2] No such file or directory: '/home/ec2-user/ssl/Jupyter/exp/models/lora/checkpoint-250/pytorch_model.bin'

System Info

  • diffusers version: 0.16.1
  • Platform: Linux-4.14.301-224.520.amzn2.x86_64-x86_64-with-glibc2.26
  • Python version: 3.10.9
  • PyTorch version (GPU?): 1.13.1+cu117 (True)
  • Huggingface_hub version: 0.14.1
  • Transformers version: 4.27.3
  • Accelerate version: 0.18.0
  • xFormers version: 0.0.16
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: No

Metadata

Labels

bug (Something isn't working), stale (Issues that haven't received updates)
