train_dreambooth_lora_flux.py distributed bugs #9161

Open
@neuron-party

Description

Describe the bug

AttributeError when running multi-GPU (DistributedDataParallel) distributed training with accelerate

Reproduction

accelerate launch --config_file <accelerate_config.yaml> train_dreambooth_lora_flux.py \
  --resolution=1024 \
  --mixed_precision=bf16 \
  --pretrained_model_name_or_path=black-forest-labs/FLUX.1-dev \
  --num_validation_images=8 \
  --validation_epochs=100 \
  --rank=16 \
  --train_batch_size=1 \
  --learning_rate=1e-4 \
  --guidance_scale=3.5 \
  --checkpointing_steps=200 \
  --instance_prompt=xyz \
  --instance_data_dir=xyz \
  --output_dir=xyz \
  --logging_dir=xyz \
  --validation_prompt=xyz

accelerate config (<accelerate_config.yaml> above):

compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: MULTI_GPU
fsdp_config: {}
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 2
use_cpu: false
gpu_ids: '0,1'
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
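
With distributed_type: MULTI_GPU, accelerator.prepare() wraps each model in torch.nn.parallel.DistributedDataParallel, and DDP does not forward arbitrary attribute lookups such as .config to the wrapped module. Below is a minimal sketch of that wrapping behavior; the Dummy class is a hypothetical stand-in for FluxTransformer2DModel, not code from the training script:

import types
import torch
from accelerate import Accelerator

class Dummy(torch.nn.Module):
    """Hypothetical stand-in for FluxTransformer2DModel with a .config attribute."""
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)
        self.config = types.SimpleNamespace(guidance_embeds=True)

accelerator = Accelerator()
model = accelerator.prepare(Dummy())

# Launched single-process, `model` is still a Dummy and model.config works.
# Launched via `accelerate launch` with num_processes: 2, `model` is a
# DistributedDataParallel wrapper, and model.config raises AttributeError,
# because nn.Module attribute lookup only searches the parameters, buffers,
# and submodules of the wrapper itself.
print(type(model).__name__)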

Logs

if transformer.config.guidance_embeds:
AttributeError: 'DistributedDataParallel' object has no attribute 'config'
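
The usual workaround is to unwrap the model before reading .config. Accelerator.unwrap_model is an existing accelerate API that strips the DDP wrapper and returns the underlying module; the sketch below continues the Dummy example above, and the commented patch line is only a suggestion for the script, not a confirmed upstream fix:

# Continues the sketch above: unwrap before touching .config.
base = accelerator.unwrap_model(model)  # underlying Dummy, even under DDP
if base.config.guidance_embeds:
    print("guidance embeddings enabled")

# DDP also exposes the wrapped module as `model.module`, but unwrap_model
# is safer because it is a no-op in the single-process (unwrapped) case.
# Suggested patch for train_dreambooth_lora_flux.py (an assumption, not a
# confirmed fix):
#     if accelerator.unwrap_model(transformer).config.guidance_embeds: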

System Info

diffusers from source
accelerate==0.33.0
transformers==4.44.1

training on A100s

Who can help?

No response

    Labels

    bug (Something isn't working), stale (Issues that haven't received updates)
