
RuntimeError: Input type (c10::Half) and bias type (float) mismatch in train_text_to_image_lora_sdxl.py #4619

Closed

Description

@mnslarcher

Describe the bug

I'm encountering the same error as described in the closed issue #4478.

I'm currently running the train_text_to_image_lora_sdxl.py script, and the VAE encoding step fails with the following error:

RuntimeError: Input type (c10::Half) and bias type (float) should be the same

See "Reproduction", "Logs", and "System Info" for all the details.

Any idea why? Do you need more details, or would you like me to run other experiments?

Thanks!
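
For reference, here is a minimal sketch of my own (not taken from the training script, and assuming a CUDA device) that reproduces the same kind of dtype mismatch: a half-precision input fed to a module whose parameters are still in float32.

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=3).cuda()  # parameters stay in float32
x_fp16 = torch.randn(1, 3, 64, 64, device="cuda", dtype=torch.float16)

try:
    conv(x_fp16)  # raises a RuntimeError about mismatched input/parameter dtypes
except RuntimeError as e:
    print(e)

# Matching the dtypes on either side resolves the error:
_ = conv(x_fp16.float())   # upcast the input to the module's dtype
_ = conv.half()(x_fp16)    # or downcast the module to the input's dtype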

Reproduction

export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
export DATASET_NAME="lambdalabs/pokemon-blip-captions"

accelerate launch train_text_to_image_lora_sdxl.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME \
  --caption_column="text" \
  --resolution=1024 \
  --random_flip \
  --train_batch_size=1 \
  --num_train_epochs=2 \
  --gradient_accumulation_steps=1 \
  --checkpointing_steps=500 \
  --learning_rate=1e-04 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --dataloader_num_workers=0 \
  --report_to="wandb" \
  --seed=42 \
  --output_dir="sd-pokemon-model-lora-sdxl-txt" \
  --train_text_encoder \
  --validation_prompt="cute dragon creature" \
  --mixed_precision="fp16" \
  --rank=4

Logs

08/15/2023 18:41:26 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: fp16

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'dynamic_thresholding_ratio', 'clip_sample_range', 'thresholding', 'variance_type'} was not found in config. Values will be initialized to default values.
wandb: Currently logged in as: mnslarcher. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.8
wandb: Run data is saved locally in /home/mnslarcher/ai/sd-xl-hands/wandb/run-20230815_184142-flioaupp
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run wobbly-resonance-5
wandb: ⭐️ View project at https://wandb.ai/mnslarcher/text2image-fine-tune
wandb: 🚀 View run at https://wandb.ai/mnslarcher/text2image-fine-tune/runs/flioaupp
08/15/2023 18:41:46 - INFO - __main__ - ***** Running training *****
08/15/2023 18:41:46 - INFO - __main__ -   Num examples = 833
08/15/2023 18:41:46 - INFO - __main__ -   Num Epochs = 2
08/15/2023 18:41:46 - INFO - __main__ -   Instantaneous batch size per device = 1
08/15/2023 18:41:46 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 1
08/15/2023 18:41:46 - INFO - __main__ -   Gradient Accumulation steps = 1
08/15/2023 18:41:46 - INFO - __main__ -   Total optimization steps = 1666
Steps:   0%|                                                                                                                                                                              | 0/1666 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/mnslarcher/ai/sd-xl-hands/train_text_to_image_lora_sdxl.py", line 1281, in <module>
    main(args)
  File "/home/mnslarcher/ai/sd-xl-hands/train_text_to_image_lora_sdxl.py", line 1008, in main
    model_input = vae.encode(pixel_values).latent_dist.sample()
  File "/home/mnslarcher/anaconda3/envs/hands/lib/python3.10/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "/home/mnslarcher/anaconda3/envs/hands/lib/python3.10/site-packages/diffusers/models/autoencoder_kl.py", line 242, in encode
    h = self.encoder(x)
  File "/home/mnslarcher/anaconda3/envs/hands/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/mnslarcher/anaconda3/envs/hands/lib/python3.10/site-packages/diffusers/models/vae.py", line 110, in forward
    sample = self.conv_in(sample)
  File "/home/mnslarcher/anaconda3/envs/hands/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/mnslarcher/anaconda3/envs/hands/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/mnslarcher/anaconda3/envs/hands/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (c10::Half) and bias type (float) should be the same
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
wandb: 🚀 View run wobbly-resonance-5 at: https://wandb.ai/mnslarcher/text2image-fine-tune/runs/flioaupp
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20230815_184142-flioaupp/logs
Traceback (most recent call last):
  File "/home/mnslarcher/anaconda3/envs/hands/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/mnslarcher/anaconda3/envs/hands/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/home/mnslarcher/anaconda3/envs/hands/lib/python3.10/site-packages/accelerate/commands/launch.py", line 979, in launch_command
    simple_launcher(args)
  File "/home/mnslarcher/anaconda3/envs/hands/lib/python3.10/site-packages/accelerate/commands/launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/mnslarcher/anaconda3/envs/hands/bin/python', 'train_text_to_image_lora_sdxl.py', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0', '--dataset_name=lambdalabs/pokemon-blip-captions', '--caption_column=text', '--resolution=1024', '--random_flip', '--train_batch_size=1', '--num_train_epochs=2', '--gradient_accumulation_steps=1', '--checkpointing_steps=500', '--learning_rate=1e-04', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--dataloader_num_workers=0', '--report_to=wandb', '--seed=42', '--output_dir=sd-pokemon-model-lora-sdxl-txt', '--train_text_encoder', '--validation_prompt=cute dragon creature', '--mixed_precision=fp16', '--rank=4']' returned non-zero exit status 1.
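
From the traceback, the failing call is model_input = vae.encode(pixel_values).latent_dist.sample(), so the mismatch appears to be between fp16 pixel_values and a VAE that is still in float32. Below is a sketch of a possible workaround around that line. It assumes (this is my guess, not a confirmed fix) that under --mixed_precision="fp16" the script keeps the VAE in float32 while pixel_values has been cast to the fp16 weight dtype, and that pixel_values, vae, and weight_dtype are the script's local names.

# Cast the input to the VAE's own dtype before encoding, then cast the
# resulting latents back to the training dtype for the rest of the loop.
pixel_values = pixel_values.to(dtype=vae.dtype)  # match the VAE (assumed float32)
model_input = vae.encode(pixel_values).latent_dist.sample()
model_input = model_input * vae.config.scaling_factor
model_input = model_input.to(weight_dtype)       # back to fp16 for the UNet

Another workaround I have seen suggested is loading an fp16-safe VAE such as madebyollin/sdxl-vae-fp16-fix via --pretrained_vae_model_name_or_path, if this version of the script exposes that argument.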

System Info

OS Name: Ubuntu 22.04.3 LTS
GPU: NVIDIA GeForce RTX 4090

diffusers-cli env:

- `diffusers` version: 0.19.3
- Platform: Linux-6.2.0-26-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- PyTorch version (GPU?): 2.0.1+cu117 (True)
- Huggingface_hub version: 0.16.4
- Transformers version: 4.31.0
- Accelerate version: 0.21.0
- xFormers version: not installed
- Using GPU in script?: YES
- Using distributed or parallel set-up in script?: NO

environment.yml (conda):

name: myenv
channels:
  - defaults
dependencies:
  - nb_conda_kernels
  - ipykernel
  - jupyter
  - pip
  - python=3.10
  - pip:
    - accelerate==0.21.0
    - datasets==2.14.4
    - diffusers==0.19.3
    - ftfy==6.1.1
    - Jinja2==3.1.2
    - tensorboard==2.14.0
    - torch==2.0.1
    - torchvision==0.15.2
    - transformers==4.31.0
    - wandb==0.15.8

default_config.yaml:

compute_environment: LOCAL_MACHINE
distributed_type: 'NO'
downcast_bf16: 'no'
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

Who can help?

@sayak
