Describe the bug
I'm encountering the same error as described in the closed issue #4478.
I'm currently running the train_text_to_image_lora_sdxl.py script, and the VAE gives me the following error:
RuntimeError: Input type (c10::Half) and bias type (float) should be the same
See "Reproduction", "Logs", and "System Info" for all the details.
Any idea why? Do you need more details, or would you like me to run other experiments?
Thanks!
Reproduction
export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
export DATASET_NAME="lambdalabs/pokemon-blip-captions"
accelerate launch train_text_to_image_lora_sdxl.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--dataset_name=$DATASET_NAME \
--caption_column="text" \
--resolution=1024 \
--random_flip \
--train_batch_size=1 \
--num_train_epochs=2 \
--gradient_accumulation_steps=1 \
--checkpointing_steps=500 \
--learning_rate=1e-04 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--dataloader_num_workers=0 \
--report_to="wandb" \
--seed=42 \
--output_dir="sd-pokemon-model-lora-sdxl-txt" \
--train_text_encoder \
--validation_prompt="cute dragon creature" \
--mixed_precision="fp16" \
--rank=4
Logs
08/15/2023 18:41:26 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: fp16
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'dynamic_thresholding_ratio', 'clip_sample_range', 'thresholding', 'variance_type'} was not found in config. Values will be initialized to default values.
wandb: Currently logged in as: mnslarcher. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.8
wandb: Run data is saved locally in /home/mnslarcher/ai/sd-xl-hands/wandb/run-20230815_184142-flioaupp
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run wobbly-resonance-5
wandb: ⭐️ View project at https://wandb.ai/mnslarcher/text2image-fine-tune
wandb: 🚀 View run at https://wandb.ai/mnslarcher/text2image-fine-tune/runs/flioaupp
08/15/2023 18:41:46 - INFO - __main__ - ***** Running training *****
08/15/2023 18:41:46 - INFO - __main__ - Num examples = 833
08/15/2023 18:41:46 - INFO - __main__ - Num Epochs = 2
08/15/2023 18:41:46 - INFO - __main__ - Instantaneous batch size per device = 1
08/15/2023 18:41:46 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 1
08/15/2023 18:41:46 - INFO - __main__ - Gradient Accumulation steps = 1
08/15/2023 18:41:46 - INFO - __main__ - Total optimization steps = 1666
Steps: 0%| | 0/1666 [00:00<?, ?it/s]Traceback (most recent call last):
File "/home/mnslarcher/ai/sd-xl-hands/train_text_to_image_lora_sdxl.py", line 1281, in <module>
main(args)
File "/home/mnslarcher/ai/sd-xl-hands/train_text_to_image_lora_sdxl.py", line 1008, in main
model_input = vae.encode(pixel_values).latent_dist.sample()
File "/home/mnslarcher/anaconda3/envs/hands/lib/python3.10/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
return method(self, *args, **kwargs)
File "/home/mnslarcher/anaconda3/envs/hands/lib/python3.10/site-packages/diffusers/models/autoencoder_kl.py", line 242, in encode
h = self.encoder(x)
File "/home/mnslarcher/anaconda3/envs/hands/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/mnslarcher/anaconda3/envs/hands/lib/python3.10/site-packages/diffusers/models/vae.py", line 110, in forward
sample = self.conv_in(sample)
File "/home/mnslarcher/anaconda3/envs/hands/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/mnslarcher/anaconda3/envs/hands/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/mnslarcher/anaconda3/envs/hands/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (c10::Half) and bias type (float) should be the same
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
wandb: 🚀 View run wobbly-resonance-5 at: https://wandb.ai/mnslarcher/text2image-fine-tune/runs/flioaupp
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20230815_184142-flioaupp/logs
Traceback (most recent call last):
File "/home/mnslarcher/anaconda3/envs/hands/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/home/mnslarcher/anaconda3/envs/hands/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/home/mnslarcher/anaconda3/envs/hands/lib/python3.10/site-packages/accelerate/commands/launch.py", line 979, in launch_command
simple_launcher(args)
File "/home/mnslarcher/anaconda3/envs/hands/lib/python3.10/site-packages/accelerate/commands/launch.py", line 628, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/mnslarcher/anaconda3/envs/hands/bin/python', 'train_text_to_image_lora_sdxl.py', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0', '--dataset_name=lambdalabs/pokemon-blip-captions', '--caption_column=text', '--resolution=1024', '--random_flip', '--train_batch_size=1', '--num_train_epochs=2', '--gradient_accumulation_steps=1', '--checkpointing_steps=500', '--learning_rate=1e-04', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--dataloader_num_workers=0', '--report_to=wandb', '--seed=42', '--output_dir=sd-pokemon-model-lora-sdxl-txt', '--train_text_encoder', '--validation_prompt=cute dragon creature', '--mixed_precision=fp16', '--rank=4']' returned non-zero exit status 1.
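For what it's worth, a quick check like the following (a hypothetical debugging snippet, not part of the script) should confirm which dtypes end up in the failing conv that the traceback points at:
# Hypothetical check of the dtypes involved in the failing F.conv2d call;
# `vae` and `pixel_values` are the objects used in the training loop.
print(vae.encoder.conv_in.weight.dtype)  # presumably torch.float32
print(vae.encoder.conv_in.bias.dtype)    # presumably torch.float32
print(pixel_values.dtype)                # presumably torch.float16 under --mixed_precision="fp16"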
System Info
OS Name: Ubuntu 22.04.3 LTS
GPU: NVIDIA GeForce RTX 4090
diffusers-cli env:
- `diffusers` version: 0.19.3
- Platform: Linux-6.2.0-26-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- PyTorch version (GPU?): 2.0.1+cu117 (True)
- Huggingface_hub version: 0.16.4
- Transformers version: 4.31.0
- Accelerate version: 0.21.0
- xFormers version: not installed
- Using GPU in script?: YES
- Using distributed or parallel set-up in script?: NO
environment.yml (conda):
name: myenv
channels:
- defaults
dependencies:
- nb_conda_kernels
- ipykernel
- jupyter
- pip
- python=3.10
- pip:
- accelerate==0.21.0
- datasets==2.14.4
- diffusers==0.19.3
- ftfy==6.1.1
- Jinja2==3.1.2
- tensorboard==2.14.0
- torch==2.0.1
- torchvision==0.15.2
- transformers==4.31.0
- wandb==0.15.8
default_config.yaml:
compute_environment: LOCAL_MACHINE
distributed_type: 'NO'
downcast_bf16: 'no'
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false