Description
Describe the bug
There appears to be a hardcoded 77-token limit somewhere in the single-file loading path. The loader should use the dimensions actually present in the model instead.
I have an SD1.5 model in diffusers layout that uses LongCLIP as its text encoder:
https://huggingface.co/opendiffusionai/xllsd-alpha0
I can download it locally and then convert it to single-file format with:
python convert_diffusers_to_original_stable_diffusion.py \
    --use_safetensors \
    --model_path $SRCM \
    --checkpoint_path $DESTM
But when I try to convert it back, loading fails with a size mismatch. The text encoder's position embeddings have 248 entries rather than the 77 the loader expects (see Logs).
I should point out that the model WORKS PROPERLY for diffusion when loaded in diffusers format, so I don't have some funky broken model here.
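To confirm that the converted checkpoint really carries more than 77 positions, you can inspect the single-file checkpoint's header without loading any weights. The following is a minimal sketch that uses only the Python standard library. It assumes the published safetensors layout: an 8-byte little-endian header length, followed by a JSON header. The function names are illustrative. The sketch matches any key ending in `position_embedding.weight` rather than assuming the exact text-encoder prefix that the conversion script writes.

```python
from __future__ import annotations

import json
import struct
from pathlib import Path


def read_safetensors_shapes(path: str | Path) -> dict[str, tuple[int, ...]]:
    """Return {tensor_name: shape} from a .safetensors file's JSON header.

    Only the header is read; tensor data is never loaded.
    """
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit header length.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return {
        name: tuple(info["shape"])
        for name, info in header.items()
        if name != "__metadata__"
    }


def text_encoder_positions(shapes: dict[str, tuple[int, ...]]) -> dict[str, int]:
    """Map each position-embedding tensor to its number of token positions."""
    return {
        name: shape[0]
        for name, shape in shapes.items()
        if name.endswith("position_embedding.weight")
    }
```

Running `text_encoder_positions(read_safetensors_shapes("XLLsd-phase0.safetensors"))` on the converted file should report 248 for the text encoder if the conversion preserved the LongCLIP embeddings. That would match the shape in the error under Logs.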
Reproduction
from diffusers import StableDiffusionPipeline
import torch

# Load the single-file checkpoint produced by the conversion step above.
# This call raises the ValueError shown under Logs.
pipe = StableDiffusionPipeline.from_single_file(
    "XLLsd-phase0.safetensors",
    torch_dtype=torch.float32,
    use_safetensors=True,
)

# Save back out in diffusers layout (not reached because of the error).
outname = "XLLsd_recreate"
pipe.save_pretrained(outname, safe_serialization=False)
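A possible workaround, untested here: `from_single_file` accepts a `config` argument, either a Hub repo id or a local diffusers directory, that supplies the component configs. My guess is that, without it, diffusers infers a stock SD1.5 config from the checkpoint keys, and that config's text encoder declares 77 positions. This sketch points `config` at the original diffusers repo instead. The helper name `load_with_model_config` is hypothetical. The sketch assumes that the repo's `text_encoder/config.json` declares 248 positions.

```python
from __future__ import annotations


def load_with_model_config(ckpt_path: str, config_source: str):
    """Load a single-file SD1.5 checkpoint, taking component configs from
    `config_source` (Hub repo id or local diffusers directory) rather than
    the default config diffusers would otherwise pick.

    Assumption: config_source's text_encoder config declares the same
    number of positions (248 for LongCLIP) as the checkpoint.
    """
    # Heavy imports live inside the function so defining it is cheap.
    import torch
    from diffusers import StableDiffusionPipeline

    return StableDiffusionPipeline.from_single_file(
        ckpt_path,
        config=config_source,
        torch_dtype=torch.float32,
    )
```

For example, call `load_with_model_config("XLLsd-phase0.safetensors", "opendiffusionai/xllsd-alpha0")` and then use the same `save_pretrained` call. Another option worth trying is to pass pre-loaded `text_encoder=` and `tokenizer=` components to `from_single_file`. Neither option removes the underlying problem: the expected size should come from the checkpoint itself.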
Logs
venv/lib/python3.12/site-packages/diffusers/models/model_loading_utils.py", line 230, in load_model_dict_into_meta
raise ValueError(
ValueError: Cannot load because text_model.embeddings.position_embedding.weight expected shape torch.Size([77, 768]), but got torch.Size([248, 768]). If you want to instead overwrite randomly initialized weights, please make sure to pass both `low_cpu_mem_usage=False` and `ignore_mismatched_sizes=True`. For more information, see also: https://github.com/huggingface/diffusers/issues/1619#issuecomment-1345604389 as an example.
System Info
- 🤗 Diffusers version: 0.32.2
- Platform: Linux-6.8.0-55-generic-x86_64-with-glibc2.39
- Running on Google Colab?: No
- Python version: 3.12.3
- PyTorch version (GPU?): 2.6.0+cu124 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.29.3
- Transformers version: 4.50.0
- Accelerate version: 1.5.2
- PEFT version: not installed
- Bitsandbytes version: 0.45.2
- Safetensors version: 0.5.3
- xFormers version: not installed
- Accelerator: NVIDIA GeForce RTX 4090, 24564 MiB