Closed
### Describe the bug

Applying leaf-level group offloading (`apply_group_offloading`) to the HunyuanVideo pipeline's transformer, text encoder, and VAE fails during prompt encoding with:

```
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
```
### Reproduction
```python
import torch
from diffusers import BitsAndBytesConfig, HunyuanVideoPipeline
from diffusers.hooks import apply_group_offloading
from diffusers.models import HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video
from transformers import LlamaModel

model_id = "hunyuanvideo-community/HunyuanVideo"

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

text_encoder = LlamaModel.from_pretrained(
    model_id,
    subfolder="text_encoder",
    quantization_config=quantization_config,
    torch_dtype=torch.float16,
)
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id,
    text_encoder=text_encoder,
    transformer=transformer,
    torch_dtype=torch.float16,
)

apply_group_offloading(
    pipe.transformer,
    offload_type="leaf_level",
    offload_device=torch.device("cpu"),
    onload_device=torch.device("cuda"),
)
apply_group_offloading(
    pipe.text_encoder,
    offload_device=torch.device("cpu"),
    onload_device=torch.device("cuda"),
    offload_type="leaf_level",
)
apply_group_offloading(
    pipe.vae,
    offload_device=torch.device("cpu"),
    onload_device=torch.device("cuda"),
    offload_type="leaf_level",
)

prompt = "A cat wearing sunglasses and working as a lifeguard at pool."
generator = torch.Generator().manual_seed(181201)
output = pipe(
    prompt,
    width=512,
    height=320,
    num_frames=17,
    num_inference_steps=30,
    generator=generator,
)[0]
export_to_video(output, "hunyuan_test.mp4", fps=8)
```
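Note that the three `apply_group_offloading` calls cover `pipe.transformer`, `pipe.text_encoder`, and `pipe.vae`, while the traceback below fails inside `pipe.text_encoder_2` (the CLIP text model), whose weights are never offloaded or moved. A small diagnostic sketch for checking where each component's parameters actually live before calling the pipeline (`parameter_devices` is a hypothetical helper written for this report, not a diffusers API):

```python
from collections import defaultdict

import torch.nn as nn


def parameter_devices(module: nn.Module) -> dict:
    """Group a module's parameter names by the device they live on."""
    devices = defaultdict(list)
    for name, param in module.named_parameters():
        devices[str(param.device)].append(name)
    return dict(devices)


# On the pipeline this could be used as, e.g.:
#   for name in ["transformer", "text_encoder", "text_encoder_2", "vae"]:
#       print(name, list(parameter_devices(getattr(pipe, name))))
# Self-contained demo with a tiny module (fresh weights default to CPU):
demo = nn.Embedding(10, 4)
print(parameter_devices(demo))  # {'cpu': ['weight']}
```

A component reporting only `cpu` here while the pipeline's execution device is `cuda` is a candidate source of the mismatch.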
### Logs
```
(venv) C:\aiOWN\diffuser_webui>python hunyuan_test.py
`low_cpu_mem_usage` was None, now default to True since model is quantized.
Downloading shards: 100%|██████████████████████████████████████████| 4/4 [00:00<00:00, 356.54it/s]
Loading checkpoint shards: 100%|████████████████████████████████████| 4/4 [00:35<00:00, 8.81s/it]
Fetching 6 files: 100%|█████████████████████████████████████████████████████| 6/6 [00:00<?, ?it/s]
Loading pipeline components...: 100%|███████████████████████████████| 7/7 [00:04<00:00, 1.70it/s]
Traceback (most recent call last):
  File "C:\aiOWN\diffuser_webui\hunyuan_test.py", line 62, in <module>
    output = pipe(
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\diffusers\pipelines\hunyuan_video\pipeline_hunyuan_video.py", line 598, in __call__
    prompt_embeds, pooled_prompt_embeds, prompt_attention_mask = self.encode_prompt(
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\diffusers\pipelines\hunyuan_video\pipeline_hunyuan_video.py", line 330, in encode_prompt
    pooled_prompt_embeds = self._get_clip_prompt_embeds(
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\diffusers\pipelines\hunyuan_video\pipeline_hunyuan_video.py", line 296, in _get_clip_prompt_embeds
    prompt_embeds = self.text_encoder_2(text_input_ids.to(device), output_hidden_states=False).pooler_output
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 1049, in forward
    return self.text_model(
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 940, in forward
    hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 285, in forward
    inputs_embeds = self.token_embedding(input_ids)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\torch\nn\modules\sparse.py", line 190, in forward
    return F.embedding(
  File "C:\aiOWN\diffuser_webui\venv\lib\site-packages\torch\nn\functional.py", line 2551, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
```
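The crash happens because `pipeline_hunyuan_video.py` moves `text_input_ids` to the pipeline's execution device while `text_encoder_2`'s embedding weights are still on CPU. A generic, hedged pattern for this class of error is to derive the target device from the module's own parameters rather than a hard-coded device (a sketch of the general technique, not the actual diffusers fix; `to_module_device` is a hypothetical helper):

```python
import torch
import torch.nn as nn


def to_module_device(module: nn.Module, tensor: torch.Tensor) -> torch.Tensor:
    """Move a tensor to whatever device the module's weights live on."""
    target = next(module.parameters()).device
    return tensor.to(target)


# F.embedding requires the weight and the index tensor to share a device;
# routing the indices through to_module_device avoids the mismatch.
emb = nn.Embedding(10, 4)           # weights on CPU
ids = torch.tensor([1, 2, 3])
out = emb(to_module_device(emb, ids))
print(tuple(out.shape))  # (3, 4)
```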
### System Info
- 🤗 Diffusers version: 0.33.0.dev0
- Platform: Windows-10-10.0.26100-SP0
- Running on Google Colab?: No
- Python version: 3.10.11
- PyTorch version (GPU?): 2.5.1+cu124 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.27.1
- Transformers version: 4.48.1
- Accelerate version: 1.4.0.dev0
- PEFT version: 0.14.0
- Bitsandbytes version: 0.45.1
- Safetensors version: 0.5.2
- xFormers version: not installed
- Accelerator: NVIDIA GeForce RTX 4060 Laptop GPU, 8188 MiB
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>
```
C:\Users\nitin>echo %CUDA_PATH%
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6
```