Discussed in #10036
Originally posted by ic-synth November 27, 2024
Describe the bug
The CogVideoX VAE decoder in diffusers 0.31.0 consumes significantly more memory than in 0.30.3, to the point where the model goes OOM even on 80 GB H100 GPUs with a relatively modest frame count.
I include two profiles for very small input tensors of only 5 frames, where it is visible how much larger the VAE memory consumption is.
Memory footprints for different input sizes are shown below. As you can see, with the latest version, memory keeps growing with the frame count.
Reproduction
Run the CogVideoXDecoder3D model with diffusers 0.30.3 and 0.31.0 on inputs of the same shape and measure the memory consumption as the frame count increases.
```python
# This code requires a GPU with 50+ GB of VRAM.
import torch
import diffusers
from diffusers import AutoencoderKLCogVideoX

with torch.no_grad():
    vae = AutoencoderKLCogVideoX().to(dtype=torch.bfloat16).eval()
    vae.decoder = vae.decoder.to(device="cuda:0")
    input_tensor = torch.randn(1, 16, 5, 96, 170).to(device="cuda:0", dtype=torch.bfloat16)
    print("Decoding ... Input size:", input_tensor.shape, "Diffusers version:", diffusers.__version__)
    vae.decode(input_tensor)
    # Report peak allocated and reserved CUDA memory in GiB.
    print(torch.cuda.max_memory_allocated() / (1024 ** 3), torch.cuda.max_memory_reserved() / (1024 ** 3))
```
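For context on why memory scales with the frame count: assuming the CogVideoX VAE's temporal compression ratio of 4 (as in the public model configs, where the first latent frame is not temporally compressed), a latent with F frames decodes to 1 + (F - 1) * 4 pixel frames, so decoder activations grow linearly in F. A minimal sketch of that relationship (pure Python, no GPU needed; the ratio and the "first frame uncompressed" convention are assumptions based on the released checkpoints):

```python
def decoded_frame_count(latent_frames: int, temporal_ratio: int = 4) -> int:
    """Pixel frames produced when decoding a CogVideoX latent with
    `latent_frames` frames, assuming the first latent frame maps to a
    single pixel frame and each remaining latent frame maps to
    `temporal_ratio` pixel frames."""
    if latent_frames < 1:
        raise ValueError("latent_frames must be >= 1")
    return 1 + (latent_frames - 1) * temporal_ratio


if __name__ == "__main__":
    # The 5-frame latent from the reproduction above decodes to 17 frames.
    for f in (1, 5, 13):
        print(f, "->", decoded_frame_count(f))
```

Under this assumption, the 5-frame input above already corresponds to a 17-frame video, which is why even "small" latents can exhaust an 80 GB card if per-frame memory regresses.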
Logs
No response
System Info
Python 3.11.
Diffusers 0.30.3 vs 0.31.0
Who can help?