CogVideoX VAE decoder consumes significantly more memory in the latest version #10091

Open
@sayakpaul

Description

Discussed in #10036

Originally posted by ic-synth November 27, 2024

Describe the bug

The CogVideoX VAE decoder in diffusers 0.31.0 consumes significantly more memory than in 0.30.3, to the point where the model goes OOM even on 80 GB H100 GPUs with a relatively modest frame count.
I include two profiles for very small input tensors of only 5 frames, where it is visible how much larger the VAE memory consumption is.

Memory footprints for different input sizes are shown below. As you can see, with latest version memory keeps growing with frame count.
[Screenshots: memory profiles for diffusers 0.30.3 and 0.31.0]
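For scale, the decoded video itself cannot explain the growth: a back-of-envelope calculation (a sketch, assuming CogVideoX's default 4x temporal and 8x spatial upsampling, and the `(n - 1) * 4 + 1` latent-to-video frame mapping) puts the 5-frame output at roughly 0.1 GiB in bf16, so the tens of gigabytes at the peak must come from intermediate activations held during decoding:

```python
# Rough size of the decoded output for the (1, 16, 5, 96, 170) latent in the
# report. The upsampling factors and frame mapping are assumptions based on
# the CogVideoX VAE defaults, not measured values.
def decoded_video_bytes(latent_frames, lat_h, lat_w, dtype_bytes=2,
                        temporal_up=4, spatial_up=8, channels=3):
    # First latent frame maps to a single video frame; the rest expand 4x.
    frames = (latent_frames - 1) * temporal_up + 1
    return channels * frames * (lat_h * spatial_up) * (lat_w * spatial_up) * dtype_bytes

gib = decoded_video_bytes(5, 96, 170) / 1024**3
print(f"decoded output: ~{gib:.2f} GiB")  # ~0.10 GiB
```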

Reproduction

Run the CogVideoXDecoder3D model with diffusers 0.30.3 and with 0.31.0 on inputs of the same shape, and measure memory consumption as the frame count increases.

# Code requires a GPU with 50+ GiB of VRAM
import torch
import diffusers
from diffusers import AutoencoderKLCogVideoX

with torch.no_grad():
    vae = AutoencoderKLCogVideoX().to(dtype=torch.bfloat16).eval()
    vae.decoder = vae.decoder.to(device="cuda:0")
    input_tensor = torch.randn(1, 16, 5, 96, 170).to(device="cuda:0", dtype=torch.bfloat16)
    print("Decoding ... Input size:", input_tensor.shape, "diffusers version:", diffusers.__version__)
    vae.decode(input_tensor)
    # Peak allocated / reserved memory, in GiB
    print(torch.cuda.max_memory_allocated() / (1024 ** 3), torch.cuda.max_memory_reserved() / (1024 ** 3))
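To reproduce the "memory keeps growing with frame count" plot, the snippet above can be turned into a sweep. This is a sketch: the frame counts are hypothetical, and `torch.cuda.reset_peak_memory_stats()` is called between runs so each measurement reflects only that decode:

```python
import torch

def peak_decode_memory_gib(vae, latent_frames, height=96, width=170):
    """Decode a random latent with `latent_frames` frames; return peak allocated GiB."""
    torch.cuda.reset_peak_memory_stats()
    latents = torch.randn(1, 16, latent_frames, height, width,
                          device="cuda:0", dtype=torch.bfloat16)
    with torch.no_grad():
        vae.decode(latents)
    return torch.cuda.max_memory_allocated() / 1024**3

if torch.cuda.is_available():
    import diffusers
    from diffusers import AutoencoderKLCogVideoX

    vae = AutoencoderKLCogVideoX().to(dtype=torch.bfloat16).eval()
    vae.decoder = vae.decoder.to(device="cuda:0")
    for frames in (1, 3, 5, 7):  # hypothetical sweep values
        gib = peak_decode_memory_gib(vae, frames)
        print(f"diffusers {diffusers.__version__}: {frames} latent frames -> {gib:.1f} GiB peak")
```

Running this under both versions in separate environments should show a roughly flat curve on 0.30.3 and a steeply growing one on 0.31.0. As a possible stopgap while the regression is investigated, `AutoencoderKLCogVideoX` exposes `enable_tiling()` and `enable_slicing()`, which may cap the peak at the cost of speed.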

Logs

No response

System Info

Python 3.11.
Diffusers 0.30.3 vs 0.31.0

Who can help?

@sayakpaul @DN6 @yiyixuxu

Metadata

Labels: bug (Something isn't working), roadmap (Add to current release roadmap), wip
