[Text-to-video] Add `torch.compile()` compatibility #3949

sayakpaul · 2023-07-05T04:22:14Z

What does this PR do?

Description

torch.compile() support for the repeat_interleave() function was added in a PyTorch nightly build. See: pytorch/pytorch#99929.

So, once I upgraded to the Torch 2.1 nightly, the issue went away. However, there were other issues, which this PR fixes. The PR takes inspiration from #3313.

Even though we're able to successfully compile the model, inference takes a hefty amount of time once torch.compile() has been applied to the UNet:

import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video


# Load the text-to-video pipeline in half precision and move it to the GPU.
pipe = DiffusionPipeline.from_pretrained("cerspense/zeroscope_v2_576w", torch_dtype=torch.float16)
pipe.to("cuda")
pipe.enable_vae_slicing()

# Compile only the UNet; fullgraph=True raises an error on any graph break.
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

prompt = "Darth Vader is surfing on waves"
video_frames = pipe(prompt, num_inference_steps=40, height=320, width=576, num_frames=36).frames
video_path = export_to_video(video_frames, output_video_path="video_576_darth_vader_36.mp4")

The first call to pipe is really time-consuming, which is understandable because that is when the compiled UNet is used for the first time. But even in subsequent calls, the timing doesn't seem to improve much. In my experiments, I actually found the runtime to be much better without torch.compile().
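To make the warm-up versus steady-state comparison concrete, here is a minimal timing sketch. The helper name `time_calls` and its `sync` parameter are hypothetical and not part of diffusers or PyTorch; the helper only separates the first call (which would include compilation) from later calls.

```python
import time
from typing import Callable, List, Tuple


def time_calls(
    fn: Callable[[], object],
    num_calls: int = 3,
    sync: Callable[[], None] = lambda: None,
) -> Tuple[float, List[float]]:
    """Time `fn` repeatedly; return (first_call_seconds, later_call_seconds).

    `sync` is called before and after each run so that asynchronous work
    (for example queued GPU kernels) is included in the measurement.
    """
    if num_calls < 1:
        raise ValueError("num_calls must be at least 1")
    timings = []
    for _ in range(num_calls):
        sync()
        start = time.perf_counter()
        fn()
        sync()
        timings.append(time.perf_counter() - start)
    # The first entry includes any one-off compilation cost.
    return timings[0], timings[1:]
```

With the pipeline above, this could be called as `time_calls(lambda: pipe(prompt, num_inference_steps=40, height=320, width=576, num_frames=36), sync=torch.cuda.synchronize)`, once with the compiled UNet and once without, to compare the later-call timings.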

Let me know if anything is unclear.

I am leaving the outputs of the progress bars:

With torch.compile()

Without torch.compile()

My explorations can be found in this Colab Notebook.

HuggingFaceDocBuilderDev · 2023-07-05T04:28:35Z

The documentation is not available anymore as the PR was closed or merged.

patrickvonplaten · 2023-07-05T09:43:23Z

Great job! Looks clean - also ok for me to not add a test at the moment, given the problems we have with memory leaks. Good to merge for me

sayakpaul · 2023-07-05T09:47:02Z

Do we have a handle on why these leaks happen? Why does torch.compile() perform so poorly when we have 3D inputs like videos?

sayakpaul added 5 commits July 5, 2023 08:55

use sample directly instead of the dataclass.

25cfcb1

more usage of directly samples instead of dataclasses

2cd1460

more usage of directly samples instead of dataclasses

a63443b

use direct sample in the pipeline.

5958b7f

direct usage of sample in the img2img case.

dd5d0e4
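The commit messages above describe reading the sample directly rather than going through the output dataclass. A toy sketch of that pattern follows, assuming the `return_dict=False` convention used by diffusers models; `ToyOutput` and `toy_unet` are hypothetical stand-ins, not the real UNet, and the actual diff may differ in detail.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass
class ToyOutput:
    # Hypothetical stand-in for a model output dataclass with a `sample` field.
    sample: float


def toy_unet(x: float, return_dict: bool = True) -> Union[ToyOutput, Tuple[float]]:
    """Hypothetical model call that mirrors the `return_dict` convention."""
    result = x * 0.5
    if not return_dict:
        # Plain tuple: the caller indexes the sample directly.
        return (result,)
    return ToyOutput(sample=result)


# Dataclass-based access.
via_dataclass = toy_unet(4.0).sample

# Direct access to the sample, skipping the dataclass.
via_tuple = toy_unet(4.0, return_dict=False)[0]
```

Both forms yield the same value; the tuple form simply avoids constructing the dataclass wrapper inside the region being compiled.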

sayakpaul requested a review from patrickvonplaten July 5, 2023 04:22

sayakpaul marked this pull request as ready for review July 5, 2023 04:22

sayakpaul mentioned this pull request Jul 5, 2023

torch.compile doesn't seem to be working for text-to-video pipelines #3915

Closed

patrickvonplaten approved these changes Jul 5, 2023

sayakpaul merged commit b62d9a1 into main Jul 6, 2023

sayakpaul deleted the fix/unet3d-compile branch July 6, 2023 09:00
