
Interpolate fix on cuda for large output tensors #10067


Merged: 4 commits into main on Dec 2, 2024

Conversation

@pcuenca (Member) commented on Dec 1, 2024:

Fixes #10040.

As described in pytorch/pytorch#141831, there is a silent bug in PyTorch (CUDA) that causes the result of the upsampling operation to be wrong when the output size is beyond a threshold.


cc @sayakpaul @asomoza as they are involved in #10040. In addition to that issue, I think the silent failure mode could come up in other cases as well and would be difficult to track.
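
To make the failure mode concrete, here is a minimal repro sketch (my own illustration based on the upstream report, not code from this PR; the shape and memory figures are assumptions):

```python
# Hypothetical repro for the silent bug (see pytorch/pytorch#141831):
# on CUDA, nearest-neighbor upsampling of a channels_last tensor can
# return wrong values once the output exceeds the 2**31-element range.
import torch
import torch.nn.functional as F

# 1 x 128 x 2048 x 2048 = 2**29 input elements; 2x upsampling in H and W
# yields a 2**31-element output, just past int32 indexing. Needs roughly
# 10 GB of GPU memory in float16.
x = torch.randn(1, 128, 2048, 2048, device="cuda", dtype=torch.float16)
x_cl = x.to(memory_format=torch.channels_last)

out = F.interpolate(x_cl, scale_factor=2.0, mode="nearest")
ref = F.interpolate(x.contiguous(), scale_factor=2.0, mode="nearest")

# Reportedly False on affected PyTorch/CUDA builds, True with the workaround.
print(torch.equal(out, ref))
```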

@pcuenca changed the title from "Interpolate fix" to "Interpolate fix on cuda for large output tensors" on Dec 1, 2024.

@sayakpaul (Member) left a comment:

Thanks, Pedro. The fix indeed looks like 🧠

I think this should be tested as well. Could we add a test here?

Review thread on this snippet from the diff:

```python
scale_factor = (
    2 if output_size is None else max([f / s for f, s in zip(output_size, hidden_states.shape[-2:])])
)
if hidden_states.numel() * scale_factor > pow(2, 31):
```
@sayakpaul (Member) commented:

Consider keeping pow(2, 31) in a variable and then using it.

@pcuenca (Member, Author) replied:
I feel that pow(2, 31) is easy to understand and provides explicit documentation about what's happening. Using another level of indirection wouldn't help, unless we are concerned about performance, in which case I can use a constant.
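
For illustration, the suggested alternative might look like this, continuing the snippet above (the constant name is hypothetical, not from the PR):

```python
# Hypothetical module-level constant; pow(2, 31) is the CUDA int32 indexing
# threshold discussed in pytorch/pytorch#141831.
UPSAMPLE_CUDA_MAX_ELEMENTS = 2**31

if hidden_states.numel() * scale_factor > UPSAMPLE_CUDA_MAX_ELEMENTS:
    hidden_states = hidden_states.contiguous()
```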

@sayakpaul (Member) replied:
Along the same lines as #10067 (comment).

@sayakpaul requested a review from @yiyixuxu on Dec 1, 2024.
Review thread on this hunk:

```
@@ -165,6 +165,14 @@ def forward(self, hidden_states: torch.Tensor, output_size: Optional[int] = None
        # if `output_size` is passed we force the interpolation output
        # size and do not make use of `scale_factor=2`
        if self.interpolate:
            # upsample_nearest_nhwc also fails when the number of output elements is large
```
@sayakpaul (Member) commented:
Do we need to check for channels_first or channels_last memory layout?

@pcuenca (Member, Author) replied:
My assumption is it will behave the same way for any non-contiguous layouts, and that calling .contiguous() on a contiguous tensor will be a no-op. I'm also mimicking the same structure as in lines 161-163 above.
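
Putting the thread together, the workaround's overall shape is roughly the following (my reconstruction as a standalone function, not the exact merged diff):

```python
# Sketch of the fix: fall back to a contiguous layout before nearest-neighbor
# upsampling when the projected output would exceed 2**31 elements.
# `hidden_states` is an NCHW tensor and `output_size` an optional target
# spatial size, as in Upsample2D.forward.
import torch
import torch.nn.functional as F

def upsample_nearest_safe(hidden_states: torch.Tensor, output_size=None) -> torch.Tensor:
    scale_factor = (
        2 if output_size is None else max([f / s for f, s in zip(output_size, hidden_states.shape[-2:])])
    )
    # Work around the CUDA channels_last bug; contiguous() is a no-op if the
    # tensor is already channels_first, as noted above.
    if hidden_states.numel() * scale_factor > pow(2, 31):
        hidden_states = hidden_states.contiguous()

    if output_size is None:
        return F.interpolate(hidden_states, scale_factor=2.0, mode="nearest")
    return F.interpolate(hidden_states, size=output_size, mode="nearest")
```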

@pcuenca (Member, Author) commented on Dec 2, 2024:

> I think this should be tested as well. Could we add a test here?

Sure, I can add some tests. They will require GPU and a good amount of GPU RAM, but I think they should fit in the CI environment.
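
A sketch of what such a test might look like (my illustration, not the test that was eventually added; `Upsample2D` and `require_torch_gpu` are existing diffusers names):

```python
import torch

from diffusers.models.upsampling import Upsample2D
from diffusers.utils.testing_utils import require_torch_gpu


@require_torch_gpu
def test_upsample2d_large_output_matches_contiguous():
    torch.manual_seed(0)
    upsample = Upsample2D(channels=128, use_conv=False).to("cuda")
    x = torch.randn(1, 128, 2048, 2048, device="cuda", dtype=torch.float16)

    # The channels_last input exercises the buggy CUDA path; the contiguous
    # input is the reference. With the fix, both should match exactly.
    out = upsample(x.to(memory_format=torch.channels_last))
    ref = upsample(x.contiguous())
    assert torch.equal(out, ref)
```

At this shape the test would need roughly 10 GB of GPU memory in float16, which lines up with the 16 GB estimate discussed next.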

@sayakpaul (Member) replied:

> Sure, I can add some tests. They will require GPU and a good amount of GPU RAM, but I think they should fit in the CI environment.

Do you have an estimate of how much? Shouldn't 16GB be enough?

@pcuenca (Member, Author) commented on Dec 2, 2024:

> > Sure, I can add some tests. They will require GPU and a good amount of GPU RAM, but I think they should fit in the CI environment.
>
> Do you have an estimate of how much? Shouldn't 16GB be enough?

I think so, yes, I'll check.
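
For a back-of-envelope answer to the 16 GB question (my arithmetic, not from the thread):

```python
# Memory estimate for a float16 test just past the 2**31-element threshold.
out_elems = 2**31                 # output elements at the threshold
out_gib = out_elems * 2 / 2**30   # float16 output: 4.0 GiB
in_gib = out_gib / 4              # 2x upsampling in H and W: 1.0 GiB input
total_gib = in_gib + 2 * out_gib  # input + buggy output + reference output
print(total_gib)                  # 9.0 GiB, comfortably under 16 GB
```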

@yiyixuxu (Collaborator) left a comment:

Thanks for the fix! Is this the same bug as #984 (which has been there for 2 years)?

@yiyixuxu (Collaborator) commented on Dec 2, 2024:

Also, I'm cool with the code as it is! For the test, @sayakpaul, do you want to help add one?

@yiyixuxu merged commit 2312b27 into main on Dec 2, 2024; 18 checks passed. @yiyixuxu deleted the interpolate-fix branch on Dec 2, 2024.
lawrence-cj pushed a commit to lawrence-cj/diffusers referencing this pull request on Dec 4, 2024:

* Workaround for upscale with large output tensors. Fixes huggingface#10040.
* Fix scale when output_size is given
* Style

Co-authored-by: Sayak Paul <[email protected]>
@pcuenca (Member, Author) commented on Dec 5, 2024:

Thanks a lot for merging, and my apologies for being slow to react!

> Is this the same bug as #984 (which has been there for 2 years)?

Not the same, but related. That issue was fixed a while ago; we could check the minimum PyTorch version and only apply this similar workaround when necessary.
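
That version check might look roughly like this (a hypothetical sketch; the thread does not identify the PyTorch release that fixes pytorch/pytorch#141831, so the threshold below is a placeholder):

```python
import torch
from packaging import version

# Placeholder: set to the first PyTorch release that ships the upstream fix
# once it is known. "999.0" keeps the workaround enabled on all current
# versions.
_FIXED_IN = version.parse("999.0")

def needs_interpolate_workaround() -> bool:
    return version.parse(torch.__version__) < _FIXED_IN
```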

sayakpaul added a commit referencing this pull request on Dec 23, 2024, with the same commit message as the one above.
Successfully merging this pull request may close these issues:

* [BUG - STABLE DIFFUSION 3] Grey images generated