Interpolate fix on cuda for large output tensors #10067
Conversation
Thanks, Pedro. The fix indeed looks like 🧠
I think this should be tested as well. Could we add a test here?
scale_factor = (
    2 if output_size is None else max([f / s for f, s in zip(output_size, hidden_states.shape[-2:])])
)
if hidden_states.numel() * scale_factor > pow(2, 31):
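For context, here is a simplified sketch of how that guard sits inside `Upsample2D.forward` after this PR (the real method also handles the optional convolution and dtype casting; the helper name below is only for illustration):

```python
import torch
import torch.nn.functional as F

def upsample_nearest_2x(hidden_states: torch.Tensor, output_size=None) -> torch.Tensor:
    # Estimate how much interpolation will grow the spatial dimensions.
    scale_factor = (
        2 if output_size is None else max([f / s for f, s in zip(output_size, hidden_states.shape[-2:])])
    )
    # The CUDA NHWC nearest-neighbor kernel can silently produce wrong values for very
    # large outputs, so fall back to a contiguous (NCHW) tensor beyond this threshold.
    if hidden_states.numel() * scale_factor > pow(2, 31):
        hidden_states = hidden_states.contiguous()

    if output_size is None:
        return F.interpolate(hidden_states, scale_factor=2.0, mode="nearest")
    return F.interpolate(hidden_states, size=output_size, mode="nearest")
```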
Consider keeping `pow(2, 31)` in a variable and then using it.
I feel that `pow(2, 31)` is easy to understand and provides explicit documentation about what's happening. Using another level of indirection wouldn't help, unless we are concerned about performance, in which case I can use a constant.
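For reference, the suggested named-constant variant would look roughly like this (the constant name is made up for illustration; the merged code keeps the inline `pow(2, 31)`):

```python
# Illustrative name only: the threshold above which the CUDA NHWC nearest kernel
# is avoided by falling back to a contiguous tensor.
MAX_NHWC_UPSAMPLE_OUTPUT_NUMEL = 2**31

if hidden_states.numel() * scale_factor > MAX_NHWC_UPSAMPLE_OUTPUT_NUMEL:
    hidden_states = hidden_states.contiguous()
```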
Along the same lines as #10067 (comment).
@@ -165,6 +165,14 @@ def forward(self, hidden_states: torch.Tensor, output_size: Optional[int] = None
    # if `output_size` is passed we force the interpolation output
    # size and do not make use of `scale_factor=2`
    if self.interpolate:
        # upsample_nearest_nhwc also fails when the number of output elements is large
Do we need to check for `channels_first` or `channels_last` memory layout?
My assumption is that it will behave the same way for any non-contiguous layout, and that calling `.contiguous()` on a contiguous tensor is a no-op. I'm also mimicking the same structure as in lines 161-163 above.
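A quick illustration of that assumption (shapes are arbitrary, not from the PR):

```python
import torch

x = torch.randn(1, 4, 8, 8)                    # default NCHW layout, already contiguous
assert x.contiguous() is x                     # no-op: the same tensor object comes back

y = x.to(memory_format=torch.channels_last)    # NHWC strides
z = y.contiguous()                             # copies back to the standard contiguous layout
assert z.is_contiguous()
```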
Sure, I can add some tests. They will require a GPU and a good amount of GPU RAM, but I think they should fit in the CI environment.
Do you have an estimate of how much? Shouldn't 16 GB be enough?
I think so, yes, I'll check.
Thanks for the fix!
Is this the same bug as #984 (which has been there for 2 years)?
Also, I'm cool with the code as it is!
* Workaround for upscale with large output tensors. Fixes huggingface#10040.
* Fix scale when output_size is given
* Style
---------
Co-authored-by: Sayak Paul <[email protected]>
Thanks a lot for merging, and my apologies for being slow to react!
Not the same, but related. That issue was fixed a while ago; we could check the minimum PyTorch version and only apply this workaround if necessary.
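If we did gate it on the PyTorch version, a sketch could look like this (the cutoff version is a placeholder, not something established in this thread):

```python
import torch
from packaging import version

# Placeholder cutoff: apply the .contiguous() fallback only on PyTorch versions that are
# assumed to still carry the NHWC upsampling bug. The real minimum version would need to
# be confirmed against the upstream fix.
_TORCH_VERSION = version.parse(torch.__version__.split("+")[0])
NEEDS_UPSAMPLE_WORKAROUND = _TORCH_VERSION < version.parse("2.6.0")

if NEEDS_UPSAMPLE_WORKAROUND and hidden_states.numel() * scale_factor > pow(2, 31):
    hidden_states = hidden_states.contiguous()
```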
Fixes #10040.
As described in pytorch/pytorch#141831, there is a silent bug in PyTorch (CUDA) that causes the result of the upsampling operation to be wrong when the output size is beyond a threshold.
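For anyone who wants to see the failure mode, a rough reproduction sketch along the lines of the upstream report (the shape is only meant to push the output well past 2**31 elements and is not taken from the issue; it needs a CUDA GPU with roughly 14 GB free):

```python
import torch
import torch.nn.functional as F

# Channels-last (NHWC) input whose 2x nearest-neighbor output exceeds 2**31 elements.
x = torch.randn(1, 288, 2048, 2048, device="cuda", dtype=torch.float16)
x = x.to(memory_format=torch.channels_last)

out = F.interpolate(x, scale_factor=2.0, mode="nearest")

# Exact 2x nearest upsampling copies each input pixel to the even output indices,
# so this should print True; on affected PyTorch builds it silently does not.
print(torch.equal(out[..., ::2, ::2], x))
```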
cc @sayakpaul @asomoza as they are involved in #10040. In addition to that issue, I think the silent failure mode could come up in other cases as well and would be difficult to track.