speedup hunyuan encoder causal mask generation #10764


Merged
merged 2 commits into from
Feb 11, 2025

Conversation

dabeschte
Contributor

@dabeschte dabeschte commented Feb 11, 2025

What does this PR do?

The original causal attention mask generation for the Hunyuan encoder is very slow, especially when the tensor is created on the GPU, because it issues thousands of small calls (a Python for loop over the sequence length).

I tried to compile it, which works and makes it fast too, but compilation unfortunately also takes a long time when using a long sequence length.

This implementation is ~20-70x faster depending on the sequence length, and since the mask is re-created for every SDPA call, that overhead accumulates to multiple seconds per step for larger videos.

This PR duplicates the PR to the original HunyuanVideo repository: Tencent-Hunyuan/HunyuanVideo#208
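The general idea of the change can be sketched as follows (a minimal illustration of the loop-vs-broadcast technique, not the exact PR code; the function names here are hypothetical):

```python
import torch

def causal_mask_loop(seq_len: int, device: str = "cpu") -> torch.Tensor:
    # Loop-style construction: one small indexed write per row.
    # On a GPU device each iteration is a separate kernel launch,
    # which is what makes this slow for long sequences.
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool, device=device)
    for i in range(seq_len):
        mask[i, : i + 1] = True
    return mask

def causal_mask_vectorized(seq_len: int, device: str = "cpu") -> torch.Tensor:
    # Single broadcasted comparison: position i may attend to j iff i >= j.
    idx = torch.arange(seq_len, device=device)
    return idx.unsqueeze(1) >= idx.unsqueeze(0)
```

Both functions produce the same lower-triangular boolean mask; the vectorized version does it in one operation regardless of sequence length.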

Before submitting

Who can review?

not sure tbh

Member

@a-r-r-o-w a-r-r-o-w left a comment


Hey, thank you so much! It was on my mind to address this after the integration as it looked weird, but I forgot about it :/ This is super cool

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@a-r-r-o-w
Member

@dabeschte Could you look into the failing test? We can merge after that :)

@dabeschte
Contributor Author

Thanks for the quick review.
Yeah, it really does look weird to loop over the sequence length - and even more so when you profile the model :D

I had a small bug in how I used the testing framework (looking at you, Cursor ;)).
I didn't want to install all the dependencies to get the tests to run - but this time I tested it in isolation to ensure at least this test works correctly.

@a-r-r-o-w a-r-r-o-w merged commit 8ae8008 into huggingface:main Feb 11, 2025
12 checks passed