speedup hunyuan encoder causal mask generation #10764


Merged
merged 2 commits into from
Feb 11, 2025

Conversation

dabeschte
Contributor

@dabeschte dabeschte commented Feb 11, 2025

What does this PR do?

The original causal attention mask generation for the Hunyuan encoder is very slow, especially when the tensor is created on the GPU, because it issues thousands of small calls (a Python for loop over the sequence length).

I tried to compile it, which works and makes it fast too, but compilation unfortunately also takes a long time when using a long sequence length.

This implementation is ~20-70x faster depending on the sequence length, and since the mask is re-created for every SDPA call, that overhead accumulates to multiple seconds per step for larger videos.

This PR duplicates the PR to the original HunyuanVideo repository: Tencent-Hunyuan/HunyuanVideo#208
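The general idea of the change can be sketched as follows (a minimal illustration of the loop-vs-broadcast technique, not the exact PR code; the function names here are hypothetical):

```python
import torch

def causal_mask_loop(seq_len: int, device: str = "cpu") -> torch.Tensor:
    # Loop-style construction: one small indexed write per row.
    # On a GPU device each iteration is a separate kernel launch,
    # which is what makes this slow for long sequences.
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool, device=device)
    for i in range(seq_len):
        mask[i, : i + 1] = True
    return mask

def causal_mask_vectorized(seq_len: int, device: str = "cpu") -> torch.Tensor:
    # Single broadcasted comparison: position i may attend to j iff i >= j.
    idx = torch.arange(seq_len, device=device)
    return idx.unsqueeze(1) >= idx.unsqueeze(0)
```

Both functions produce the same lower-triangular boolean mask; the vectorized version does it in one operation regardless of sequence length.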

Before submitting

Who can review?

not sure tbh

Member

@a-r-r-o-w a-r-r-o-w left a comment


Hey, thank you so much! It was on my mind to address this after the integration as it looked weird, but I forgot about it :/ This is super cool

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@a-r-r-o-w
Member

@dabeschte Could you look into the failing test? We can merge after that :)

@dabeschte
Contributor Author

Thanks for the quick review.
Yeah, it really does look weird to loop over the sequence length - and even more so when you profile the model :D

I had a small bug in how I used the testing framework (looking at you, Cursor ;)).
I didn't want to install all the dependencies to get the tests to run - but this time I tested it in isolation to ensure at least this test works correctly.

@a-r-r-o-w a-r-r-o-w merged commit 8ae8008 into huggingface:main Feb 11, 2025
12 checks passed