Describe the bug
In the SD3 attention implementation, attention masks are currently not used. As a result, when the text tokens contain padding, the attention scores of the padding tokens are non-zero, which produces inconsistent outputs for different values of max_seq_length. This problem was discussed in #8628; this issue is created to track progress on fixing it.
Thanks @sayakpaul for the discussion.
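To make the effect concrete, here is a minimal NumPy sketch (not the diffusers code; all names are illustrative) showing why unmasked padding changes the attention output: with no mask, padding keys receive non-zero softmax weight, so lengthening the padded sequence changes the result, while masking the padding positions restores the output of the unpadded sequence.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, mask=None):
    # Scaled dot-product attention; `mask` marks valid key positions.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        # Large negative score -> ~zero weight for padding positions.
        scores = np.where(mask[None, :], scores, -1e9)
    return softmax(scores) @ v

rng = np.random.default_rng(0)
d = 4
tokens = rng.normal(size=(3, d))         # 3 real text tokens
pad = np.zeros((2, d))                   # 2 padding tokens (larger max_seq_length)
q = rng.normal(size=(1, d))

short = tokens                           # no padding
long_ = np.concatenate([tokens, pad])    # same tokens plus padding
mask = np.array([True, True, True, False, False])

out_short = attention(q, short, short)
out_unmasked = attention(q, long_, long_)      # padding attended to
out_masked = attention(q, long_, long_, mask)  # padding masked out

print(np.allclose(out_short, out_unmasked))  # False: padding shifts the output
print(np.allclose(out_short, out_masked))    # True: mask restores consistency
```

Even though the padding embeddings here are all zeros, their dot-product scores are exactly 0, which still gets non-zero softmax weight; only masking makes the output independent of the padded length.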
Reproduction
n/a
Logs
No response
System Info
n/a
Who can help?
No response