Transform: support sdpa to flash attention kernel conversion #131
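For context, here is a rough sketch of the rewrite this transform aims at, assuming the input graph carries a plain SDPA pattern (softmax(Q·Kᵀ/√d)·V) and the target is a flash-attention-style tiled kernel with an online softmax. The NumPy shapes, tile size, and function names below are purely illustrative and are not the actual pass implementation:

```python
# Minimal sketch: naive SDPA vs. a flash-attention-style tiled loop.
# Assumed shapes and block size are illustrative only.
import numpy as np

def sdpa_reference(q, k, v):
    # Naive SDPA: materializes the full (seq_q, seq_k) score matrix.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    p = np.exp(scores - scores.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

def sdpa_flash_style(q, k, v, block_k=64):
    # Flash-attention-style tiling: iterate over K/V blocks and keep a
    # running (online) softmax so the full score matrix is never stored.
    seq_q, d = q.shape
    seq_k = k.shape[0]
    out = np.zeros((seq_q, d))
    row_max = np.full((seq_q, 1), -np.inf)
    row_sum = np.zeros((seq_q, 1))
    for start in range(0, seq_k, block_k):
        kb = k[start:start + block_k]
        vb = v[start:start + block_k]
        s = q @ kb.T / np.sqrt(d)                  # scores for this block
        new_max = np.maximum(row_max, s.max(axis=-1, keepdims=True))
        p = np.exp(s - new_max)                    # block softmax numerator
        scale = np.exp(row_max - new_max)          # rescale previous partials
        row_sum = row_sum * scale + p.sum(axis=-1, keepdims=True)
        out = out * scale + p @ vb
        row_max = new_max
    return out / row_sum

q, k, v = (np.random.rand(128, 64) for _ in range(3))
assert np.allclose(sdpa_reference(q, k, v), sdpa_flash_style(q, k, v), atol=1e-6)
```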
Conversation
Force-pushed from 206fead to 65dfab8.
Force-pushed from 0049c9c to c3fcf97.
Force-pushed from 79b277d to d69856f.
Force-pushed from 738986c to 97f85b4.
Force-pushed from d69856f to 823be69.
Force-pushed from 823be69 to 02f519b.
Force-pushed from 8716a96 to 0457df5.
Force-pushed from 9ed0b84 to e13ec10.
As for performance evaluation, there are two issues:
The next steps for performance alignment are:
Please try brgemm instead of matmul, which can provide better performance results.
I dumped the final LLVM IR and verified that the current performance numbers are collected with brgemm invoked. Previously, when brgemm was not in effect, performance was 10x worse. I need to do a more detailed analysis to find exactly where the performance gap lies.
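For readers unfamiliar with brgemm: as I understand it, batch-reduce GEMM computes the same contraction as a sequence of small matmuls, but as a single kernel that accumulates every Aᵢ·Bᵢ product into one resident C tile, which is the pattern oneDNN/libxsmm microkernels are tuned for. The NumPy sketch below only illustrates the contraction semantics, not the performance effect; shapes and names are made up for illustration:

```python
# Sketch of the batch-reduce GEMM (brgemm) accumulation pattern vs. separate
# matmuls. Both compute the same result; the difference in practice is that a
# brgemm kernel keeps the C tile resident while walking the batch dimension.
import numpy as np

def separate_matmuls(a_blocks, b_blocks):
    # One GEMM per block; partial results summed afterwards.
    return sum(a @ b for a, b in zip(a_blocks, b_blocks))

def batch_reduce_gemm(a_blocks, b_blocks):
    # Single accumulation loop over the batch into one C tile.
    m, n = a_blocks[0].shape[0], b_blocks[0].shape[1]
    c = np.zeros((m, n))
    for a, b in zip(a_blocks, b_blocks):
        c += a @ b
    return c

a_blocks = [np.random.rand(32, 64) for _ in range(8)]
b_blocks = [np.random.rand(64, 32) for _ in range(8)]
assert np.allclose(separate_matmuls(a_blocks, b_blocks),
                   batch_reduce_gemm(a_blocks, b_blocks))
```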
Force-pushed from f959a73 to ed5180d.
Force-pushed from 974b8ca to fd013ca.
Force-pushed from b3bf8dc to 23dfa97.
Force-pushed from 52164c4 to f741fbd.
Latest performance:
The currently observed gaps from v1 are the following:
Tracking issue #147.
TODO: