[NVPTX] Improve folding to mad with immediate 1 #93628
mul(m, n+1) -> mad(m,n,m) makes sense.
mul(m, select(1, n)) -> select(mul(m,n), m), not so much. Perhaps I'm missing something.
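The first fold rests on the identity m * (n + 1) == m * n + m, which also holds under 32-bit wraparound. A minimal standalone sketch of that identity follows; the `mad` function here is a hypothetical stand-in for a multiply-add instruction, not the NVPTX lowering code from this patch.

```cpp
#include <cstdint>

// Hypothetical stand-in for a 32-bit multiply-add: a * b + c with wraparound.
// This is an illustration only, not code from the patch under review.
constexpr uint32_t mad(uint32_t a, uint32_t b, uint32_t c) { return a * b + c; }

// Original form: mul(m, add(n, 1)).
constexpr uint32_t mulOfAddOne(uint32_t m, uint32_t n) { return m * (n + 1u); }

// Folded form: mad(m, n, m).
constexpr uint32_t foldedMad(uint32_t m, uint32_t n) { return mad(m, n, m); }

// The two forms agree, including when n + 1 wraps to 0.
static_assert(mulOfAddOne(7u, 4u) == foldedMad(7u, 4u), "forms must agree");
static_assert(mulOfAddOne(0xFFFFFFFFu, 0xFFFFFFFFu) ==
                  foldedMad(0xFFFFFFFFu, 0xFFFFFFFFu),
              "forms must agree under wraparound");
```

The folded form replaces an add followed by a mul with a single multiply-add, which is the saving the patch is after.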
@@ -5614,17 +5614,98 @@ static SDValue TryMULWIDECombine(SDNode *N,
  return DCI.DAG.getNode(Opc, DL, MulType, TruncLHS, TruncRHS);
}

static SDValue matchMADConstOnePattern(SDValue X, SDValue Add) {
X is unused.
Fixed
    return SDValue();

  SDValue Y = Add->getOperand(0);
  ConstantSDNode *Const = dyn_cast<ConstantSDNode>(Add->getOperand(1));
Are we guaranteed that the constant operand comes last? I think we normalize them, but I'm not 100% sure that's always the case.
Good point, I've added the other case as well just in case.
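Checking both operand positions can be sketched on a toy expression tree. Everything below (`Node`, `isConstOne`, `matchAddOfOne`) is a hypothetical model written for illustration; it does not use LLVM's SDValue or SDNode types and is not the code added in the patch.

```cpp
#include <cstdint>

// Toy expression node. This is NOT LLVM's SDNode; it only models the shape
// of the match being discussed.
struct Node {
  enum Kind { Constant, Value, Add };
  Kind kind;
  uint64_t constant;  // meaningful only when kind == Constant
  const Node *lhs;    // meaningful only when kind == Add
  const Node *rhs;    // meaningful only when kind == Add
};

// True if n is the constant 1.
inline bool isConstOne(const Node *n) {
  return n && n->kind == Node::Constant && n->constant == 1;
}

// If add is add(y, 1) or add(1, y), return y; otherwise return nullptr.
inline const Node *matchAddOfOne(const Node *add) {
  if (!add || add->kind != Node::Add)
    return nullptr;
  if (isConstOne(add->rhs))  // usual canonical form: constant on the right
    return add->lhs;
  if (isConstOne(add->lhs))  // non-canonical form: constant on the left
    return add->rhs;
  return nullptr;
}
```

Handling the left-hand case costs one extra comparison and removes the dependence on constants having been canonicalized to the right before the combine runs.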
  SDValue Y = Add->getOperand(0);
  ConstantSDNode *Const = dyn_cast<ConstantSDNode>(Add->getOperand(1));
  if (!Const || Const->getZExtValue() != 1)
Nit. Phrasing the condition in positive terms would be more readable, IMO.
if (Const && Const->getZExtValue() == 1) return Y;
Fixed.
@@ -0,0 +1,101 @@
; RUN: llc < %s -march=nvptx -mcpu=sm_20 -O1 | FileCheck %s
Another test which could use autogenerated CHECK patterns.
Done.
  ret i32 %mul
}

; Transpose (mul (select)) if it can then be folded to mad
Does it buy us anything? mul(m,select(1,n)) will probably have the same performance as select(mul(m,n), m), as the critical path will always have mul and select, just in a different order.
By itself this transform doesn't help much, I agree. However, if m or n is add(x,1), then it enables the other transformation. In the code we check for this case and only run the transformation when it would enable further folding. A rare case to be sure, but better to support it than not.
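To make the enabling case concrete, here is a small standalone sketch in which the non-constant arm of the select is add(x, 1). The explicit `cond` parameter and the `mad` helper are assumptions made for illustration (the thread's shorthand select(1, n) omits the condition); this is not the DAG combine from the patch.

```cpp
#include <cstdint>

// Hypothetical stand-in for a 32-bit multiply-add: a * b + c with wraparound.
constexpr uint32_t mad(uint32_t a, uint32_t b, uint32_t c) { return a * b + c; }

// Before: mul(m, select(cond, 1, add(x, 1))).
// The add feeds the select, so the mul cannot be fused with it.
constexpr uint32_t mulOfSelect(bool cond, uint32_t m, uint32_t x) {
  return m * (cond ? 1u : x + 1u);
}

// After transposing the mul through the select, the false arm becomes
// mul(m, add(x, 1)), which folds to mad(m, x, m); the true arm is just m.
constexpr uint32_t selectOfMad(bool cond, uint32_t m, uint32_t x) {
  return cond ? m : mad(m, x, m);
}

static_assert(mulOfSelect(true, 6u, 9u) == selectOfMad(true, 6u, 9u),
              "true arm must agree");
static_assert(mulOfSelect(false, 6u, 9u) == selectOfMad(false, 6u, 9u),
              "false arm must agree");
```

In this sketch the rewritten form needs a multiply-add and a select, where the original needs an add, a select, and a mul, which is why the transposition is only worthwhile when an add of 1 is present.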
This kind of optimization is not target-specific and should probably be done somewhere in instcombine. Perhaps move the optimization of mul(m,select(1,n)) there as a separate patch?
instcombine already canonicalizes in the opposite direction, select(mul(m,n), m) -> mul(m,select(1,n)). I think this is target-specific because it is only worth doing to improve mad folding.
OK.
  unsigned ConstOpNo = 1;
  auto *Const = dyn_cast<ConstantSDNode>(Select->getOperand(ConstOpNo));
  if (!Const || Const->getZExtValue() != 1) {
It looks like we could extract the common pattern into a helper function:
bool isConstOne(SDValue Operand) {
  const auto *Const = dyn_cast<ConstantSDNode>(Operand);
  return Const && Const->getZExtValue() == 1;
}
and then use it in the handful of instances of this pattern throughout the code.
Nice
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: llc < %s -mtriple=nvptx -mcpu=sm_20 -O1 | FileCheck %s
; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_20 -O1 | FileCheck %s
; RUN: %if ptxas %{ llc < %s -mtriple=nvptx -mcpu=sm_20 -O1 | %ptxas-verify %}
This needs to be disabled with newer ptxas.
RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -march=nvptx -mcpu=sm_20 | %ptxas-verify %}
Extend NVPTX DAG combining logic to distribute a mul instruction across an add of 1 into a mad where possible. In addition, add support for transposing a mul through a select with an option of 1, if that would allow further mul folding.