[NVPTX] Improve folding to mad with immediate 1 #93628

AlexMaclean · 2024-05-29T01:01:26Z

Extend the NVPTX DAG combine logic to distribute a mul over an add of 1 so that the result can be folded into a mad where possible. In addition, support transposing a mul through a select that has 1 as one of its operands, but only when doing so allows further folding into a mad.
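The two rewrites are algebraic identities on wrapping integers. The sketch below is not part of the patch. It spot-checks both identities at compile time, assuming that `uint32_t` arithmetic (which wraps modulo 2^32) models LLVM's `i32` `mul`/`add` and a 32-bit `mad.lo`. The helper names (`mad`, `mulOfAddOne`, `madForm`, `mulOfSelect`, `selectOfMul`) are hypothetical, and `select(c, 1, n)` is written with an explicit condition `c`.

```cpp
#include <cstdint>

// Hypothetical stand-in for a 32-bit mad.lo: low 32 bits of a*b + c.
constexpr uint32_t mad(uint32_t a, uint32_t b, uint32_t c) { return a * b + c; }

// mul(m, n + 1)  ->  mad(m, n, m)
constexpr uint32_t mulOfAddOne(uint32_t m, uint32_t n) { return m * (n + 1u); }
constexpr uint32_t madForm(uint32_t m, uint32_t n) { return mad(m, n, m); }

// mul(m, select(c, 1, n))  ->  select(c, m, mul(m, n))
constexpr uint32_t mulOfSelect(bool c, uint32_t m, uint32_t n) {
  return m * (c ? 1u : n);
}
constexpr uint32_t selectOfMul(bool c, uint32_t m, uint32_t n) {
  return c ? m : m * n;
}

// Compile-time spot checks, including one input where the product wraps.
static_assert(mulOfAddOne(6u, 4u) == madForm(6u, 4u), "mad distribution");
static_assert(mulOfAddOne(0xffffffffu, 1u) == madForm(0xffffffffu, 1u),
              "both sides wrap identically");
static_assert(mulOfSelect(true, 6u, 4u) == selectOfMul(true, 6u, 4u),
              "select, true arm");
static_assert(mulOfSelect(false, 6u, 4u) == selectOfMul(false, 6u, 4u),
              "select, false arm");
```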

Artem-B

mul(m, n+1) -> mad(m,n,m) makes sense.

mul(m, select(1, n)) -> select(mul(m,n), m) -- not so much. Perhaps I'm missing something.

Artem-B · 2024-05-29T01:38:42Z

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp

@@ -5614,17 +5614,98 @@ static SDValue TryMULWIDECombine(SDNode *N,
  return DCI.DAG.getNode(Opc, DL, MulType, TruncLHS, TruncRHS);
 }

+static SDValue matchMADConstOnePattern(SDValue X, SDValue Add) {


X is unused.

Artem-B · 2024-05-29T01:40:04Z

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp

+    return SDValue();
+
+  SDValue Y = Add->getOperand(0);
+  ConstantSDNode *Const = dyn_cast<ConstantSDNode>(Add->getOperand(1));


Are we guaranteed that the constant operand is last? I think we normalize operands, but I'm not 100% sure that's always the case.

Good point, I've added the other case as well just in case.

Artem-B · 2024-05-29T01:44:13Z

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp

+
+  SDValue Y = Add->getOperand(0);
+  ConstantSDNode *Const = dyn_cast<ConstantSDNode>(Add->getOperand(1));
+  if (!Const || Const->getZExtValue() != 1)


Nit. Phrasing the condition in positive terms would be more readable, IMO.
if (Const && Const->getZExtValue() == 1) return Y;
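The review points above (accept the constant on either side of the add, phrase the check positively) can be illustrated with a toy expression tree. This is a sketch only: `Node`, `isConstOne`, `matchAddOne`, `combineMulToMad` and `eval` are hypothetical stand-ins and do not use LLVM's SelectionDAG API.

```cpp
#include <cassert>
#include <cstdint>

// Toy expression node (hypothetical; stands in for an SDNode).
struct Node {
  enum Kind { Const, Var, Add, Mul, Mad };
  Kind kind;
  uint32_t value;      // constant value (Const) or variable index (Var)
  const Node *ops[3];  // operands; unused slots are nullptr
};

// Positively phrased check: is N the constant 1?
bool isConstOne(const Node *N) {
  return N && N->kind == Node::Const && N->value == 1;
}

// If Add is add(Y, 1) or add(1, Y), return Y; otherwise nullptr.
// Both operand orders are accepted instead of relying on canonicalization.
const Node *matchAddOne(const Node *Add) {
  if (!Add || Add->kind != Node::Add)
    return nullptr;
  if (isConstOne(Add->ops[1]))
    return Add->ops[0];
  if (isConstOne(Add->ops[0]))
    return Add->ops[1];
  return nullptr;
}

// mul(X, add(Y, 1)) -> mad(X, Y, X), trying the add on either side of the mul.
// On success, writes the mad into *Out and returns true.
bool combineMulToMad(const Node *Mul, Node *Out) {
  if (!Mul || Mul->kind != Node::Mul)
    return false;
  for (int i = 0; i < 2; ++i) {
    const Node *X = Mul->ops[i];
    if (const Node *Y = matchAddOne(Mul->ops[1 - i])) {
      *Out = Node{Node::Mad, 0, {X, Y, X}};
      return true;
    }
  }
  return false;
}

// Reference evaluator with wrapping 32-bit arithmetic; Var nodes index vars.
uint32_t eval(const Node *N, const uint32_t *vars) {
  assert(N && "missing operand");
  switch (N->kind) {
  case Node::Const: return N->value;
  case Node::Var:   return vars[N->value];
  case Node::Add:   return eval(N->ops[0], vars) + eval(N->ops[1], vars);
  case Node::Mul:   return eval(N->ops[0], vars) * eval(N->ops[1], vars);
  case Node::Mad:
    return eval(N->ops[0], vars) * eval(N->ops[1], vars) + eval(N->ops[2], vars);
  }
  return 0;
}
```

For example, with m = 6 and n = 4, the tree mul(add(1, n), m) evaluates to 30; the combine rewrites it to mad(m, n, m), which also evaluates to 30, even though both the add and the mul have their "interesting" operand on the left.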

Artem-B · 2024-05-29T01:47:34Z

llvm/test/CodeGen/NVPTX/combine-mad.ll

@@ -0,0 +1,101 @@
+; RUN: llc < %s -march=nvptx -mcpu=sm_20 -O1 | FileCheck %s


Another test which could use autogenerated CHECK patterns.

Artem-B · 2024-05-29T01:53:33Z

llvm/test/CodeGen/NVPTX/combine-mad.ll

+  ret i32 %mul
+}
+
+; Transpose (mul (select)) if it can then be folded to mad


Does it buy us anything?

mul(m,select(1,n)) will probably have the same performance as select(mul(m,n), m), since the critical path contains a mul and a select either way, just in a different order.

By itself this transform doesn't help much, I agree. However, if m or n is add(x,1), it enables the other transformation. In the code we check for this case and only run the transformation when it would enable further folding. A rare case to be sure, but better to support it than not.

This kind of optimization is not target-specific and should probably be done somewhere in instcombine. Perhaps move the optimization of mul(m,select(1,n)) there as a separate patch?

instcombine already canonicalizes in the opposite direction: select(mul(m,n), m) -> mul(m,select(1,n)). I think this is target specific because it is only worth doing to improve mad folding.
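The gating described in this thread can be sketched on a toy tree: transpose mul(m, select(c, 1, n)) into select(c, m, mul(m, n)) only when m or n is add(x, 1), because only then can the new mul become a mad. All names here (`Node`, `isAddOfOne`, `shouldTransposeMulSelect`) are hypothetical, operand order {cond, trueVal, falseVal} is assumed for the select, and only the case where the true arm is 1 is handled.

```cpp
#include <cstdint>

// Toy node (hypothetical, not LLVM's SDNode).
struct Node {
  enum Kind { Const, Var, Add, Mul, Select };
  Kind kind;
  uint32_t value;      // constant value (Const) or variable index (Var)
  const Node *ops[3];  // Select: {cond, trueVal, falseVal}
};

bool isConstOne(const Node *N) {
  return N && N->kind == Node::Const && N->value == 1;
}

// True for add(x, 1) or add(1, x).
bool isAddOfOne(const Node *N) {
  return N && N->kind == Node::Add &&
         (isConstOne(N->ops[0]) || isConstOne(N->ops[1]));
}

// For Mul = mul(M, select(C, 1, Other)) (select on either side), decide
// whether to rewrite to select(C, M, mul(M, Other)). The transpose alone
// gains nothing, so it is only reported when mul(M, Other) can then fold
// into a mad, i.e. when M or Other is add(x, 1). On success, *M and
// *Other receive the two multiplicands.
bool shouldTransposeMulSelect(const Node *Mul, const Node **M,
                              const Node **Other) {
  if (!Mul || Mul->kind != Node::Mul)
    return false;
  for (int i = 0; i < 2; ++i) {
    const Node *Sel = Mul->ops[1 - i];
    if (!Sel || Sel->kind != Node::Select || !isConstOne(Sel->ops[1]))
      continue;
    const Node *Mv = Mul->ops[i];
    const Node *Ov = Sel->ops[2];
    if (isAddOfOne(Mv) || isAddOfOne(Ov)) {
      *M = Mv;
      *Other = Ov;
      return true;
    }
  }
  return false;
}
```

Keeping the check separate from the rewrite mirrors the point made above: without a follow-up mad fold, the transposed form has the same mul-plus-select critical path, so there is no reason to undo instcombine's canonical form.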

Artem-B · 2024-05-29T22:21:09Z

llvm/test/CodeGen/NVPTX/combine-mad.ll

+  ret i32 %mul
+}
+
+; Transpose (mul (select)) if it can then be folded to mad


Artem-B · 2024-05-29T22:28:41Z

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp

+
+  unsigned ConstOpNo = 1;
+  auto *Const = dyn_cast<ConstantSDNode>(Select->getOperand(ConstOpNo));
+  if (!Const || Const->getZExtValue() != 1) {


It looks like we could extract the common pattern into a helper function:

static bool isConstOne(SDValue Operand) {
  const auto *Const = dyn_cast<ConstantSDNode>(Operand);
  return Const && Const->getZExtValue() == 1;
}

and then use it in the handful of instances of this pattern throughout the code.

Artem-B · 2024-05-30T21:37:15Z

llvm/test/CodeGen/NVPTX/combine-mad.ll

+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=nvptx -mcpu=sm_20 -O1 | FileCheck %s
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_20 -O1 | FileCheck %s
+; RUN: %if ptxas %{ llc < %s -mtriple=nvptx -mcpu=sm_20 -O1 | %ptxas-verify %}


This needs to be disabled with newer ptxas.

RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -march=nvptx -mcpu=sm_20 | %ptxas-verify %}

[NVPTX] Improve folding to mad with immediate 1

5b9b98a

AlexMaclean added the backend:NVPTX label May 29, 2024

AlexMaclean requested a review from Artem-B May 29, 2024 01:01

AlexMaclean self-assigned this May 29, 2024

Artem-B reviewed May 29, 2024

address comments

af084ad

Artem-B approved these changes May 29, 2024

address comments

d0c2760

AlexMaclean merged commit f32ebab into llvm:main May 30, 2024
5 of 7 checks passed

Artem-B reviewed May 30, 2024

AlexMaclean mentioned this pull request May 31, 2024

[NVPTX] disable combine-mad test for newer ptxas #93919

Merged

AlexMaclean mentioned this pull request Jun 28, 2024

[NVPTX] remove store.params of undef #96940

Merged
