[NVPTX] Support copysign PTX instruction #107800

AlexMaclean · 2024-09-09T02:33:25Z

Lower fcopysign SDNodes into copysign PTX instructions where possible. See PTX ISA: 9.7.3.2. Floating Point Instructions: copysign.

llvmbot · 2024-09-09T02:33:58Z

@llvm/pr-subscribers-backend-nvptx

Author: Alex MacLean (AlexMaclean)

Changes

Lower fcopysign SDNodes into copysign PTX instructions where possible. See PTX ISA: 9.7.3.2. Floating Point Instructions: copysign.

Full diff: https://github.com/llvm/llvm-project/pull/107800.diff

3 Files Affected:

(modified) llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp (+2-2)
(modified) llvm/lib/Target/NVPTX/NVPTXIntrinsics.td (+14)
(added) llvm/test/CodeGen/NVPTX/copysign.ll (+39)

diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index bb76ffdfd99d7b..6a046f5cd7c4d7 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -838,8 +838,8 @@ NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
   setOperationAction(ISD::FCOPYSIGN, MVT::v2f16, Expand);
   setOperationAction(ISD::FCOPYSIGN, MVT::bf16, Expand);
   setOperationAction(ISD::FCOPYSIGN, MVT::v2bf16, Expand);
-  setOperationAction(ISD::FCOPYSIGN, MVT::f32, Expand);
-  setOperationAction(ISD::FCOPYSIGN, MVT::f64, Expand);
+  setOperationAction(ISD::FCOPYSIGN, MVT::f32, Legal);
+  setOperationAction(ISD::FCOPYSIGN, MVT::f64, Legal);
 
   // These map to corresponding instructions for f32/f64. f16 must be
   // promoted to f32. v2f16 is expanded to f16, which is then promoted
diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index 0c883093dd0a54..1b817ec07ce3cf 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -977,6 +977,20 @@ def INT_NVVM_FABS_F : F_MATH_1<"abs.f32 \t$dst, $src0;", Float32Regs,
 def INT_NVVM_FABS_D : F_MATH_1<"abs.f64 \t$dst, $src0;", Float64Regs,
   Float64Regs, int_nvvm_fabs_d>;
 
+//
+// copysign
+//
+
+def COPYSIGN_F :
+    NVPTXInst<(outs Float32Regs:$dst), (ins Float32Regs:$src0, Float32Regs:$src1),
+              "copysign.f32 \t$dst, $src0, $src1;",
+              [(set Float32Regs:$dst, (fcopysign Float32Regs:$src0, Float32Regs:$src1))]>;
+
+def COPYSIGN_D :
+    NVPTXInst<(outs Float64Regs:$dst), (ins Float64Regs:$src0, Float64Regs:$src1),
+              "copysign.f64 \t$dst, $src0, $src1;",
+              [(set Float64Regs:$dst, (fcopysign Float64Regs:$src0, Float64Regs:$src1))]>;
+
 //
 // Abs, Neg bf16, bf16x2
 //
diff --git a/llvm/test/CodeGen/NVPTX/copysign.ll b/llvm/test/CodeGen/NVPTX/copysign.ll
new file mode 100644
index 00000000000000..1b2061060a3dd9
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/copysign.ll
@@ -0,0 +1,39 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -march=nvptx64 -mcpu=sm_20 -verify-machineinstrs | FileCheck %s
+; RUN: %if ptxas %{ llc < %s -march=nvptx64 -mcpu=sm_20 -verify-machineinstrs | %ptxas-verify %}
+
+target triple = "nvptx64-nvidia-cuda"
+target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
+
+define float @fcopysign_f(float %a, float %b) {
+; CHECK-LABEL: fcopysign_f(
+; CHECK:       {
+; CHECK-NEXT:    .reg .f32 %f<4>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.f32 %f1, [fcopysign_f_param_0];
+; CHECK-NEXT:    ld.param.f32 %f2, [fcopysign_f_param_1];
+; CHECK-NEXT:    copysign.f32 %f3, %f1, %f2;
+; CHECK-NEXT:    st.param.f32 [func_retval0+0], %f3;
+; CHECK-NEXT:    ret;
+  %val = call float @llvm.copysign.f32(float %a, float %b)
+  ret float %val
+}
+
+define double @fcopysign_d(double %a, double %b) {
+; CHECK-LABEL: fcopysign_d(
+; CHECK:       {
+; CHECK-NEXT:    .reg .f64 %fd<4>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ld.param.f64 %fd1, [fcopysign_d_param_0];
+; CHECK-NEXT:    ld.param.f64 %fd2, [fcopysign_d_param_1];
+; CHECK-NEXT:    copysign.f64 %fd3, %fd1, %fd2;
+; CHECK-NEXT:    st.param.f64 [func_retval0+0], %fd3;
+; CHECK-NEXT:    ret;
+  %val = call double @llvm.copysign.f64(double %a, double %b)
+  ret double %val
+}
+
+declare float @llvm.copysign.f32(float, float)
+declare double @llvm.copysign.f64(double, double)

Artem-B

LGTM.

For my own education -- is there hardware support for this instruction in SASS? Or would the copysign get expanded to the bit twiddling LLVM does?

AlexMaclean · 2024-09-09T22:51:29Z

LGTM.

For my own education -- is there hardware support for this instruction in SASS? Or would the copysign get expanded to the bit twiddling LLVM does?

Here is a quick comparison of the SASS before/after this change: https://godbolt.org/z/snWTGEG81. copysign is still getting expanded but it looks like ptxas is doing a better job then the SelectionDAG (2 vs. 3 instructions).

Artem-B · 2024-09-09T23:35:12Z

it looks like ptxas is doing a better job then the SelectionDAG (2 vs. 3 instructions).

2x LOP3.LUT look like a likely win over SHF, ISETP, FSEL.

We should probably start using lop3 more often, as it could potentially handle other common operations a bit better than the discrete logical ops we use now.

pranavk · 2024-09-10T18:07:38Z

This is making some of our internal XLA tests fail. I talked to @Artem-B offline and it seems we missed case of copysign from a constant. I will have to revert this to unblock us.

pranavk · 2024-09-10T18:11:56Z

I am working on some reproducers for you but that may take few hours.

AlexMaclean · 2024-09-10T18:13:06Z

Sounds good, Apologies! Some reproducers would be great!

This reverts commit b0d2411. Reverting because the original commit misses case of copysign from a constant.

pranavk · 2024-09-10T18:26:17Z

@AlexMaclean I have this I can share:

LLVM ERROR: Cannot select: t31: f32 = fcopysign ConstantFP:f32<5.000000e-01>, t17
  t19: f32 = ConstantFP<5.000000e-01>
  t17: f16,ch = load<(invariant load (s16) from %ir.7, addrspace 1)> t0, t16, undef:i64
    t16: i64 = add t10, t34
      t10: i64 = addrspacecast[0 -> 1] t35
        t35: i64,ch = load<(dereferenceable invariant load (s64) from `ptr addrspace(101) null`, addrspace 101)> t0, TargetExternalSymbol:i64'foo', undef:i64
          t1: i64 = TargetExternalSymbol'foo'
          t3: i64 = undef
      t34: i64 = NVPTXISD::MUL_WIDE_UNSIGNED t12, Constant:i32<2>
        t12: i32 = llvm.nvvm.read.ptx.sreg.tid.x TargetConstant:i64<8487>
          t11: i64 = TargetConstant<8487>
        t33: i32 = Constant<2>
    t3: i64 = undef

I will be working on getting the LLVM IR for you unless above is sufficient. Please let me know.

AlexMaclean · 2024-09-10T20:51:29Z

Thanks, this is enough, it seems the problem is not the constant but that the second arg is of f16 type

Artem-B · 2024-09-10T21:26:29Z

Huh. I'm surprised to see copysign used with mixed FP types.

Lower `fcopysign` SDNodes into `copysign` PTX instructions where possible. See [PTX ISA: 9.7.3.2. Floating Point Instructions: copysign] (https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#floating-point-instructions-copysign).

AlexMaclean · 2024-09-11T00:59:28Z

Yea it seems like many targets have had to work around this property of FCOPYSIGN nodes. One option would be to update ISel to handle these cases but I went ahead and added a TLI hook to prevent this sort of folding all together. #108125

AlexMaclean requested a review from Artem-B September 9, 2024 02:33

AlexMaclean self-assigned this Sep 9, 2024

llvmbot added the backend:NVPTX label Sep 9, 2024

AlexMaclean force-pushed the dev/amaclean/upstream-copysign-nvptx branch from ce7afff to bd78827 Compare September 9, 2024 02:59

Artem-B approved these changes Sep 9, 2024

View reviewed changes

[NVPTX] Support copysign PTX instruction

d5e0ff5

AlexMaclean force-pushed the dev/amaclean/upstream-copysign-nvptx branch from bd78827 to d5e0ff5 Compare September 9, 2024 23:33

AlexMaclean merged commit b0d2411 into llvm:main Sep 10, 2024
6 of 7 checks passed

pranavk added a commit that referenced this pull request Sep 10, 2024

Revert "[NVPTX] Support copysign PTX instruction (#107800)" (#108066)

02c943a

This reverts commit b0d2411. Reverting because the original commit misses case of copysign from a constant.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[NVPTX] Support copysign PTX instruction #107800

[NVPTX] Support copysign PTX instruction #107800

Uh oh!

AlexMaclean commented Sep 9, 2024

Uh oh!

llvmbot commented Sep 9, 2024

Uh oh!

Artem-B left a comment

Uh oh!

AlexMaclean commented Sep 9, 2024

Uh oh!

Artem-B commented Sep 9, 2024

Uh oh!

Uh oh!

pranavk commented Sep 10, 2024

Uh oh!

pranavk commented Sep 10, 2024

Uh oh!

AlexMaclean commented Sep 10, 2024

Uh oh!

pranavk commented Sep 10, 2024

Uh oh!

AlexMaclean commented Sep 10, 2024

Uh oh!

Artem-B commented Sep 10, 2024

Uh oh!

AlexMaclean commented Sep 11, 2024

Uh oh!

Uh oh!

[NVPTX] Support copysign PTX instruction #107800

[NVPTX] Support copysign PTX instruction #107800

Uh oh!

Conversation

AlexMaclean commented Sep 9, 2024

Uh oh!

llvmbot commented Sep 9, 2024

Uh oh!

Artem-B left a comment

Choose a reason for hiding this comment

Uh oh!

AlexMaclean commented Sep 9, 2024

Uh oh!

Artem-B commented Sep 9, 2024

Uh oh!

Uh oh!

pranavk commented Sep 10, 2024

Uh oh!

pranavk commented Sep 10, 2024

Uh oh!

AlexMaclean commented Sep 10, 2024

Uh oh!

pranavk commented Sep 10, 2024

Uh oh!

AlexMaclean commented Sep 10, 2024

Uh oh!

Artem-B commented Sep 10, 2024

Uh oh!

AlexMaclean commented Sep 11, 2024

Uh oh!

Uh oh!