-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[NVPTX] Support copysign PTX instruction #107800
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NVPTX] Support copysign PTX instruction #107800
Conversation
@llvm/pr-subscribers-backend-nvptx Author: Alex MacLean (AlexMaclean) ChangesLower Full diff: https://github.com/llvm/llvm-project/pull/107800.diff 3 Files Affected:
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index bb76ffdfd99d7b..6a046f5cd7c4d7 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -838,8 +838,8 @@ NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
setOperationAction(ISD::FCOPYSIGN, MVT::v2f16, Expand);
setOperationAction(ISD::FCOPYSIGN, MVT::bf16, Expand);
setOperationAction(ISD::FCOPYSIGN, MVT::v2bf16, Expand);
- setOperationAction(ISD::FCOPYSIGN, MVT::f32, Expand);
- setOperationAction(ISD::FCOPYSIGN, MVT::f64, Expand);
+ setOperationAction(ISD::FCOPYSIGN, MVT::f32, Legal);
+ setOperationAction(ISD::FCOPYSIGN, MVT::f64, Legal);
// These map to corresponding instructions for f32/f64. f16 must be
// promoted to f32. v2f16 is expanded to f16, which is then promoted
diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index 0c883093dd0a54..1b817ec07ce3cf 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -977,6 +977,20 @@ def INT_NVVM_FABS_F : F_MATH_1<"abs.f32 \t$dst, $src0;", Float32Regs,
def INT_NVVM_FABS_D : F_MATH_1<"abs.f64 \t$dst, $src0;", Float64Regs,
Float64Regs, int_nvvm_fabs_d>;
+//
+// copysign
+//
+
+def COPYSIGN_F :
+ NVPTXInst<(outs Float32Regs:$dst), (ins Float32Regs:$src0, Float32Regs:$src1),
+ "copysign.f32 \t$dst, $src0, $src1;",
+ [(set Float32Regs:$dst, (fcopysign Float32Regs:$src0, Float32Regs:$src1))]>;
+
+def COPYSIGN_D :
+ NVPTXInst<(outs Float64Regs:$dst), (ins Float64Regs:$src0, Float64Regs:$src1),
+ "copysign.f64 \t$dst, $src0, $src1;",
+ [(set Float64Regs:$dst, (fcopysign Float64Regs:$src0, Float64Regs:$src1))]>;
+
//
// Abs, Neg bf16, bf16x2
//
diff --git a/llvm/test/CodeGen/NVPTX/copysign.ll b/llvm/test/CodeGen/NVPTX/copysign.ll
new file mode 100644
index 00000000000000..1b2061060a3dd9
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/copysign.ll
@@ -0,0 +1,39 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -march=nvptx64 -mcpu=sm_20 -verify-machineinstrs | FileCheck %s
+; RUN: %if ptxas %{ llc < %s -march=nvptx64 -mcpu=sm_20 -verify-machineinstrs | %ptxas-verify %}
+
+target triple = "nvptx64-nvidia-cuda"
+target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
+
+define float @fcopysign_f(float %a, float %b) {
+; CHECK-LABEL: fcopysign_f(
+; CHECK: {
+; CHECK-NEXT: .reg .f32 %f<4>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.f32 %f1, [fcopysign_f_param_0];
+; CHECK-NEXT: ld.param.f32 %f2, [fcopysign_f_param_1];
+; CHECK-NEXT: copysign.f32 %f3, %f1, %f2;
+; CHECK-NEXT: st.param.f32 [func_retval0+0], %f3;
+; CHECK-NEXT: ret;
+ %val = call float @llvm.copysign.f32(float %a, float %b)
+ ret float %val
+}
+
+define double @fcopysign_d(double %a, double %b) {
+; CHECK-LABEL: fcopysign_d(
+; CHECK: {
+; CHECK-NEXT: .reg .f64 %fd<4>;
+; CHECK-EMPTY:
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: ld.param.f64 %fd1, [fcopysign_d_param_0];
+; CHECK-NEXT: ld.param.f64 %fd2, [fcopysign_d_param_1];
+; CHECK-NEXT: copysign.f64 %fd3, %fd1, %fd2;
+; CHECK-NEXT: st.param.f64 [func_retval0+0], %fd3;
+; CHECK-NEXT: ret;
+ %val = call double @llvm.copysign.f64(double %a, double %b)
+ ret double %val
+}
+
+declare float @llvm.copysign.f32(float, float)
+declare double @llvm.copysign.f64(double, double)
|
ce7afff
to
bd78827
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
For my own education -- is there hardware support for this instruction in SASS? Or would the copysign get expanded to the bit twiddling LLVM does?
Here is a quick comparison of the SASS before/after this change: https://godbolt.org/z/snWTGEG81. copysign is still getting expanded but it looks like |
bd78827
to
d5e0ff5
Compare
We should probably start using |
This is making some of our internal XLA tests fail. I talked to @Artem-B offline and it seems we missed case of copysign from a constant. I will have to revert this to unblock us. |
I am working on some reproducers for you but that may take few hours. |
Sounds good, Apologies! Some reproducers would be great! |
@AlexMaclean I have this I can share:
I will be working on getting the LLVM IR for you unless above is sufficient. Please let me know. |
Thanks, this is enough, it seems the problem is not the constant but that the second arg is of f16 type |
Huh. I'm surprised to see copysign used with mixed FP types. |
Lower `fcopysign` SDNodes into `copysign` PTX instructions where possible. See [PTX ISA: 9.7.3.2. Floating Point Instructions: copysign] (https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#floating-point-instructions-copysign).
Yea it seems like many targets have had to work around this property of FCOPYSIGN nodes. One option would be to update ISel to handle these cases but I went ahead and added a TLI hook to prevent this sort of folding all together. #108125 |
Lower
fcopysign
SDNodes intocopysign
PTX instructions where possible. See PTX ISA: 9.7.3.2. Floating Point Instructions: copysign.