Skip to content

Revert "[NVPTX] Support copysign PTX instruction (#107800)" #108066

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 10, 2024

Conversation

pranavk
Copy link
Contributor

@pranavk pranavk commented Sep 10, 2024

This reverts commit b0d2411.

Reverting because the original commit misses case of copysign from a constant.

@llvmbot
Copy link
Member

llvmbot commented Sep 10, 2024

@llvm/pr-subscribers-backend-nvptx

Author: Pranav Kant (pranavk)

Changes

This reverts commit b0d2411.

Reverting because the original commit misses case of copysign from a constant.


Full diff: https://github.com/llvm/llvm-project/pull/108066.diff

4 Files Affected:

  • (modified) llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp (+2-2)
  • (modified) llvm/lib/Target/NVPTX/NVPTXIntrinsics.td (-14)
  • (removed) llvm/test/CodeGen/NVPTX/copysign.ll (-39)
  • (modified) llvm/test/CodeGen/NVPTX/math-intrins.ll (+13-6)
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index 3816e099537199..5c5766a8b23455 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -838,8 +838,8 @@ NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
   setOperationAction(ISD::FCOPYSIGN, MVT::v2f16, Expand);
   setOperationAction(ISD::FCOPYSIGN, MVT::bf16, Expand);
   setOperationAction(ISD::FCOPYSIGN, MVT::v2bf16, Expand);
-  setOperationAction(ISD::FCOPYSIGN, MVT::f32, Legal);
-  setOperationAction(ISD::FCOPYSIGN, MVT::f64, Legal);
+  setOperationAction(ISD::FCOPYSIGN, MVT::f32, Expand);
+  setOperationAction(ISD::FCOPYSIGN, MVT::f64, Expand);
 
   // These map to corresponding instructions for f32/f64. f16 must be
   // promoted to f32. v2f16 is expanded to f16, which is then promoted
diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index e8e8548120131e..0c883093dd0a54 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -977,20 +977,6 @@ def INT_NVVM_FABS_F : F_MATH_1<"abs.f32 \t$dst, $src0;", Float32Regs,
 def INT_NVVM_FABS_D : F_MATH_1<"abs.f64 \t$dst, $src0;", Float64Regs,
   Float64Regs, int_nvvm_fabs_d>;
 
-//
-// copysign
-//
-
-def COPYSIGN_F :
-    NVPTXInst<(outs Float32Regs:$dst), (ins Float32Regs:$src0, Float32Regs:$src1),
-              "copysign.f32 \t$dst, $src0, $src1;",
-              [(set Float32Regs:$dst, (fcopysign Float32Regs:$src1, Float32Regs:$src0))]>;
-
-def COPYSIGN_D :
-    NVPTXInst<(outs Float64Regs:$dst), (ins Float64Regs:$src0, Float64Regs:$src1),
-              "copysign.f64 \t$dst, $src0, $src1;",
-              [(set Float64Regs:$dst, (fcopysign Float64Regs:$src1, Float64Regs:$src0))]>;
-
 //
 // Abs, Neg bf16, bf16x2
 //
diff --git a/llvm/test/CodeGen/NVPTX/copysign.ll b/llvm/test/CodeGen/NVPTX/copysign.ll
deleted file mode 100644
index 96fb37a129b207..00000000000000
--- a/llvm/test/CodeGen/NVPTX/copysign.ll
+++ /dev/null
@@ -1,39 +0,0 @@
-; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
-; RUN: llc < %s -march=nvptx64 -mcpu=sm_20 -verify-machineinstrs | FileCheck %s
-; RUN: %if ptxas %{ llc < %s -march=nvptx64 -mcpu=sm_20 -verify-machineinstrs | %ptxas-verify %}
-
-target triple = "nvptx64-nvidia-cuda"
-target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
-
-define float @fcopysign_f(float %a, float %b) {
-; CHECK-LABEL: fcopysign_f(
-; CHECK:       {
-; CHECK-NEXT:    .reg .f32 %f<4>;
-; CHECK-EMPTY:
-; CHECK-NEXT:  // %bb.0:
-; CHECK-NEXT:    ld.param.f32 %f1, [fcopysign_f_param_0];
-; CHECK-NEXT:    ld.param.f32 %f2, [fcopysign_f_param_1];
-; CHECK-NEXT:    copysign.f32 %f3, %f2, %f1;
-; CHECK-NEXT:    st.param.f32 [func_retval0+0], %f3;
-; CHECK-NEXT:    ret;
-  %val = call float @llvm.copysign.f32(float %a, float %b)
-  ret float %val
-}
-
-define double @fcopysign_d(double %a, double %b) {
-; CHECK-LABEL: fcopysign_d(
-; CHECK:       {
-; CHECK-NEXT:    .reg .f64 %fd<4>;
-; CHECK-EMPTY:
-; CHECK-NEXT:  // %bb.0:
-; CHECK-NEXT:    ld.param.f64 %fd1, [fcopysign_d_param_0];
-; CHECK-NEXT:    ld.param.f64 %fd2, [fcopysign_d_param_1];
-; CHECK-NEXT:    copysign.f64 %fd3, %fd2, %fd1;
-; CHECK-NEXT:    st.param.f64 [func_retval0+0], %fd3;
-; CHECK-NEXT:    ret;
-  %val = call double @llvm.copysign.f64(double %a, double %b)
-  ret double %val
-}
-
-declare float @llvm.copysign.f32(float, float)
-declare double @llvm.copysign.f64(double, double)
diff --git a/llvm/test/CodeGen/NVPTX/math-intrins.ll b/llvm/test/CodeGen/NVPTX/math-intrins.ll
index bdd6c914384601..fcc4ec6e4017f7 100644
--- a/llvm/test/CodeGen/NVPTX/math-intrins.ll
+++ b/llvm/test/CodeGen/NVPTX/math-intrins.ll
@@ -195,8 +195,9 @@ define double @round_double(double %a) {
 ; check the use of 0.5 to implement round
 ; CHECK-LABEL: round_double(
 ; CHECK:       {
-; CHECK-NEXT:    .reg .pred %p<3>;
-; CHECK-NEXT:    .reg .f64 %fd<8>;
+; CHECK-NEXT:    .reg .pred %p<4>;
+; CHECK-NEXT:    .reg .b64 %rd<4>;
+; CHECK-NEXT:    .reg .f64 %fd<10>;
 ; CHECK-EMPTY:
 ; CHECK-NEXT:  // %bb.0:
 ; CHECK-NEXT:    ld.param.f64 %fd1, [round_double_param_0];
@@ -205,10 +206,16 @@ define double @round_double(double %a) {
 ; CHECK-NEXT:    add.rn.f64 %fd3, %fd2, 0d3FE0000000000000;
 ; CHECK-NEXT:    cvt.rzi.f64.f64 %fd4, %fd3;
 ; CHECK-NEXT:    selp.f64 %fd5, 0d0000000000000000, %fd4, %p1;
-; CHECK-NEXT:    copysign.f64 %fd6, %fd1, %fd5;
-; CHECK-NEXT:    setp.gt.f64 %p2, %fd2, 0d4330000000000000;
-; CHECK-NEXT:    selp.f64 %fd7, %fd1, %fd6, %p2;
-; CHECK-NEXT:    st.param.f64 [func_retval0+0], %fd7;
+; CHECK-NEXT:    abs.f64 %fd6, %fd5;
+; CHECK-NEXT:    neg.f64 %fd7, %fd6;
+; CHECK-NEXT:    mov.b64 %rd1, %fd1;
+; CHECK-NEXT:    shr.u64 %rd2, %rd1, 63;
+; CHECK-NEXT:    and.b64 %rd3, %rd2, 1;
+; CHECK-NEXT:    setp.eq.b64 %p2, %rd3, 1;
+; CHECK-NEXT:    selp.f64 %fd8, %fd7, %fd6, %p2;
+; CHECK-NEXT:    setp.gt.f64 %p3, %fd2, 0d4330000000000000;
+; CHECK-NEXT:    selp.f64 %fd9, %fd1, %fd8, %p3;
+; CHECK-NEXT:    st.param.f64 [func_retval0+0], %fd9;
 ; CHECK-NEXT:    ret;
   %b = call double @llvm.round.f64(double %a)
   ret double %b

@pranavk
Copy link
Contributor Author

pranavk commented Sep 10, 2024

Reverting this. Reproducers coming soon.

@pranavk pranavk merged commit 02c943a into llvm:main Sep 10, 2024
7 of 9 checks passed
@Artem-B
Copy link
Member

Artem-B commented Sep 10, 2024

LLVM ERROR: Cannot select: t31: f32 = fcopysign ConstantFP:f32<5.000000e-01>, t17
  t19: f32 = ConstantFP<5.000000e-01>
  t17: f16,ch = load<(invariant load (s16) from %ir.7, addrspace 1)> t0, t16, undef:i64
    t16: i64 = add t10, t34
      t10: i64 = addrspacecast[0 -> 1] t35
        t35: i64,ch = load<(dereferenceable invariant load (s64) from `ptr addrspace(101) null`, addrspace 101)> t0, TargetExternalSymbol:i64'wrapped_round_nearest_afz_param_0', undef:i64
          t1: i64 = TargetExternalSymbol'wrapped_round_nearest_afz_param_0'
          t3: i64 = undef
      t34: i64 = NVPTXISD::MUL_WIDE_UNSIGNED t12, Constant:i32<2>
        t12: i32 = llvm.nvvm.read.ptx.sreg.tid.x TargetConstant:i64<8487>
          t11: i64 = TargetConstant<8487>
        t33: i32 = Constant<2>
    t3: i64 = undef

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants