Reimplement constrained 'trunc' using operand bundles #118253

Open
wants to merge 6 commits into main
Conversation

spavloff
Collaborator

spavloff commented Dec 2, 2024

Previously, the function 'trunc' in a non-default floating-point environment was implemented with a special LLVM intrinsic, 'experimental.constrained.trunc'. The introduction of floating-point operand bundles allows the interaction with the FP environment to be expressed using the same intrinsic as in the default mode.

This change removes 'llvm.experimental.constrained.trunc' and uses 'llvm.trunc' in all cases.

Currently, floating-point operations in their general form (beyond the default
mode) are always represented by calls to constrained intrinsics. In
addition to modeling the side effect, these carry additional information
in the form of metadata arguments. This scheme is not efficient in the
case of intrinsic function calls, as was noted in
https://discourse.llvm.org/t/thought-on-strictfp-support/71453, because
it requires defining a separate intrinsic for the same operation when it
is used in a non-default FP environment. The solution proposed in the
discussion was "to move the complexity about the environment tracking
from the intrinsics themselves to the call instruction".
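
For illustration, the duplication is visible in the intrinsic
declarations appearing in the tests below: the same operation needs two
intrinsics, one of which takes an extra metadata parameter (a sketch
taken from the test expectations this patch updates):

  ; One operation, two intrinsics: the default-mode form and the
  ; constrained form with a metadata parameter for exception behavior.
  declare double @llvm.trunc.f64(double)
  declare double @llvm.experimental.constrained.trunc.f64(double, metadata)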

The approach implemented in this change is to use operand bundles
(https://llvm.org/docs/LangRef.html#operand-bundles). This approach was
tried previously (https://reviews.llvm.org/D93455) but was not finished.

This change does not add any new functionality; it only adds a new way
of keeping FP-related information in LLVM IR. Metadata arguments of
constrained functions are preserved, but they are no longer used by
queries like `getRoundingMode` or `getExceptionBehavior`.
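
Concretely, the updated tests show the representation changing as
follows; a minimal before/after sketch (the attribute group number is
illustrative):

  ; Before: a dedicated constrained intrinsic, with the exception
  ; behavior passed as a metadata argument.
  %r0 = call double @llvm.experimental.constrained.trunc.f64(double %x, metadata !"fpexcept.strict")

  ; After: the ordinary intrinsic; the FP environment is carried by the
  ; call site through an operand bundle and call-site attributes.
  %r1 = call double @llvm.trunc.f64(double %x) #0 [ "fpe.except"(metadata !"strict") ]

  attributes #0 = { strictfp memory(inaccessiblemem: readwrite) }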
- Fix Doxygen error
- Fix clang-format error
- Remove unused function declaration
- Remove setting of MD_fpmath; it is already done by copyMetadata
Previously, the function 'trunc' in a non-default floating-point
environment was implemented with a special LLVM intrinsic
'experimental.constrained.trunc'. The introduction of floating-point
operand bundles allows the interaction with the FP environment to be
expressed using the same intrinsic as in the default mode.

This change removes 'llvm.experimental.constrained.trunc' and uses
'llvm.trunc' in all cases.

graphite-app bot commented Dec 2, 2024

Your org has enabled the Graphite merge queue for merging into main

Add the label “FP Bundles” to the PR and Graphite will automatically add it to the merge queue when it’s ready to merge.

You must have a Graphite account and be logged in to Graphite in order to use the merge queue.

@llvmbot
Member

llvmbot commented Dec 2, 2024

@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-llvm-selectiondag
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-backend-arm

Author: Serge Pavlov (spavloff)

Changes

Previously, the function 'trunc' in a non-default floating-point environment was implemented with a special LLVM intrinsic, 'experimental.constrained.trunc'. The introduction of floating-point operand bundles allows the interaction with the FP environment to be expressed using the same intrinsic as in the default mode.

This change removes 'llvm.experimental.constrained.trunc' and uses 'llvm.trunc' in all cases.


Patch is 138.79 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/118253.diff

66 Files Affected:

  • (modified) clang/lib/CodeGen/CGBuiltin.cpp (+26-26)
  • (modified) clang/test/CodeGen/AArch64/neon-intrinsics-constrained.c (+1-1)
  • (modified) clang/test/CodeGen/AArch64/v8.2a-fp16-intrinsics-constrained.c (+3-1)
  • (modified) clang/test/CodeGen/PowerPC/builtins-ppc-fpconstrained.c (+4-2)
  • (modified) clang/test/CodeGen/SystemZ/builtins-systemz-vector-constrained.c (+3-1)
  • (modified) clang/test/CodeGen/SystemZ/builtins-systemz-vector2-constrained.c (+2-1)
  • (modified) clang/test/CodeGen/SystemZ/builtins-systemz-zvector-constrained.c (+4-2)
  • (modified) clang/test/CodeGen/SystemZ/builtins-systemz-zvector2-constrained.c (+6-4)
  • (modified) clang/test/CodeGen/arm64-vrnd-constrained.c (+3-1)
  • (modified) clang/test/CodeGen/constrained-math-builtins.c (+11-8)
  • (modified) llvm/include/llvm/CodeGen/SelectionDAGNodes.h (+1)
  • (modified) llvm/include/llvm/CodeGen/TargetLowering.h (+1)
  • (modified) llvm/include/llvm/IR/ConstrainedOps.def (+7-1)
  • (modified) llvm/include/llvm/IR/Function.h (+1-1)
  • (modified) llvm/include/llvm/IR/InstrTypes.h (+3)
  • (modified) llvm/include/llvm/IR/IntrinsicInst.h (+12)
  • (modified) llvm/include/llvm/IR/Intrinsics.h (+4-3)
  • (modified) llvm/include/llvm/IR/Intrinsics.td (-3)
  • (modified) llvm/lib/Analysis/ConstantFolding.cpp (+6-7)
  • (modified) llvm/lib/AsmParser/LLParser.cpp (+9)
  • (modified) llvm/lib/CodeGen/ExpandVectorPredication.cpp (+1-1)
  • (modified) llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp (+6)
  • (modified) llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp (+2)
  • (modified) llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp (+3)
  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+1)
  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+16-3)
  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h (+1-1)
  • (modified) llvm/lib/CodeGen/TargetLoweringBase.cpp (+2-1)
  • (modified) llvm/lib/IR/AutoUpgrade.cpp (+65-7)
  • (modified) llvm/lib/IR/Function.cpp (+2-2)
  • (modified) llvm/lib/IR/Instructions.cpp (+5)
  • (modified) llvm/lib/IR/IntrinsicInst.cpp (+31-1)
  • (modified) llvm/lib/IR/Intrinsics.cpp (+1-1)
  • (modified) llvm/lib/Transforms/Utils/Local.cpp (+3-4)
  • (modified) llvm/test/Assembler/fp-intrinsics-attr.ll (+5-7)
  • (modified) llvm/test/Bitcode/auto-upgrade-constrained.ll (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/fp-intrinsics-fp16.ll (+1-2)
  • (modified) llvm/test/CodeGen/AArch64/fp-intrinsics-vector.ll (+3-6)
  • (modified) llvm/test/CodeGen/AArch64/fp-intrinsics.ll (+3-6)
  • (modified) llvm/test/CodeGen/ARM/fp-intrinsics.ll (+2-2)
  • (modified) llvm/test/CodeGen/PowerPC/fp-strict-round.ll (+4-17)
  • (modified) llvm/test/CodeGen/PowerPC/ppcf128-constrained-fp-intrinsics.ll (+1-4)
  • (modified) llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll (+4-17)
  • (modified) llvm/test/CodeGen/RISCV/double-intrinsics-strict.ll (+1-3)
  • (modified) llvm/test/CodeGen/RISCV/float-intrinsics-strict.ll (+1-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ftrunc-constrained-sdnode.ll (+15-30)
  • (modified) llvm/test/CodeGen/RISCV/rvv/ftrunc-constrained-sdnode.ll (+15-30)
  • (modified) llvm/test/CodeGen/RISCV/zfh-half-intrinsics-strict.ll (+1-3)
  • (modified) llvm/test/CodeGen/RISCV/zfhmin-half-intrinsics-strict.ll (+1-3)
  • (modified) llvm/test/CodeGen/SystemZ/fp-strict-round-01.ll (+3-12)
  • (modified) llvm/test/CodeGen/SystemZ/fp-strict-round-02.ll (+3-12)
  • (modified) llvm/test/CodeGen/SystemZ/fp-strict-round-03.ll (+3-12)
  • (modified) llvm/test/CodeGen/SystemZ/vec-strict-round-01.ll (+2-8)
  • (modified) llvm/test/CodeGen/SystemZ/vec-strict-round-02.ll (+2-8)
  • (modified) llvm/test/CodeGen/SystemZ/vector-constrained-fp-intrinsics.ll (+4-17)
  • (modified) llvm/test/CodeGen/X86/fp-strict-scalar-round-fp16.ll (+2-4)
  • (modified) llvm/test/CodeGen/X86/fp-strict-scalar-round.ll (+2-6)
  • (modified) llvm/test/CodeGen/X86/fp128-libcalls-strict.ll (+1-2)
  • (modified) llvm/test/CodeGen/X86/fp80-strict-libcalls.ll (+1-2)
  • (modified) llvm/test/CodeGen/X86/vec-strict-256-fp16.ll (+1-3)
  • (modified) llvm/test/CodeGen/X86/vec-strict-256.ll (+2-6)
  • (modified) llvm/test/CodeGen/X86/vec-strict-512-fp16.ll (+1-2)
  • (modified) llvm/test/CodeGen/X86/vec-strict-512.ll (+2-4)
  • (modified) llvm/test/CodeGen/X86/vec-strict-round-128.ll (+2-6)
  • (modified) llvm/test/CodeGen/X86/vector-constrained-fp-intrinsics.ll (+4-17)
  • (modified) llvm/test/Transforms/InstSimplify/constfold-constrained.ll (+24-25)
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index cb9c23b8e0a0d0..52b2d3320c60ea 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -657,6 +657,17 @@ static Value *emitUnaryMaybeConstrainedFPBuiltin(CodeGenFunction &CGF,
   }
 }
 
+// Emit a simple mangled intrinsic that has 1 argument and a return type
+// matching the argument type.
+static Value *emitUnaryFPBuiltin(CodeGenFunction &CGF, const CallExpr *E,
+                                 unsigned IntrinsicID) {
+  llvm::Value *Src0 = CGF.EmitScalarExpr(E->getArg(0));
+
+  CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
+  Function *F = CGF.CGM.getIntrinsic(IntrinsicID, Src0->getType());
+  return CGF.Builder.CreateCall(F, Src0);
+}
+
 // Emit an intrinsic that has 2 operands of the same type as its result.
 // Depending on mode, this may be a constrained floating-point intrinsic.
 static Value *emitBinaryMaybeConstrainedFPBuiltin(CodeGenFunction &CGF,
@@ -3238,9 +3249,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     case Builtin::BI__builtin_truncf16:
     case Builtin::BI__builtin_truncl:
     case Builtin::BI__builtin_truncf128:
-      return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
-                                   Intrinsic::trunc,
-                                   Intrinsic::experimental_constrained_trunc));
+      return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::trunc));
 
     case Builtin::BIlround:
     case Builtin::BIlroundf:
@@ -6827,7 +6836,7 @@ Value *CodeGenFunction::EmitNeonCall(Function *F, SmallVectorImpl<Value*> &Ops,
   unsigned j = 0;
   for (Function::const_arg_iterator ai = F->arg_begin(), ae = F->arg_end();
        ai != ae; ++ai, ++j) {
-    if (F->isConstrainedFPIntrinsic())
+    if (F->isLegacyConstrainedIntrinsic())
       if (ai->getType()->isMetadataTy())
         continue;
     if (shift > 0 && shift == j)
@@ -6836,7 +6845,7 @@ Value *CodeGenFunction::EmitNeonCall(Function *F, SmallVectorImpl<Value*> &Ops,
       Ops[j] = Builder.CreateBitCast(Ops[j], ai->getType(), name);
   }
 
-  if (F->isConstrainedFPIntrinsic())
+  if (F->isLegacyConstrainedIntrinsic())
     return Builder.CreateConstrainedFPCall(F, Ops, name);
   else
     return Builder.CreateCall(F, Ops, name);
@@ -12989,13 +12998,11 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
               : Intrinsic::rint;
     return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "vrndx");
   }
-  case NEON::BI__builtin_neon_vrndh_f16: {
+  case NEON::BI__builtin_neon_vrndh_f16:
     Ops.push_back(EmitScalarExpr(E->getArg(0)));
-    Int = Builder.getIsFPConstrained()
-              ? Intrinsic::experimental_constrained_trunc
-              : Intrinsic::trunc;
-    return EmitNeonCall(CGM.getIntrinsic(Int, HalfTy), Ops, "vrndz");
-  }
+    return EmitNeonCall(CGM.getIntrinsic(Intrinsic::trunc, HalfTy), Ops,
+                        "vrndz");
+
   case NEON::BI__builtin_neon_vrnd32x_f32:
   case NEON::BI__builtin_neon_vrnd32xq_f32:
   case NEON::BI__builtin_neon_vrnd32x_f64:
@@ -13029,12 +13036,9 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
     return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "vrnd64z");
   }
   case NEON::BI__builtin_neon_vrnd_v:
-  case NEON::BI__builtin_neon_vrndq_v: {
-    Int = Builder.getIsFPConstrained()
-              ? Intrinsic::experimental_constrained_trunc
-              : Intrinsic::trunc;
-    return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "vrndz");
-  }
+  case NEON::BI__builtin_neon_vrndq_v:
+    return EmitNeonCall(CGM.getIntrinsic(Intrinsic::trunc, Ty), Ops, "vrndz");
+
   case NEON::BI__builtin_neon_vcvt_f64_v:
   case NEON::BI__builtin_neon_vcvtq_f64_v:
     Ops[0] = Builder.CreateBitCast(Ops[0], Ty);
@@ -18251,9 +18255,8 @@ Value *CodeGenFunction::EmitPPCBuiltinExpr(unsigned BuiltinID,
                : Intrinsic::ceil;
     else if (BuiltinID == PPC::BI__builtin_vsx_xvrdpiz ||
              BuiltinID == PPC::BI__builtin_vsx_xvrspiz)
-      ID = Builder.getIsFPConstrained()
-               ? Intrinsic::experimental_constrained_trunc
-               : Intrinsic::trunc;
+      return emitUnaryFPBuiltin(*this, E, Intrinsic::trunc);
+
     llvm::Function *F = CGM.getIntrinsic(ID, ResultType);
     return Builder.getIsFPConstrained() ? Builder.CreateConstrainedFPCall(F, X)
                                         : Builder.CreateCall(F, X);
@@ -18754,9 +18757,7 @@ Value *CodeGenFunction::EmitPPCBuiltinExpr(unsigned BuiltinID,
         .getScalarVal();
   case PPC::BI__builtin_ppc_friz:
   case PPC::BI__builtin_ppc_frizs:
-    return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
-                           *this, E, Intrinsic::trunc,
-                           Intrinsic::experimental_constrained_trunc))
+    return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::trunc))
         .getScalarVal();
   case PPC::BI__builtin_ppc_fsqrt:
   case PPC::BI__builtin_ppc_fsqrts:
@@ -20536,8 +20537,7 @@ Value *CodeGenFunction::EmitSystemZBuiltinExpr(unsigned BuiltinID,
               CI = Intrinsic::experimental_constrained_nearbyint; break;
       case 1: ID = Intrinsic::round;
               CI = Intrinsic::experimental_constrained_round; break;
-      case 5: ID = Intrinsic::trunc;
-              CI = Intrinsic::experimental_constrained_trunc; break;
+      case 5: ID = Intrinsic::trunc; break;
       case 6: ID = Intrinsic::ceil;
               CI = Intrinsic::experimental_constrained_ceil; break;
       case 7: ID = Intrinsic::floor;
@@ -20546,7 +20546,7 @@ Value *CodeGenFunction::EmitSystemZBuiltinExpr(unsigned BuiltinID,
       break;
     }
     if (ID != Intrinsic::not_intrinsic) {
-      if (Builder.getIsFPConstrained()) {
+      if (Builder.getIsFPConstrained() && ID != Intrinsic::trunc) {
         Function *F = CGM.getIntrinsic(CI, ResultType);
         return Builder.CreateConstrainedFPCall(F, X);
       } else {
diff --git a/clang/test/CodeGen/AArch64/neon-intrinsics-constrained.c b/clang/test/CodeGen/AArch64/neon-intrinsics-constrained.c
index 15ae7eea820e80..0405cf7f19c73b 100644
--- a/clang/test/CodeGen/AArch64/neon-intrinsics-constrained.c
+++ b/clang/test/CodeGen/AArch64/neon-intrinsics-constrained.c
@@ -792,7 +792,7 @@ float64x1_t test_vrndx_f64(float64x1_t a) {
 // COMMON-LABEL: test_vrnd_f64
 // COMMONIR:      [[TMP0:%.*]] = bitcast <1 x double> %a to <8 x i8>
 // UNCONSTRAINED: [[VRNDZ1_I:%.*]] = call <1 x double> @llvm.trunc.v1f64(<1 x double> %a)
-// CONSTRAINED:   [[VRNDZ1_I:%.*]] = call <1 x double> @llvm.experimental.constrained.trunc.v1f64(<1 x double> %a, metadata !"fpexcept.strict")
+// CONSTRAINED:   [[VRNDZ1_I:%.*]] = call <1 x double> @llvm.trunc.v1f64(<1 x double> %a) #[[ATTR:[0-9]+]] [ "fpe.except"(metadata !"strict") ]
 // COMMONIR:      ret <1 x double> [[VRNDZ1_I]]
 float64x1_t test_vrnd_f64(float64x1_t a) {
   return vrnd_f64(a);
diff --git a/clang/test/CodeGen/AArch64/v8.2a-fp16-intrinsics-constrained.c b/clang/test/CodeGen/AArch64/v8.2a-fp16-intrinsics-constrained.c
index 9109626cea9ca2..9079a6690b9db3 100644
--- a/clang/test/CodeGen/AArch64/v8.2a-fp16-intrinsics-constrained.c
+++ b/clang/test/CodeGen/AArch64/v8.2a-fp16-intrinsics-constrained.c
@@ -150,7 +150,7 @@ uint64_t test_vcvth_u64_f16 (float16_t a) {
 
 // COMMON-LABEL: test_vrndh_f16
 // UNCONSTRAINED:  [[RND:%.*]] = call half @llvm.trunc.f16(half %a)
-// CONSTRAINED:    [[RND:%.*]] = call half @llvm.experimental.constrained.trunc.f16(half %a, metadata !"fpexcept.strict")
+// CONSTRAINED:    [[RND:%.*]] = call half @llvm.trunc.f16(half %a) #[[ATTR:[0-9]+]] [ "fpe.except"(metadata !"strict") ]
 // COMMONIR:       ret half [[RND]]
 float16_t test_vrndh_f16(float16_t a) {
   return vrndh_f16(a);
@@ -298,3 +298,5 @@ float16_t test_vfmsh_f16(float16_t a, float16_t b, float16_t c) {
   return vfmsh_f16(a, b, c);
 }
 
+// CHECK: attributes #[[ATTR]] = { strictfp memory(inaccessiblemem: readwrite) }
+
diff --git a/clang/test/CodeGen/PowerPC/builtins-ppc-fpconstrained.c b/clang/test/CodeGen/PowerPC/builtins-ppc-fpconstrained.c
index 838db02415fe5b..b326f131a56e54 100644
--- a/clang/test/CodeGen/PowerPC/builtins-ppc-fpconstrained.c
+++ b/clang/test/CodeGen/PowerPC/builtins-ppc-fpconstrained.c
@@ -85,13 +85,13 @@ void test_float(void) {
   vf = __builtin_vsx_xvrspiz(vf);
   // CHECK-LABEL: try-xvrspiz
   // CHECK-UNCONSTRAINED: @llvm.trunc.v4f32(<4 x float> %{{.*}})
-  // CHECK-CONSTRAINED: @llvm.experimental.constrained.trunc.v4f32(<4 x float> %{{.*}}, metadata !"fpexcept.strict")
+  // CHECK-CONSTRAINED: @llvm.trunc.v4f32(<4 x float> %{{.*}}) #[[ATTR:[0-9]+]] [ "fpe.except"(metadata !"strict") ]
   // CHECK-ASM: xvrspiz
 
   vd = __builtin_vsx_xvrdpiz(vd);
   // CHECK-LABEL: try-xvrdpiz
   // CHECK-UNCONSTRAINED: @llvm.trunc.v2f64(<2 x double> %{{.*}})
-  // CHECK-CONSTRAINED: @llvm.experimental.constrained.trunc.v2f64(<2 x double> %{{.*}}, metadata !"fpexcept.strict")
+  // CHECK-CONSTRAINED: @llvm.trunc.v2f64(<2 x double> %{{.*}}) #[[ATTR]] [ "fpe.except"(metadata !"strict") ]
   // CHECK-ASM: xvrdpiz
 
   vf = __builtin_vsx_xvmaddasp(vf, vf, vf);
@@ -156,3 +156,5 @@ void test_float(void) {
   // CHECK-CONSTRAINED: fneg <2 x double> [[RESULT1]]
   // CHECK-ASM: xvnmsubadp
 }
+
+// CHECK-CONSTRAINED: attributes #[[ATTR]] = { strictfp memory(inaccessiblemem: readwrite) }
diff --git a/clang/test/CodeGen/SystemZ/builtins-systemz-vector-constrained.c b/clang/test/CodeGen/SystemZ/builtins-systemz-vector-constrained.c
index 6d2845504a39f0..77ede2c10eea08 100644
--- a/clang/test/CodeGen/SystemZ/builtins-systemz-vector-constrained.c
+++ b/clang/test/CodeGen/SystemZ/builtins-systemz-vector-constrained.c
@@ -45,7 +45,7 @@ void test_float(void) {
   vd = __builtin_s390_vfidb(vd, 4, 1);
   // CHECK: call <2 x double> @llvm.experimental.constrained.round.v2f64(<2 x double> %{{.*}})
   vd = __builtin_s390_vfidb(vd, 4, 5);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.trunc.v2f64(<2 x double> %{{.*}})
+  // CHECK: call <2 x double> @llvm.trunc.v2f64(<2 x double> %{{.*}}) #[[ATTR:[0-9]+]] [ "fpe.except"(metadata !"strict") ]
   vd = __builtin_s390_vfidb(vd, 4, 6);
   // CHECK: call <2 x double> @llvm.experimental.constrained.ceil.v2f64(<2 x double> %{{.*}})
   vd = __builtin_s390_vfidb(vd, 4, 7);
@@ -53,3 +53,5 @@ void test_float(void) {
   vd = __builtin_s390_vfidb(vd, 4, 4);
   // CHECK: call <2 x double> @llvm.s390.vfidb(<2 x double> %{{.*}}, i32 4, i32 4)
 }
+
+// CHECK: attributes #[[ATTR]] = { strictfp memory(inaccessiblemem: readwrite) }
diff --git a/clang/test/CodeGen/SystemZ/builtins-systemz-vector2-constrained.c b/clang/test/CodeGen/SystemZ/builtins-systemz-vector2-constrained.c
index 735b6a0249ab62..7488cf90a9669d 100644
--- a/clang/test/CodeGen/SystemZ/builtins-systemz-vector2-constrained.c
+++ b/clang/test/CodeGen/SystemZ/builtins-systemz-vector2-constrained.c
@@ -60,10 +60,11 @@ void test_float(void) {
   vf = __builtin_s390_vfisb(vf, 4, 1);
   // CHECK: call <4 x float> @llvm.experimental.constrained.round.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
   vf = __builtin_s390_vfisb(vf, 4, 5);
-  // CHECK: call <4 x float> @llvm.experimental.constrained.trunc.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <4 x float> @llvm.trunc.v4f32(<4 x float> %{{.*}}) #[[ATTR:[0-9]+]] [ "fpe.except"(metadata !"strict") ]
   vf = __builtin_s390_vfisb(vf, 4, 6);
   // CHECK: call <4 x float> @llvm.experimental.constrained.ceil.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
   vf = __builtin_s390_vfisb(vf, 4, 7);
   // CHECK: call <4 x float> @llvm.experimental.constrained.floor.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
 }
 
+// CHECK: attributes #[[ATTR]] = { strictfp memory(inaccessiblemem: readwrite) }
diff --git a/clang/test/CodeGen/SystemZ/builtins-systemz-zvector-constrained.c b/clang/test/CodeGen/SystemZ/builtins-systemz-zvector-constrained.c
index 6a1f8f0e923f65..fe964fa38aee07 100644
--- a/clang/test/CodeGen/SystemZ/builtins-systemz-zvector-constrained.c
+++ b/clang/test/CodeGen/SystemZ/builtins-systemz-zvector-constrained.c
@@ -303,10 +303,10 @@ void test_float(void) {
   // CHECK: call <2 x double> @llvm.experimental.constrained.floor.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
   // CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 4, 7
   vd = vec_roundz(vd);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.trunc.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.trunc.v2f64(<2 x double> %{{.*}}) #[[ATTR:[0-9]+]] [ "fpe.except"(metadata !"strict") ]
   // CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 4, 5
   vd = vec_trunc(vd);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.trunc.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.trunc.v2f64(<2 x double> %{{.*}}) #[[ATTR]] [ "fpe.except"(metadata !"strict") ]
   // CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 4, 5
   vd = vec_roundc(vd);
   // CHECK: call <2 x double> @llvm.experimental.constrained.nearbyint.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
@@ -316,3 +316,5 @@ void test_float(void) {
   // CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 0, 0
   vd = vec_round(vd);
 }
+
+// CHECK: attributes #[[ATTR]] = { strictfp memory(inaccessiblemem: readwrite) }
diff --git a/clang/test/CodeGen/SystemZ/builtins-systemz-zvector2-constrained.c b/clang/test/CodeGen/SystemZ/builtins-systemz-zvector2-constrained.c
index 750f5011a26798..e7ea4e325862e9 100644
--- a/clang/test/CodeGen/SystemZ/builtins-systemz-zvector2-constrained.c
+++ b/clang/test/CodeGen/SystemZ/builtins-systemz-zvector2-constrained.c
@@ -495,16 +495,16 @@ void test_float(void) {
   // CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 4, 7
 
   vf = vec_roundz(vf);
-  // CHECK: call <4 x float> @llvm.experimental.constrained.trunc.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <4 x float> @llvm.trunc.v4f32(<4 x float> %{{.*}}) #[[ATTR:[0-9]+]] [ "fpe.except"(metadata !"strict") ]
   // CHECK-ASM: vfisb %{{.*}}, %{{.*}}, 4, 5
   vf = vec_trunc(vf);
-  // CHECK: call <4 x float> @llvm.experimental.constrained.trunc.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <4 x float> @llvm.trunc.v4f32(<4 x float> %{{.*}}) #[[ATTR:[0-9]+]] [ "fpe.except"(metadata !"strict") ]
   // CHECK-ASM: vfisb %{{.*}}, %{{.*}}, 4, 5
   vd = vec_roundz(vd);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.trunc.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.trunc.v2f64(<2 x double> %{{.*}}) #[[ATTR:[0-9]+]] [ "fpe.except"(metadata !"strict") ]
   // CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 4, 5
   vd = vec_trunc(vd);
-  // CHECK: call <2 x double> @llvm.experimental.constrained.trunc.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+  // CHECK: call <2 x double> @llvm.trunc.v2f64(<2 x double> %{{.*}}) #[[ATTR:[0-9]+]] [ "fpe.except"(metadata !"strict") ]
   // CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 4, 5
 
   vf = vec_roundc(vf);
@@ -541,3 +541,5 @@ void test_float(void) {
   // CHECK: call { <2 x i64>, i32 } @llvm.s390.vftcidb(<2 x double> %{{.*}}, i32 4095)
   // CHECK-ASM: vftcidb
 }
+
+// CHECK: attributes #[[ATTR]] = { strictfp memory(inaccessiblemem: readwrite) }
\ No newline at end of file
diff --git a/clang/test/CodeGen/arm64-vrnd-constrained.c b/clang/test/CodeGen/arm64-vrnd-constrained.c
index ccf729a6a25ef6..e690f26b0def52 100644
--- a/clang/test/CodeGen/arm64-vrnd-constrained.c
+++ b/clang/test/CodeGen/arm64-vrnd-constrained.c
@@ -14,7 +14,7 @@
 float64x2_t rnd5(float64x2_t a) { return vrndq_f64(a); }
 // COMMON-LABEL: rnd5
 // UNCONSTRAINED: call <2 x double> @llvm.trunc.v2f64(<2 x double>
-// CONSTRAINED:   call <2 x double> @llvm.experimental.constrained.trunc.v2f64(<2 x double>
+// CONSTRAINED:   call <2 x double> @llvm.trunc.v2f64(<2 x double> %{{.*}}) #[[ATTR:[0-9]+]] [ "fpe.except"(metadata !"strict") ]
 // CHECK-ASM:     frintz.2d v{{[0-9]+}}, v{{[0-9]+}}
 
 float64x2_t rnd13(float64x2_t a) { return vrndmq_f64(a); }
@@ -41,3 +41,5 @@ float64x2_t rnd25(float64x2_t a) { return vrndxq_f64(a); }
 // CONSTRAINED:   call <2 x double> @llvm.experimental.constrained.rint.v2f64(<2 x double>
 // CHECK-ASM:     frintx.2d v{{[0-9]+}}, v{{[0-9]+}}
 
+// CHECK: attributes #[[ATTR]] = { strictfp memory(inaccessiblemem: readwrite) }
+
diff --git a/clang/test/CodeGen/constrained-math-builtins.c b/clang/test/CodeGen/constrained-math-builtins.c
index 68b9e75283c547..f5136cd18e0eff 100644
--- a/clang/test/CodeGen/constrained-math-builtins.c
+++ b/clang/test/CodeGen/constrained-math-builtins.c
@@ -242,10 +242,10 @@ __builtin_atan2(f,f);        __builtin_atan2f(f,f);       __builtin_atan2l(f,f);
 
   __builtin_trunc(f);      __builtin_truncf(f);     __builtin_truncl(f); __builtin_truncf128(f);
 
-// CHECK: call double @llvm.experimental.constrained.trunc.f64(double %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.trunc.f32(float %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.trunc.f80(x86_fp80 %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call fp128 @llvm.experimental.constrained.trunc.f128(fp128 %{{.*}}, metadata !"fpexcept.strict")
+// CHECK: call double @llvm.trunc.f64(double %{{.*}}) #[[ATTR_CALL:[0-9]+]] [ "fpe.except"(metadata !"strict") ]
+// CHECK: call float @llvm.trunc.f32(float %{{.*}}) #[[ATTR_CALL]] [ "fpe.except"(metadata !"strict") ]
+// CHECK: call x86_fp80 @llvm.trunc.f80(x86_fp80 %{{.*}}) #[[ATTR_CALL]] [ "fpe.except"(metadata !"strict") ]
+// CHECK: call fp128 @llvm.trunc.f128(fp128 %{{.*}}) #[[ATTR_CALL]] [ "fpe.except"(metadata !"strict") ]
 };
 
 // CHECK: declare double @llvm.experimental.constrained.frem.f64(double, double, metadata, metadata)
@@ -377,10 +377,10 @@ __builtin_atan2(f,f);        __builtin_atan2f(f,f);       __builtin_atan2l(f,f);
 // CHECK: declare x86_fp80 @llvm.experimental.constrained.tan.f80(x86_fp80, metadata, metadata)
 // CHECK: declare fp128 @llvm.experimental.constrained.tan.f128(fp128, metadata, metadata)
 
-// CHECK: declare double @llvm.experimental.constrained.trunc.f64(double, metadata)
-// CHECK: declare float @llvm.experimental.constrained.trunc.f32(float, metadata)
-// CHECK: declare x86_fp80 @llvm.experimental.constrained.trunc.f80(x86_fp80, metadata)
-// CHECK: declare fp128 @llvm.experimental.constrained.trunc.f128(fp128, metadata)
+// CHECK: declare double @llvm.trunc.f64(double) #[[ATTR_FUNC:[0-9]+]]
+// CHECK: declare float @llvm.trunc.f32(float) #[[ATTR_FUNC]]
+// CHECK: declare x86_fp80 @llvm.trunc.f80(x86_fp80) #[[ATTR_FUNC]]
+// CHECK: declare fp128 @llvm.trunc.f128(fp128) #[[ATTR_FUNC]]
 
 #pragma STDC FP_CONTRACT ON
 void bar(float f) {
@@ -401,3 +401,6 @@ void bar(float f) {
   // CHECK: fneg
   // CHECK: call float @llvm.experimental.constrained.fmuladd.f32(float %{{.*}}, float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
 };
+
+// CHECK: attributes #[[ATTR_FUNC]] = { {{.*}} memory(none) }
+// CHECK: attributes #[[ATTR_CALL]] = { strictfp memory(inaccessiblemem: readwrite) }
diff --git a/llvm/include/llvm/CodeGen/SelectionDAGNodes.h b/llvm/include/llvm/CodeGen/SelectionDAGNodes.h
index 677b59e0c8fbeb..9dc831ef23273d 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAGNodes.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAGNodes.h
@@ -721,6 +721,7 @@ END_TWO_BYTE_PACK()
       case ISD::STRICT_FP_TO_BF16:
 #define DAG_INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC, DAGN)               \
       case ISD::STRICT_##DAGN:
+#define LEGACY_FUNCTION DAG_INSTRUCTION
 #include "llvm/IR/ConstrainedOps.def"
         return true;
     }
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index 6a41094ff933b0..7ccaf9558077c0 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -1324,6 +1324,7 @@ class TargetLoweringBase {
       default: llvm_unreachable("Unexpected FP pseudo-opcode");
 #define DAG_INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC, DAGN)               \
       case ISD::STRICT_##DAGN: EqOpc = ISD::DAGN; break;
+#define LEGACY_FUNCTION DAG_INSTRUCTION
 #define CMP_INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC, DAGN)               \
       case ISD::STRICT_##DAGN: EqOpc = ISD::SETCC; break;
 #include "llvm/IR/ConstrainedOps.def"
diff --git a/llvm/include/llvm/IR/ConstrainedOps.def b/llvm/include/llvm/IR/ConstrainedOps.def
index 30a82bf633d575..2b1...
[truncated]

@llvmbot
Member

llvmbot commented Dec 2, 2024

@llvm/pr-subscribers-backend-powerpc



github-actions bot commented Dec 2, 2024

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff c4a1e0efe6b0767dfb5861a7e8814d7db0c0de8a 8e64ef3ee8baf97e9ad319486a4be3aacc71c75e --extensions c,h,cpp -- clang/lib/CodeGen/CGBuiltin.cpp clang/test/CodeGen/AArch64/neon-intrinsics-constrained.c clang/test/CodeGen/AArch64/v8.2a-fp16-intrinsics-constrained.c clang/test/CodeGen/PowerPC/builtins-ppc-fpconstrained.c clang/test/CodeGen/SystemZ/builtins-systemz-vector-constrained.c clang/test/CodeGen/SystemZ/builtins-systemz-vector2-constrained.c clang/test/CodeGen/SystemZ/builtins-systemz-zvector-constrained.c clang/test/CodeGen/SystemZ/builtins-systemz-zvector2-constrained.c clang/test/CodeGen/X86/strictfp_builtins.c clang/test/CodeGen/arm64-vrnd-constrained.c clang/test/CodeGen/constrained-math-builtins.c clang/test/CodeGen/strictfp_builtins.c llvm/include/llvm/AsmParser/LLParser.h llvm/include/llvm/CodeGen/SelectionDAGNodes.h llvm/include/llvm/CodeGen/TargetLowering.h llvm/include/llvm/IR/FPEnv.h llvm/include/llvm/IR/Function.h llvm/include/llvm/IR/IRBuilder.h llvm/include/llvm/IR/InstrTypes.h llvm/include/llvm/IR/IntrinsicInst.h llvm/include/llvm/IR/Intrinsics.h llvm/include/llvm/IR/LLVMContext.h llvm/lib/Analysis/ConstantFolding.cpp llvm/lib/AsmParser/LLParser.cpp llvm/lib/Bitcode/Reader/BitcodeReader.cpp llvm/lib/CodeGen/ExpandVectorPredication.cpp llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h llvm/lib/CodeGen/TargetLoweringBase.cpp llvm/lib/IR/AutoUpgrade.cpp llvm/lib/IR/FPEnv.cpp llvm/lib/IR/Function.cpp llvm/lib/IR/IRBuilder.cpp llvm/lib/IR/Instructions.cpp llvm/lib/IR/IntrinsicInst.cpp llvm/lib/IR/Intrinsics.cpp llvm/lib/IR/LLVMContext.cpp llvm/lib/IR/Verifier.cpp llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp llvm/lib/Transforms/Utils/CloneFunction.cpp llvm/lib/Transforms/Utils/Local.cpp
View the diff from clang-format below:
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 52b2d3320c..13c9533a8b 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -20537,7 +20537,9 @@ Value *CodeGenFunction::EmitSystemZBuiltinExpr(unsigned BuiltinID,
               CI = Intrinsic::experimental_constrained_nearbyint; break;
       case 1: ID = Intrinsic::round;
               CI = Intrinsic::experimental_constrained_round; break;
-      case 5: ID = Intrinsic::trunc; break;
+      case 5:
+        ID = Intrinsic::trunc;
+        break;
       case 6: ID = Intrinsic::ceil;
               CI = Intrinsic::experimental_constrained_ceil; break;
       case 7: ID = Intrinsic::floor;

Collaborator Author

spavloff commented Dec 2, 2024

Merge activity

  • Dec 2, 12:55 AM EST: The merge label 'FP Bundles' was detected. This PR will be added to the Graphite merge queue once it meets the requirements.
  • Dec 2, 12:55 AM EST: A user added this pull request to the Graphite merge queue.
  • Dec 2, 12:56 AM EST: The Graphite merge queue couldn't merge this PR because it failed for an unknown reason (Stack merges are not currently supported for forked repositories. Please create a branch in the target repository in order to merge).

@graphite-app graphite-app bot changed the base branch from users/spavloff/floating-point-operator-bundles to main December 2, 2024 05:55
@graphite-app graphite-app bot removed the FP Bundles label Dec 2, 2024
@@ -251,10 +251,12 @@ static bool markTails(Function &F, OptimizationRemarkEmitter *ORE) {

// Special-case operand bundles "clang.arc.attachedcall", "ptrauth", and
// "kcfi".
Member

This comment needs to be updated. I suggest removing the list that is present in the code and replacing it with text explaining why these operand bundles are special cases.

@efriedma-quic
Collaborator

llvm.trunc is currently marked IntrNoMem in Intrinsics.td; you'll need to update that if you want it to read/modify FP state. (Trying to override the default by sticking attributes on top doesn't work properly, as far as I know.)
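
For illustration, a minimal sketch of how that property surfaces on the declaration (the helper name is mine; it assumes the current in-tree state where llvm.trunc still carries IntrNoMem):

// Sketch only: IntrNoMem in Intrinsics.td becomes memory(none) on the
// declaration, so passes may reorder or delete the call even if a call-site
// bundle claims it touches FP state.
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/Module.h"
using namespace llvm;

bool truncIsReadnone(Module &M, Type *FloatTy) {
  Function *TruncDecl =
      Intrinsic::getDeclaration(&M, Intrinsic::trunc, {FloatTy});
  return TruncDecl->doesNotAccessMemory(); // true while IntrNoMem is set
}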

@arsenm
Contributor

arsenm commented Dec 6, 2024

llvm.trunc is currently marked IntrNoMem in Intrinsics.td; you'll need to update that if you want it to read/modify FP state. (Trying to override the default by sticking attributes on top doesn't work properly, as far as I know.)

I think we need a dedicated fp env attribute to model this (and would be a prerequisite to making this change)

@spavloff
Collaborator Author

spavloff commented Dec 6, 2024

llvm.trunc is currently marked IntrNoMem in Intrinsics.td; you'll need to update that if you want it to read/modify FP state. (Trying to override the default by sticking attributes on top doesn't work properly, as far as I know.)

This is the key point of this solution: we want to use the same intrinsic in both the default and non-default environments. All properties necessary for the non-default case will be attached to the call site. If something prevents this plan, we should evaluate it.

I think we need a dedicated fp env attribute to model this (and would be a prerequisite to making this change)

If you mean an attribute of a call site, then yes, we need a more detailed view of side effects in the non-default environment. In any case, the performance of a program running with a non-default rounding mode should not drop if exception tracking is not needed. As for an attribute on the intrinsic itself, its purpose seems unclear.

@arsenm
Contributor

arsenm commented Dec 6, 2024

If you mean an attribute of a call site, then yes, we need a more detailed view of side effects in the non-default environment. In any case, the performance of a program running with a non-default rounding mode should not drop if exception tracking is not needed. As for an attribute on the intrinsic itself, its purpose seems unclear.

I mean a general attribute that can apply to the declaration and to a call site. We need to be able to mark which intrinsic declarations do not care about errno or other FP mode bits, and whether they can read or write them. Furthermore, it is useful to mark individual call sites with stricter variants, just like for memory attributes. We need this to avoid stripping IntrNoMem: the intrinsic should still be IntrNoMem, with the additional qualifier that under strictfp it may read/write errno or the rounding mode.

@efriedma-quic
Collaborator

llvm.trunc is currently marked IntrNoMem in Intrinsics.td; you'll need to update that if you want it to read/modify FP state. (Trying to override the default by sticking attributes on top doesn't work properly, as far as I know.)

This is the key point of this solution: we want to use the same intrinsic in both the default and non-default environments. All properties necessary for the non-default case will be attached to the call site. If something prevents this plan, we should evaluate it.

IntrNoMem gets translated to readnone, i.e. the intrinsic does not access any memory, including FP state. If the intrinsic can in fact read/modify FP state in some cases, we have to remove that from the intrinsic.

There are basically two ways we can go from there. One, we can just make the frontend and/or transforms add a readnone marking to callsites that can't actually access FP state (i.e. calls in non-strictfp functions). Two, we can add a "readnone_fp_intrinsic" attribute, which would mean the intrinsic is readnone unless there's an operand bundle indicating otherwise.

I think the first way composes more cleanly with our general approach to memory effects.
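
As a sketch of the first way (the helper name is illustrative, not part of this PR), the marking could be as simple as:

// Sketch: in functions that are not strictfp, a call to an FP intrinsic
// cannot observe or modify the FP environment, so the call site itself can
// be marked memory(none).
#include "llvm/IR/InstrTypes.h"
#include "llvm/Support/ModRef.h"
using namespace llvm;

void markCallReadnoneIfNotStrictFP(CallBase &Call) {
  if (!Call.getFunction()->hasFnAttribute(Attribute::StrictFP))
    Call.addFnAttr(Attribute::getWithMemoryEffects(Call.getContext(),
                                                   MemoryEffects::none()));
}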

@arsenm
Contributor

arsenm commented Dec 8, 2024

Two, we can add a "readnone_fp_intrinsic" attribute, which would mean the intrinsic is readnone unless there's an operand bundle indicating otherwise.

I think this needs to be refined further, into FP mode read/write and errno read/write. Basically a mirror of memory() for arguments/other memory.

    NeedRound = false;
  else if (NeedExcept && Item.getTag() == "fpe.except")
    NeedExcept = false;
  ActualBundles.push_back(Item);
Member

Do we want intrinsics where the rounding mode is baked in (like trunc or floor) to be allowed to have a rounding mode bundle? Or, do we want the rounding mode bundle to be able to specify a rounding mode that isn't the one baked into the intrinsic? I'm leaning towards the rounding mode bundle not being allowed.

Collaborator Author

It depends on how we want to treat the rounding mode bundle. At least two cases are possible.

(1) The rounding mode bundle specifies the floating-point environment. That is, it provides information about the current value of the rounding mode in the FPCR. If the optimizer can deduce this value, it may set the appropriate bundle on all affected instructions. For example, in the following code:

call void @llvm.set.rounding(i32 1)
%v = call float @llvm.trunc.f32(float %x)

the call to trunc can be replaced with:

%v = call float @llvm.trunc.f32(float %x) [ "fpe.control"(metadata !"rte") ]

The rounding mode in this bundle does not change the meaning of trunc, but could be useful in some cases. The two calls:

%v = call float @llvm.trunc.f32(float %x) [ "fpe.control"(metadata !"rte") ]
%v = call float @llvm.trunc.f32(float %x) [ "fpe.control"(metadata !"rtz") ]

represent the same operation, but on a target where trunc is implemented as rounding using the current mode, the latter call lowers to a single instruction, while the former generally requires three operations (set FPCR, nearbyint, restore FPCR). This is a hypothetical example, however.

It seems the meaning of the current rounding-mode metadata argument of the constrained intrinsics agrees with this model; see the discussion in https://discourse.llvm.org/t/static-rounding-mode-in-ir/80621.

In this scenario it does not make much sense to exclude an unused rounding mode from the allowed bundles. The optimizer can set the bundles in a simple way, without checking whether the instruction uses the rounding mode. We use a similar method in the clang AST, where all relevant nodes carry complete FPOptions.

(2) The rounding mode bundle specifies the rounding mode used for evaluating the instruction. Instructions like trunc do not depend on the specified rounding mode, so it does not make sense to use rounding bundles for them.

This viewpoint seems more natural, since rounding is considered a parameter of the operation, similar to its arguments. It can also be naturally extended to static FP control modes. Rounding as a parameter produces exactly the same effect whether it is read from the FPCR or specified in the instruction. Other FP options, such as denormal behavior, can be handled similarly.

Neither method has a clear-cut advantage, and we need to discuss which approach to take.
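
For concreteness, here is a sketch of how a call carrying such a bundle could be built through IRBuilder; the bundle tag and metadata string follow the examples above, and this is not code from the patch:

// Sketch: build  %v = call float @llvm.trunc.f32(float %x)
//                        [ "fpe.control"(metadata !"rte") ]
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Intrinsics.h"
using namespace llvm;

Value *emitTruncWithControlBundle(IRBuilder<> &B, Module &M, Value *X) {
  LLVMContext &Ctx = B.getContext();
  Function *TruncF =
      Intrinsic::getDeclaration(&M, Intrinsic::trunc, {X->getType()});
  // Wrap the metadata string so it can be used as a bundle operand.
  Value *Mode = MetadataAsValue::get(Ctx, MDString::get(Ctx, "rte"));
  std::vector<Value *> Inputs = {Mode};
  OperandBundleDef FPBundle("fpe.control", Inputs);
  return B.CreateCall(TruncF, {X}, {FPBundle});
}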

Member

I think #1 is a better choice despite the downsides. Having more opportunities to optimize during lowering when we know the current rounding mode seems like a good trade-off. It also simplifies the implementation in places, as you said.

Having the rounding mode bundle specify the FP environment is a change from the constrained intrinsics, and the point is a subtle one, so I do think we'll need to clearly state it in the LangRef at some point.

I am a little worried that we're creating a footgun: someone may write code that relies on the rounding mode bundle when handling trunc, floor, or one of the other math intrinsics/library calls. Then again, code that tries to evaluate an expression will need a switch whose entries have the rounding modes hardcoded into them anyway. So I'm not worried enough to change my view that #1 is preferred.

Having the rounding mode bundle specify the FP environment also means we don't need any Verifier checks for improperly present or specified rounding bundles. That's a minor win.

One last point: this is another example of how having the rounding mode specified is useful. Since we've defined the constrained intrinsics to require that the rounding mode be correct, with an incorrect rounding mode being undefined behavior, we can rely on the specified rounding mode. The constant folding we do currently checks the rounding mode in the cases I remember. We should carry over into the LangRef the verbiage about incorrect rounding-mode metadata being UB.
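
To make the trunc case concrete, a sketch of such a fold (not the actual ConstantFolding.cpp code): llvm.trunc always rounds toward zero, so neither a bundle nor the environment needs to be consulted.

#include "llvm/ADT/APFloat.h"
#include "llvm/IR/Constants.h"
using namespace llvm;

Constant *foldConstantTrunc(ConstantFP *Op) {
  APFloat V = Op->getValueAPF();
  // Rounds toward zero regardless of the dynamic mode; the opStatus result
  // is intentionally ignored in this sketch.
  V.roundToIntegral(APFloat::rmTowardZero);
  return ConstantFP::get(Op->getType(), V);
}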

Collaborator Author

If the rounding bundle specifies only the dynamic rounding mode, we can simply ignore it when the operation does not need it, as in the case of trunc. That should simplify optimization; otherwise the optimizer would have to analyze whether each FP operation can depend on rounding and add or omit the bundle accordingly. More rigid rules on the bundles could make the implementation more complex and less convenient. On the other hand, such rules should reduce errors caused by misuse. We can start with the loose implementation and add such restrictions at any time later.

@kpneal
Member

kpneal commented Jan 8, 2025

I do think that before we start adding code like this ticket's, we need IR Verifier code to check for proper use of the strictfp attribute. That code never made it into the tree because there are too many broken tests already in the tree.

Verifier code could be written that only fires when an error is detected AND no constrained intrinsics are used in the function. That should eliminate failures from most, but not all, of the currently broken tests. Hopefully the broken tests that are in tree and do fire will be few enough that they can be fixed. The remainder of the broken tests will be corrected over time or simply removed.

My ticket that never got pushed is here: https://reviews.llvm.org/D146845

I can provide a current version of that code if it would be useful.

I also have checks that are implemented on top of that code to ensure that regular FP instructions are never mixed with constrained intrinsics. We'll need to push something like that hopefully not long after we start putting this bundle support into the tree.
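
For the sake of discussion, the core of such a check might look like the sketch below (modeled loosely on the idea in D146845, not taken from it):

// Sketch of a Verifier-style rule: constrained FP intrinsics may only appear
// in functions that carry the strictfp attribute. In Verifier.cpp this would
// be a Check(...) with a diagnostic instead of a bool return.
#include "llvm/IR/IntrinsicInst.h"
using namespace llvm;

bool constrainedCallIsValid(const ConstrainedFPIntrinsic &FPI) {
  return FPI.getFunction()->hasFnAttribute(Attribute::StrictFP);
}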

@spavloff
Collaborator Author

spavloff commented Jan 9, 2025

You are right, and I have already come across such improper use of the attribute, when intrinsics like fabs get strictfp. The meaning and usage of strictfp must be defined more precisely, and the relevant checks implemented accordingly. The ticket you reference can be helpful. I wonder why it was not committed.

@kpneal
Member

kpneal commented Jan 9, 2025

D146845 wasn't committed because it would have caused test failures. The tests are wrong and the new checks reveal this, but the new checks cannot be committed until all the broken tests are fixed. Otherwise we would get failures from the bots and the Verifier checks would be reverted.

The D146845 ticket encodes the current rules for the strictfp attribute. If you are making changes that fail with D146845 applied to your tree then you are moving in the wrong direction.
