Reimplement constrained 'trunc' using operand bundles #118253
Conversation
Currently, floating-point operations in their general form (beyond the default mode) are always represented by calls to constrained intrinsics. In addition to modeling the side effect, these calls carry additional information in the form of metadata arguments. This scheme is inefficient for intrinsic function calls, as noted in https://discourse.llvm.org/t/thought-on-strictfp-support/71453, because it requires defining a separate intrinsic for the same operation when it is used in a non-default FP environment. The solution proposed in that discussion was "to move the complexity about the environment tracking from the intrinsics themselves to the call instruction". The approach implemented in this change is to use operand bundles (https://llvm.org/docs/LangRef.html#operand-bundles). This approach was tried previously (https://reviews.llvm.org/D93455) but was not finished. This change does not add any new functionality; it only adds a new way of keeping FP-related information in LLVM IR. Metadata arguments of constrained functions are preserved, but they are no longer used by queries like `getRoundingMode` or `getExceptionBehavior`.
- Fix Doxygen error.
- Fix clang-format error.
- Remove unused function declaration.
- Remove setting MD_fpmath; this is already done by copyMetadata.
Previously the function 'trunc' in a non-default floating-point environment was implemented with a special LLVM intrinsic, 'experimental.constrained.trunc'. The introduction of floating-point operand bundles allows expressing the interaction with the FP environment using the same intrinsic as in the default mode. This change removes 'llvm.experimental.constrained.trunc' and uses 'llvm.trunc' in all cases.
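For illustration, a minimal before/after sketch of what this means at the IR level; the bundle tag and attribute groups below are taken from the tests updated in this patch, so treat it as a sketch of the proposal rather than final LangRef syntax:

```llvm
; Before: a dedicated constrained intrinsic carries the FP information as metadata arguments.
%r.old = call double @llvm.experimental.constrained.trunc.f64(double %x, metadata !"fpexcept.strict")

; After: the regular intrinsic is used; exception behavior is attached to the call site
; as an operand bundle, and the side effect is modeled by call-site attributes.
%r.new = call double @llvm.trunc.f64(double %x) #0 [ "fpe.except"(metadata !"strict") ]

attributes #0 = { strictfp memory(inaccessiblemem: readwrite) }
```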
@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-backend-arm

Author: Serge Pavlov (spavloff)

Changes

Previously the function 'trunc' in a non-default floating-point environment was implemented with a special LLVM intrinsic, 'experimental.constrained.trunc'. The introduction of floating-point operand bundles allows expressing the interaction with the FP environment using the same intrinsic as in the default mode. This change removes 'llvm.experimental.constrained.trunc' and uses 'llvm.trunc' in all cases.

Patch is 138.79 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/118253.diff

66 Files Affected:
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index cb9c23b8e0a0d0..52b2d3320c60ea 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -657,6 +657,17 @@ static Value *emitUnaryMaybeConstrainedFPBuiltin(CodeGenFunction &CGF,
}
}
+// Emit a simple mangled intrinsic that has 1 argument and a return type
+// matching the argument type.
+static Value *emitUnaryFPBuiltin(CodeGenFunction &CGF, const CallExpr *E,
+ unsigned IntrinsicID) {
+ llvm::Value *Src0 = CGF.EmitScalarExpr(E->getArg(0));
+
+ CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, E);
+ Function *F = CGF.CGM.getIntrinsic(IntrinsicID, Src0->getType());
+ return CGF.Builder.CreateCall(F, Src0);
+}
+
// Emit an intrinsic that has 2 operands of the same type as its result.
// Depending on mode, this may be a constrained floating-point intrinsic.
static Value *emitBinaryMaybeConstrainedFPBuiltin(CodeGenFunction &CGF,
@@ -3238,9 +3249,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
case Builtin::BI__builtin_truncf16:
case Builtin::BI__builtin_truncl:
case Builtin::BI__builtin_truncf128:
- return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(*this, E,
- Intrinsic::trunc,
- Intrinsic::experimental_constrained_trunc));
+ return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::trunc));
case Builtin::BIlround:
case Builtin::BIlroundf:
@@ -6827,7 +6836,7 @@ Value *CodeGenFunction::EmitNeonCall(Function *F, SmallVectorImpl<Value*> &Ops,
unsigned j = 0;
for (Function::const_arg_iterator ai = F->arg_begin(), ae = F->arg_end();
ai != ae; ++ai, ++j) {
- if (F->isConstrainedFPIntrinsic())
+ if (F->isLegacyConstrainedIntrinsic())
if (ai->getType()->isMetadataTy())
continue;
if (shift > 0 && shift == j)
@@ -6836,7 +6845,7 @@ Value *CodeGenFunction::EmitNeonCall(Function *F, SmallVectorImpl<Value*> &Ops,
Ops[j] = Builder.CreateBitCast(Ops[j], ai->getType(), name);
}
- if (F->isConstrainedFPIntrinsic())
+ if (F->isLegacyConstrainedIntrinsic())
return Builder.CreateConstrainedFPCall(F, Ops, name);
else
return Builder.CreateCall(F, Ops, name);
@@ -12989,13 +12998,11 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
: Intrinsic::rint;
return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "vrndx");
}
- case NEON::BI__builtin_neon_vrndh_f16: {
+ case NEON::BI__builtin_neon_vrndh_f16:
Ops.push_back(EmitScalarExpr(E->getArg(0)));
- Int = Builder.getIsFPConstrained()
- ? Intrinsic::experimental_constrained_trunc
- : Intrinsic::trunc;
- return EmitNeonCall(CGM.getIntrinsic(Int, HalfTy), Ops, "vrndz");
- }
+ return EmitNeonCall(CGM.getIntrinsic(Intrinsic::trunc, HalfTy), Ops,
+ "vrndz");
+
case NEON::BI__builtin_neon_vrnd32x_f32:
case NEON::BI__builtin_neon_vrnd32xq_f32:
case NEON::BI__builtin_neon_vrnd32x_f64:
@@ -13029,12 +13036,9 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "vrnd64z");
}
case NEON::BI__builtin_neon_vrnd_v:
- case NEON::BI__builtin_neon_vrndq_v: {
- Int = Builder.getIsFPConstrained()
- ? Intrinsic::experimental_constrained_trunc
- : Intrinsic::trunc;
- return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "vrndz");
- }
+ case NEON::BI__builtin_neon_vrndq_v:
+ return EmitNeonCall(CGM.getIntrinsic(Intrinsic::trunc, Ty), Ops, "vrndz");
+
case NEON::BI__builtin_neon_vcvt_f64_v:
case NEON::BI__builtin_neon_vcvtq_f64_v:
Ops[0] = Builder.CreateBitCast(Ops[0], Ty);
@@ -18251,9 +18255,8 @@ Value *CodeGenFunction::EmitPPCBuiltinExpr(unsigned BuiltinID,
: Intrinsic::ceil;
else if (BuiltinID == PPC::BI__builtin_vsx_xvrdpiz ||
BuiltinID == PPC::BI__builtin_vsx_xvrspiz)
- ID = Builder.getIsFPConstrained()
- ? Intrinsic::experimental_constrained_trunc
- : Intrinsic::trunc;
+ return emitUnaryFPBuiltin(*this, E, Intrinsic::trunc);
+
llvm::Function *F = CGM.getIntrinsic(ID, ResultType);
return Builder.getIsFPConstrained() ? Builder.CreateConstrainedFPCall(F, X)
: Builder.CreateCall(F, X);
@@ -18754,9 +18757,7 @@ Value *CodeGenFunction::EmitPPCBuiltinExpr(unsigned BuiltinID,
.getScalarVal();
case PPC::BI__builtin_ppc_friz:
case PPC::BI__builtin_ppc_frizs:
- return RValue::get(emitUnaryMaybeConstrainedFPBuiltin(
- *this, E, Intrinsic::trunc,
- Intrinsic::experimental_constrained_trunc))
+ return RValue::get(emitUnaryFPBuiltin(*this, E, Intrinsic::trunc))
.getScalarVal();
case PPC::BI__builtin_ppc_fsqrt:
case PPC::BI__builtin_ppc_fsqrts:
@@ -20536,8 +20537,7 @@ Value *CodeGenFunction::EmitSystemZBuiltinExpr(unsigned BuiltinID,
CI = Intrinsic::experimental_constrained_nearbyint; break;
case 1: ID = Intrinsic::round;
CI = Intrinsic::experimental_constrained_round; break;
- case 5: ID = Intrinsic::trunc;
- CI = Intrinsic::experimental_constrained_trunc; break;
+ case 5: ID = Intrinsic::trunc; break;
case 6: ID = Intrinsic::ceil;
CI = Intrinsic::experimental_constrained_ceil; break;
case 7: ID = Intrinsic::floor;
@@ -20546,7 +20546,7 @@ Value *CodeGenFunction::EmitSystemZBuiltinExpr(unsigned BuiltinID,
break;
}
if (ID != Intrinsic::not_intrinsic) {
- if (Builder.getIsFPConstrained()) {
+ if (Builder.getIsFPConstrained() && ID != Intrinsic::trunc) {
Function *F = CGM.getIntrinsic(CI, ResultType);
return Builder.CreateConstrainedFPCall(F, X);
} else {
diff --git a/clang/test/CodeGen/AArch64/neon-intrinsics-constrained.c b/clang/test/CodeGen/AArch64/neon-intrinsics-constrained.c
index 15ae7eea820e80..0405cf7f19c73b 100644
--- a/clang/test/CodeGen/AArch64/neon-intrinsics-constrained.c
+++ b/clang/test/CodeGen/AArch64/neon-intrinsics-constrained.c
@@ -792,7 +792,7 @@ float64x1_t test_vrndx_f64(float64x1_t a) {
// COMMON-LABEL: test_vrnd_f64
// COMMONIR: [[TMP0:%.*]] = bitcast <1 x double> %a to <8 x i8>
// UNCONSTRAINED: [[VRNDZ1_I:%.*]] = call <1 x double> @llvm.trunc.v1f64(<1 x double> %a)
-// CONSTRAINED: [[VRNDZ1_I:%.*]] = call <1 x double> @llvm.experimental.constrained.trunc.v1f64(<1 x double> %a, metadata !"fpexcept.strict")
+// CONSTRAINED: [[VRNDZ1_I:%.*]] = call <1 x double> @llvm.trunc.v1f64(<1 x double> %a) #[[ATTR:[0-9]+]] [ "fpe.except"(metadata !"strict") ]
// COMMONIR: ret <1 x double> [[VRNDZ1_I]]
float64x1_t test_vrnd_f64(float64x1_t a) {
return vrnd_f64(a);
diff --git a/clang/test/CodeGen/AArch64/v8.2a-fp16-intrinsics-constrained.c b/clang/test/CodeGen/AArch64/v8.2a-fp16-intrinsics-constrained.c
index 9109626cea9ca2..9079a6690b9db3 100644
--- a/clang/test/CodeGen/AArch64/v8.2a-fp16-intrinsics-constrained.c
+++ b/clang/test/CodeGen/AArch64/v8.2a-fp16-intrinsics-constrained.c
@@ -150,7 +150,7 @@ uint64_t test_vcvth_u64_f16 (float16_t a) {
// COMMON-LABEL: test_vrndh_f16
// UNCONSTRAINED: [[RND:%.*]] = call half @llvm.trunc.f16(half %a)
-// CONSTRAINED: [[RND:%.*]] = call half @llvm.experimental.constrained.trunc.f16(half %a, metadata !"fpexcept.strict")
+// CONSTRAINED: [[RND:%.*]] = call half @llvm.trunc.f16(half %a) #[[ATTR:[0-9]+]] [ "fpe.except"(metadata !"strict") ]
// COMMONIR: ret half [[RND]]
float16_t test_vrndh_f16(float16_t a) {
return vrndh_f16(a);
@@ -298,3 +298,5 @@ float16_t test_vfmsh_f16(float16_t a, float16_t b, float16_t c) {
return vfmsh_f16(a, b, c);
}
+// CHECK: attributes #[[ATTR]] = { strictfp memory(inaccessiblemem: readwrite) }
+
diff --git a/clang/test/CodeGen/PowerPC/builtins-ppc-fpconstrained.c b/clang/test/CodeGen/PowerPC/builtins-ppc-fpconstrained.c
index 838db02415fe5b..b326f131a56e54 100644
--- a/clang/test/CodeGen/PowerPC/builtins-ppc-fpconstrained.c
+++ b/clang/test/CodeGen/PowerPC/builtins-ppc-fpconstrained.c
@@ -85,13 +85,13 @@ void test_float(void) {
vf = __builtin_vsx_xvrspiz(vf);
// CHECK-LABEL: try-xvrspiz
// CHECK-UNCONSTRAINED: @llvm.trunc.v4f32(<4 x float> %{{.*}})
- // CHECK-CONSTRAINED: @llvm.experimental.constrained.trunc.v4f32(<4 x float> %{{.*}}, metadata !"fpexcept.strict")
+ // CHECK-CONSTRAINED: @llvm.trunc.v4f32(<4 x float> %{{.*}}) #[[ATTR:[0-9]+]] [ "fpe.except"(metadata !"strict") ]
// CHECK-ASM: xvrspiz
vd = __builtin_vsx_xvrdpiz(vd);
// CHECK-LABEL: try-xvrdpiz
// CHECK-UNCONSTRAINED: @llvm.trunc.v2f64(<2 x double> %{{.*}})
- // CHECK-CONSTRAINED: @llvm.experimental.constrained.trunc.v2f64(<2 x double> %{{.*}}, metadata !"fpexcept.strict")
+ // CHECK-CONSTRAINED: @llvm.trunc.v2f64(<2 x double> %{{.*}}) #[[ATTR]] [ "fpe.except"(metadata !"strict") ]
// CHECK-ASM: xvrdpiz
vf = __builtin_vsx_xvmaddasp(vf, vf, vf);
@@ -156,3 +156,5 @@ void test_float(void) {
// CHECK-CONSTRAINED: fneg <2 x double> [[RESULT1]]
// CHECK-ASM: xvnmsubadp
}
+
+// CHECK-CONSTRAINED: attributes #[[ATTR]] = { strictfp memory(inaccessiblemem: readwrite) }
diff --git a/clang/test/CodeGen/SystemZ/builtins-systemz-vector-constrained.c b/clang/test/CodeGen/SystemZ/builtins-systemz-vector-constrained.c
index 6d2845504a39f0..77ede2c10eea08 100644
--- a/clang/test/CodeGen/SystemZ/builtins-systemz-vector-constrained.c
+++ b/clang/test/CodeGen/SystemZ/builtins-systemz-vector-constrained.c
@@ -45,7 +45,7 @@ void test_float(void) {
vd = __builtin_s390_vfidb(vd, 4, 1);
// CHECK: call <2 x double> @llvm.experimental.constrained.round.v2f64(<2 x double> %{{.*}})
vd = __builtin_s390_vfidb(vd, 4, 5);
- // CHECK: call <2 x double> @llvm.experimental.constrained.trunc.v2f64(<2 x double> %{{.*}})
+ // CHECK: call <2 x double> @llvm.trunc.v2f64(<2 x double> %{{.*}}) #[[ATTR:[0-9]+]] [ "fpe.except"(metadata !"strict") ]
vd = __builtin_s390_vfidb(vd, 4, 6);
// CHECK: call <2 x double> @llvm.experimental.constrained.ceil.v2f64(<2 x double> %{{.*}})
vd = __builtin_s390_vfidb(vd, 4, 7);
@@ -53,3 +53,5 @@ void test_float(void) {
vd = __builtin_s390_vfidb(vd, 4, 4);
// CHECK: call <2 x double> @llvm.s390.vfidb(<2 x double> %{{.*}}, i32 4, i32 4)
}
+
+// CHECK: attributes #[[ATTR]] = { strictfp memory(inaccessiblemem: readwrite) }
diff --git a/clang/test/CodeGen/SystemZ/builtins-systemz-vector2-constrained.c b/clang/test/CodeGen/SystemZ/builtins-systemz-vector2-constrained.c
index 735b6a0249ab62..7488cf90a9669d 100644
--- a/clang/test/CodeGen/SystemZ/builtins-systemz-vector2-constrained.c
+++ b/clang/test/CodeGen/SystemZ/builtins-systemz-vector2-constrained.c
@@ -60,10 +60,11 @@ void test_float(void) {
vf = __builtin_s390_vfisb(vf, 4, 1);
// CHECK: call <4 x float> @llvm.experimental.constrained.round.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
vf = __builtin_s390_vfisb(vf, 4, 5);
- // CHECK: call <4 x float> @llvm.experimental.constrained.trunc.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
+ // CHECK: call <4 x float> @llvm.trunc.v4f32(<4 x float> %{{.*}}) #[[ATTR:[0-9]+]] [ "fpe.except"(metadata !"strict") ]
vf = __builtin_s390_vfisb(vf, 4, 6);
// CHECK: call <4 x float> @llvm.experimental.constrained.ceil.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
vf = __builtin_s390_vfisb(vf, 4, 7);
// CHECK: call <4 x float> @llvm.experimental.constrained.floor.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
}
+// CHECK: attributes #[[ATTR]] = { strictfp memory(inaccessiblemem: readwrite) }
diff --git a/clang/test/CodeGen/SystemZ/builtins-systemz-zvector-constrained.c b/clang/test/CodeGen/SystemZ/builtins-systemz-zvector-constrained.c
index 6a1f8f0e923f65..fe964fa38aee07 100644
--- a/clang/test/CodeGen/SystemZ/builtins-systemz-zvector-constrained.c
+++ b/clang/test/CodeGen/SystemZ/builtins-systemz-zvector-constrained.c
@@ -303,10 +303,10 @@ void test_float(void) {
// CHECK: call <2 x double> @llvm.experimental.constrained.floor.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
// CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 4, 7
vd = vec_roundz(vd);
- // CHECK: call <2 x double> @llvm.experimental.constrained.trunc.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+ // CHECK: call <2 x double> @llvm.trunc.v2f64(<2 x double> %{{.*}}) #[[ATTR:[0-9]+]] [ "fpe.except"(metadata !"strict") ]
// CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 4, 5
vd = vec_trunc(vd);
- // CHECK: call <2 x double> @llvm.experimental.constrained.trunc.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+ // CHECK: call <2 x double> @llvm.trunc.v2f64(<2 x double> %{{.*}}) #[[ATTR]] [ "fpe.except"(metadata !"strict") ]
// CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 4, 5
vd = vec_roundc(vd);
// CHECK: call <2 x double> @llvm.experimental.constrained.nearbyint.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
@@ -316,3 +316,5 @@ void test_float(void) {
// CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 0, 0
vd = vec_round(vd);
}
+
+// CHECK: attributes #[[ATTR]] = { strictfp memory(inaccessiblemem: readwrite) }
diff --git a/clang/test/CodeGen/SystemZ/builtins-systemz-zvector2-constrained.c b/clang/test/CodeGen/SystemZ/builtins-systemz-zvector2-constrained.c
index 750f5011a26798..e7ea4e325862e9 100644
--- a/clang/test/CodeGen/SystemZ/builtins-systemz-zvector2-constrained.c
+++ b/clang/test/CodeGen/SystemZ/builtins-systemz-zvector2-constrained.c
@@ -495,16 +495,16 @@ void test_float(void) {
// CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 4, 7
vf = vec_roundz(vf);
- // CHECK: call <4 x float> @llvm.experimental.constrained.trunc.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
+ // CHECK: call <4 x float> @llvm.trunc.v4f32(<4 x float> %{{.*}}) #[[ATTR:[0-9]+]] [ "fpe.except"(metadata !"strict") ]
// CHECK-ASM: vfisb %{{.*}}, %{{.*}}, 4, 5
vf = vec_trunc(vf);
- // CHECK: call <4 x float> @llvm.experimental.constrained.trunc.v4f32(<4 x float> %{{.*}}, metadata !{{.*}})
+ // CHECK: call <4 x float> @llvm.trunc.v4f32(<4 x float> %{{.*}}) #[[ATTR:[0-9]+]] [ "fpe.except"(metadata !"strict") ]
// CHECK-ASM: vfisb %{{.*}}, %{{.*}}, 4, 5
vd = vec_roundz(vd);
- // CHECK: call <2 x double> @llvm.experimental.constrained.trunc.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+ // CHECK: call <2 x double> @llvm.trunc.v2f64(<2 x double> %{{.*}}) #[[ATTR:[0-9]+]] [ "fpe.except"(metadata !"strict") ]
// CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 4, 5
vd = vec_trunc(vd);
- // CHECK: call <2 x double> @llvm.experimental.constrained.trunc.v2f64(<2 x double> %{{.*}}, metadata !{{.*}})
+ // CHECK: call <2 x double> @llvm.trunc.v2f64(<2 x double> %{{.*}}) #[[ATTR:[0-9]+]] [ "fpe.except"(metadata !"strict") ]
// CHECK-ASM: vfidb %{{.*}}, %{{.*}}, 4, 5
vf = vec_roundc(vf);
@@ -541,3 +541,5 @@ void test_float(void) {
// CHECK: call { <2 x i64>, i32 } @llvm.s390.vftcidb(<2 x double> %{{.*}}, i32 4095)
// CHECK-ASM: vftcidb
}
+
+// CHECK: attributes #[[ATTR]] = { strictfp memory(inaccessiblemem: readwrite) }
\ No newline at end of file
diff --git a/clang/test/CodeGen/arm64-vrnd-constrained.c b/clang/test/CodeGen/arm64-vrnd-constrained.c
index ccf729a6a25ef6..e690f26b0def52 100644
--- a/clang/test/CodeGen/arm64-vrnd-constrained.c
+++ b/clang/test/CodeGen/arm64-vrnd-constrained.c
@@ -14,7 +14,7 @@
float64x2_t rnd5(float64x2_t a) { return vrndq_f64(a); }
// COMMON-LABEL: rnd5
// UNCONSTRAINED: call <2 x double> @llvm.trunc.v2f64(<2 x double>
-// CONSTRAINED: call <2 x double> @llvm.experimental.constrained.trunc.v2f64(<2 x double>
+// CONSTRAINED: call <2 x double> @llvm.trunc.v2f64(<2 x double> %{{.*}}) #[[ATTR:[0-9]+]] [ "fpe.except"(metadata !"strict") ]
// CHECK-ASM: frintz.2d v{{[0-9]+}}, v{{[0-9]+}}
float64x2_t rnd13(float64x2_t a) { return vrndmq_f64(a); }
@@ -41,3 +41,5 @@ float64x2_t rnd25(float64x2_t a) { return vrndxq_f64(a); }
// CONSTRAINED: call <2 x double> @llvm.experimental.constrained.rint.v2f64(<2 x double>
// CHECK-ASM: frintx.2d v{{[0-9]+}}, v{{[0-9]+}}
+// CHECK: attributes #[[ATTR]] = { strictfp memory(inaccessiblemem: readwrite) }
+
diff --git a/clang/test/CodeGen/constrained-math-builtins.c b/clang/test/CodeGen/constrained-math-builtins.c
index 68b9e75283c547..f5136cd18e0eff 100644
--- a/clang/test/CodeGen/constrained-math-builtins.c
+++ b/clang/test/CodeGen/constrained-math-builtins.c
@@ -242,10 +242,10 @@ __builtin_atan2(f,f); __builtin_atan2f(f,f); __builtin_atan2l(f,f);
__builtin_trunc(f); __builtin_truncf(f); __builtin_truncl(f); __builtin_truncf128(f);
-// CHECK: call double @llvm.experimental.constrained.trunc.f64(double %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call float @llvm.experimental.constrained.trunc.f32(float %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call x86_fp80 @llvm.experimental.constrained.trunc.f80(x86_fp80 %{{.*}}, metadata !"fpexcept.strict")
-// CHECK: call fp128 @llvm.experimental.constrained.trunc.f128(fp128 %{{.*}}, metadata !"fpexcept.strict")
+// CHECK: call double @llvm.trunc.f64(double %{{.*}}) #[[ATTR_CALL:[0-9]+]] [ "fpe.except"(metadata !"strict") ]
+// CHECK: call float @llvm.trunc.f32(float %{{.*}}) #[[ATTR_CALL]] [ "fpe.except"(metadata !"strict") ]
+// CHECK: call x86_fp80 @llvm.trunc.f80(x86_fp80 %{{.*}}) #[[ATTR_CALL]] [ "fpe.except"(metadata !"strict") ]
+// CHECK: call fp128 @llvm.trunc.f128(fp128 %{{.*}}) #[[ATTR_CALL]] [ "fpe.except"(metadata !"strict") ]
};
// CHECK: declare double @llvm.experimental.constrained.frem.f64(double, double, metadata, metadata)
@@ -377,10 +377,10 @@ __builtin_atan2(f,f); __builtin_atan2f(f,f); __builtin_atan2l(f,f);
// CHECK: declare x86_fp80 @llvm.experimental.constrained.tan.f80(x86_fp80, metadata, metadata)
// CHECK: declare fp128 @llvm.experimental.constrained.tan.f128(fp128, metadata, metadata)
-// CHECK: declare double @llvm.experimental.constrained.trunc.f64(double, metadata)
-// CHECK: declare float @llvm.experimental.constrained.trunc.f32(float, metadata)
-// CHECK: declare x86_fp80 @llvm.experimental.constrained.trunc.f80(x86_fp80, metadata)
-// CHECK: declare fp128 @llvm.experimental.constrained.trunc.f128(fp128, metadata)
+// CHECK: declare double @llvm.trunc.f64(double) #[[ATTR_FUNC:[0-9]+]]
+// CHECK: declare float @llvm.trunc.f32(float) #[[ATTR_FUNC]]
+// CHECK: declare x86_fp80 @llvm.trunc.f80(x86_fp80) #[[ATTR_FUNC]]
+// CHECK: declare fp128 @llvm.trunc.f128(fp128) #[[ATTR_FUNC]]
#pragma STDC FP_CONTRACT ON
void bar(float f) {
@@ -401,3 +401,6 @@ void bar(float f) {
// CHECK: fneg
// CHECK: call float @llvm.experimental.constrained.fmuladd.f32(float %{{.*}}, float %{{.*}}, float %{{.*}}, metadata !"round.tonearest", metadata !"fpexcept.strict")
};
+
+// CHECK: attributes #[[ATTR_FUNC]] = { {{.*}} memory(none) }
+// CHECK: attributes #[[ATTR_CALL]] = { strictfp memory(inaccessiblemem: readwrite) }
diff --git a/llvm/include/llvm/CodeGen/SelectionDAGNodes.h b/llvm/include/llvm/CodeGen/SelectionDAGNodes.h
index 677b59e0c8fbeb..9dc831ef23273d 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAGNodes.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAGNodes.h
@@ -721,6 +721,7 @@ END_TWO_BYTE_PACK()
case ISD::STRICT_FP_TO_BF16:
#define DAG_INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC, DAGN) \
case ISD::STRICT_##DAGN:
+#define LEGACY_FUNCTION DAG_INSTRUCTION
#include "llvm/IR/ConstrainedOps.def"
return true;
}
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index 6a41094ff933b0..7ccaf9558077c0 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -1324,6 +1324,7 @@ class TargetLoweringBase {
default: llvm_unreachable("Unexpected FP pseudo-opcode");
#define DAG_INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC, DAGN) \
case ISD::STRICT_##DAGN: EqOpc = ISD::DAGN; break;
+#define LEGACY_FUNCTION DAG_INSTRUCTION
#define CMP_INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC, DAGN) \
case ISD::STRICT_##DAGN: EqOpc = ISD::SETCC; break;
#include "llvm/IR/ConstrainedOps.def"
diff --git a/llvm/include/llvm/IR/ConstrainedOps.def b/llvm/include/llvm/IR/ConstrainedOps.def
index 30a82bf633d575..2b1...
[truncated]
You can test this locally with the following command:

git-clang-format --diff c4a1e0efe6b0767dfb5861a7e8814d7db0c0de8a 8e64ef3ee8baf97e9ad319486a4be3aacc71c75e --extensions c,h,cpp -- clang/lib/CodeGen/CGBuiltin.cpp clang/test/CodeGen/AArch64/neon-intrinsics-constrained.c clang/test/CodeGen/AArch64/v8.2a-fp16-intrinsics-constrained.c clang/test/CodeGen/PowerPC/builtins-ppc-fpconstrained.c clang/test/CodeGen/SystemZ/builtins-systemz-vector-constrained.c clang/test/CodeGen/SystemZ/builtins-systemz-vector2-constrained.c clang/test/CodeGen/SystemZ/builtins-systemz-zvector-constrained.c clang/test/CodeGen/SystemZ/builtins-systemz-zvector2-constrained.c clang/test/CodeGen/X86/strictfp_builtins.c clang/test/CodeGen/arm64-vrnd-constrained.c clang/test/CodeGen/constrained-math-builtins.c clang/test/CodeGen/strictfp_builtins.c llvm/include/llvm/AsmParser/LLParser.h llvm/include/llvm/CodeGen/SelectionDAGNodes.h llvm/include/llvm/CodeGen/TargetLowering.h llvm/include/llvm/IR/FPEnv.h llvm/include/llvm/IR/Function.h llvm/include/llvm/IR/IRBuilder.h llvm/include/llvm/IR/InstrTypes.h llvm/include/llvm/IR/IntrinsicInst.h llvm/include/llvm/IR/Intrinsics.h llvm/include/llvm/IR/LLVMContext.h llvm/lib/Analysis/ConstantFolding.cpp llvm/lib/AsmParser/LLParser.cpp llvm/lib/Bitcode/Reader/BitcodeReader.cpp llvm/lib/CodeGen/ExpandVectorPredication.cpp llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h llvm/lib/CodeGen/TargetLoweringBase.cpp llvm/lib/IR/AutoUpgrade.cpp llvm/lib/IR/FPEnv.cpp llvm/lib/IR/Function.cpp llvm/lib/IR/IRBuilder.cpp llvm/lib/IR/Instructions.cpp llvm/lib/IR/IntrinsicInst.cpp llvm/lib/IR/Intrinsics.cpp llvm/lib/IR/LLVMContext.cpp llvm/lib/IR/Verifier.cpp llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp llvm/lib/Transforms/Utils/CloneFunction.cpp llvm/lib/Transforms/Utils/Local.cpp

View the diff from clang-format here.

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 52b2d3320c..13c9533a8b 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -20537,7 +20537,9 @@ Value *CodeGenFunction::EmitSystemZBuiltinExpr(unsigned BuiltinID,
CI = Intrinsic::experimental_constrained_nearbyint; break;
case 1: ID = Intrinsic::round;
CI = Intrinsic::experimental_constrained_round; break;
- case 5: ID = Intrinsic::trunc; break;
+ case 5:
+ ID = Intrinsic::trunc;
+ break;
case 6: ID = Intrinsic::ceil;
CI = Intrinsic::experimental_constrained_ceil; break;
case 7: ID = Intrinsic::floor;
@@ -251,10 +251,12 @@ static bool markTails(Function &F, OptimizationRemarkEmitter *ORE) {

    // Special-case operand bundles "clang.arc.attachedcall", "ptrauth", and
    // "kcfi".
This comment needs to be updated. I suggest removing the list that is present in the code and replacing it with text explaining why these operand bundles are special cases.
llvm.trunc is currently marked IntrNoMem in Intrinsics.td; you'll need to update that if you want it to read/modify FP state. (Trying to override the default by sticking attributes on top doesn't work properly, as far as I know.)

I think we need a dedicated fp env attribute to model this (and it would be a prerequisite to making this change).
This is the key point of this solution: we want to use the same intrinsic both in the default and the non-default environment. All properties necessary for the non-default case will be attached to the call site. If something prevents this plan, we should evaluate it.

If you mean an attribute of a call site, then yes, we need a more detailed view on side effects in the non-default environment. Anyway, performance of a program running with a non-default rounding mode should not drop if exception tracking is not needed. As for an attribute of an intrinsic, its purpose seems unclear.

I mean a general attribute that can apply to the declaration, and to a call site. We need to be able to mark which intrinsic declarations do not care about errno or other fp mode bits, and whether they can read or write them. Furthermore, it is useful to mark individual call sites with stricter variants, just like for memory attributes. We need this to avoid stripping IntrNoMem; it should still be IntrNoMem, with the additional qualifier that strictfp may read/write errno/rounding mode.

IntrNoMem gets translated to readnone, i.e. does not access any memory, including FP state. If the intrinsic can in fact read/modify FP state in some cases, we have to remove that from the intrinsic. There are basically two ways we can go from there. One, we can just make the frontend and/or transforms add a readnone marking to call sites that can't actually access FP state (i.e. calls in non-strictfp functions). Two, we can add a "readnone_fp_intrinsic" attribute, which would mean the intrinsic is readnone unless there's an operand bundle indicating otherwise. I think the first way composes more cleanly with our general approach to memory effects.

I think this needs to be more refined to FP mode read/write and errno read/write. Basically a mirror of memory() for arguments/other memory.
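To make the thread above concrete, here is a minimal sketch of the shape the updated tests in this patch check: the intrinsic declaration keeps its weak effects while the strictfp call site carries stronger memory effects plus the bundle. The dedicated FP/errno effect attribute discussed above does not exist yet, so this is only an illustration of the current state:

```llvm
declare double @llvm.trunc.f64(double) #0          ; declaration stays effectively readnone

define double @f(double %x) #1 {
  ; the strict call site overrides the effects and records exception behavior
  %r = call double @llvm.trunc.f64(double %x) #1 [ "fpe.except"(metadata !"strict") ]
  ret double %r
}

attributes #0 = { memory(none) }
attributes #1 = { strictfp memory(inaccessiblemem: readwrite) }
```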
      NeedRound = false;
    else if (NeedExcept && Item.getTag() == "fpe.except")
      NeedExcept = false;
    ActualBundles.push_back(Item);
Do we want intrinsics where the rounding mode is baked in (like trunc or floor) to be allowed to have a rounding mode bundle? Or, do we want the rounding mode bundle to be able to specify a rounding mode that isn't the one baked into the intrinsic? I'm leaning towards the rounding mode bundle not being allowed.
It depends on how we want to treat the rounding mode bundle. At least two cases are possible.

(1) The rounding mode bundle specifies the floating-point environment. That is, it provides information about the current value of the rounding mode in FPCR. If the optimizer can deduce this value, it may set the appropriate value in all affected instructions. For example, in the following code:

    call @llvm.set_rounding(i32 1)
    %v = call float @llvm.trunc(float %x)

the call to trunc can be replaced with:

    %v = call float @llvm.trunc(float %x) [ "fpe.control"(metadata !"rte") ]

The rounding mode in this bundle does not change the meaning of trunc, but could be useful in some cases. The two calls:

    %v = call float @llvm.trunc(float %x) [ "fpe.control"(metadata !"rte") ]
    %v = call float @llvm.trunc(float %x) [ "fpe.control"(metadata !"rtz") ]

represent the same operation, but on a target where trunc is implemented as "round using current mode" the latter instruction is implemented as one operation, while the former generally requires three operations (set fpcr, nearbyint, set fpcr). This is a hypothetical example, however.

It seems the meaning of the current rounding metadata argument in the constrained intrinsics agrees with this model; see the discussion in https://discourse.llvm.org/t/static-rounding-mode-in-ir/80621.

In this scenario it does not make much sense to exclude unused rounding modes from the allowed bundles. The bundles can be set by the optimizer in a simple way, without checking whether the instruction uses the rounding mode. We use a similar method in the clang AST, where all relevant nodes have complete FPOptions.

(2) The rounding mode bundle specifies the rounding mode used for evaluating the instruction. Instructions like trunc do not depend on the specified rounding mode, so it does not make sense to use rounding bundles for them.

This viewpoint seems more natural since rounding is considered a parameter of the operation, similar to arguments. It can also be naturally extended to static FP control modes. Rounding as a parameter produces exactly the same effect whether it is read from FPCR or specified in the instruction. Other FP options, such as denormal behavior, can be handled similarly.

Neither method has a clear-cut advantage, and we need to discuss which approach to take.
I think #1 is a better choice despite the downsides. Having more opportunities to optimize when lowering if we know the current rounding mode seems like a good choice. It does simplify the implementation in places as you said.
Having the rounding mode bundle specify the FP environment is a change from the constrained intrinsics, and this point is a little fine, so I do think we'll need to clearly state this in the LangRef at some point.
I am a little worried that we're creating a footgun and someone may write code that relies on the rounding mode bundle when handling trunc, floor, or one of the other math intrinsics/library calls. Then again, if code is trying to evaluate an expression then a switch is going to be needed with entries that would be expected to have rounding modes hardcoded into them. So I'm not worried enough to change my view that #1 is preferred.
Having the rounding mode bundle specify the FP environment also means we don't need any Verifier checks for improperly present or specified rounding bundles. That's a minor win.
One last point: this is another example of how having the rounding mode specified is useful. Since we've defined the constrained intrinsics to require the rounding mode be correct, and an incorrect rounding mode is undefined behavior, we can rely on the specified rounding mode being correct. The constant folding that we're doing currently checks the rounding mode in the cases I remember. We should carry over in the LangRef the verbiage about incorrect rounding mode metadata being UB.
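As a small illustration of that last point, a hypothetical fold using the bundle names from this PR (not code from the patch):

```llvm
; The recorded environment rounding mode can be trusted, so a folder may use it:
%v = call float @llvm.nearbyint.f32(float 2.5) [ "fpe.control"(metadata !"rte") ]
; foldable to 2.0 under round-to-nearest-even; if the recorded mode were wrong,
; that would be undefined behavior, mirroring today's metadata-argument rule.
```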
If the rounding bundle specifies the dynamic rounding mode only, we can just ignore it when the operation does not need it, as in the case of trunc. That should simplify optimization; otherwise the optimizer would have to analyze whether an FP operation can depend on rounding and add or omit the bundle accordingly. More rigid rules on the bundles can make the implementation more complex and less convenient. On the other hand, such rules should reduce errors caused by misuse. We can start with the loose implementation and add such restrictions at any time later.
I do think that before we start adding code like this ticket's, we need to add IR Verifier code to check for proper use of the strictfp attribute. This code never made it into the tree because there are too many broken tests already in the tree. Verifier code could be written that only fires when an error is detected AND no constrained intrinsics are used in a function. This should eliminate failures from most, but not all, of the currently broken tests. Hopefully the few broken tests that are in tree and fire will be small enough in number that they can be fixed. The remainder of the broken tests will be corrected over time or will simply be removed.

My ticket that never got pushed is here: https://reviews.llvm.org/D146845. I can provide a current version of that code if it would be useful. I also have checks implemented on top of that code to ensure that regular FP instructions are never mixed with constrained intrinsics. We'll need to push something like that hopefully not long after we start putting this bundle support into the tree.
You are right, and I already came across such improper use of the attribute, when intrinsics like …
D146845 wasn't committed because it would have caused test failures. The tests are wrong, and the new checks reveal this, but the new checks cannot be committed until all broken tests are fixed. Otherwise we would get failures from the bots and the Verifier checks would be reverted. The D146845 ticket encodes the current rules for the strictfp attribute. If you are making changes that fail with D146845 applied to your tree, then you are moving in the wrong direction.
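For illustration only, a minimal sketch of the kind of mixing those extra checks are meant to catch (assumed semantics based on the discussion; not part of this PR):

```llvm
define double @mixed(double %x, double %y) #0 {
  ; a constrained intrinsic and a regular FP instruction in the same function --
  ; the checks built on top of D146845 would flag this combination
  %a = call double @llvm.experimental.constrained.fadd.f64(double %x, double %y, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
  %b = fadd double %a, %y
  ret double %b
}

attributes #0 = { strictfp }
```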
Previously the function 'trunc' in a non-default floating-point environment was implemented with a special LLVM intrinsic, 'experimental.constrained.trunc'. The introduction of floating-point operand bundles allows expressing the interaction with the FP environment using the same intrinsic as in the default mode.

This change removes 'llvm.experimental.constrained.trunc' and uses 'llvm.trunc' in all cases.