-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[SimplifyCFG] Increase budget for FoldTwoEntryPHINode() if the branch is unpredictable. #98495
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-llvm-transforms Author: Tianqing Wang (tianqingw) ChangesThe A TTI interface Full diff: https://github.com/llvm/llvm-project/pull/98495.diff 7 Files Affected:
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index dcdd9f82cde8e..9fbb6e1cff445 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -419,6 +419,11 @@ class TargetTransformInfo {
/// this factor, it is very likely to be predicted correctly.
BranchProbability getPredictableBranchThreshold() const;
+ // Returns an integer indicating how aggressive the target wants for
+ // eliminating unpredictable branches. A zero return value means extra
+ // optimization applied to them should be minimal.
+ unsigned getBranchMispredictPenalty() const;
+
/// Return true if branch divergence exists.
///
/// Branch divergence has a significantly negative impact on GPU performance
@@ -1820,6 +1825,7 @@ class TargetTransformInfo::Concept {
ArrayRef<const Value *> Operands,
TargetCostKind CostKind) = 0;
virtual BranchProbability getPredictableBranchThreshold() = 0;
+ virtual unsigned getBranchMispredictPenalty() = 0;
virtual bool hasBranchDivergence(const Function *F = nullptr) = 0;
virtual bool isSourceOfDivergence(const Value *V) = 0;
virtual bool isAlwaysUniform(const Value *V) = 0;
@@ -2228,6 +2234,9 @@ class TargetTransformInfo::Model final : public TargetTransformInfo::Concept {
BranchProbability getPredictableBranchThreshold() override {
return Impl.getPredictableBranchThreshold();
}
+ unsigned getBranchMispredictPenalty() override {
+ return Impl.getBranchMispredictPenalty();
+ }
bool hasBranchDivergence(const Function *F = nullptr) override {
return Impl.hasBranchDivergence(F);
}
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index 01624de190d51..d4b65ee6f5bd0 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -99,6 +99,8 @@ class TargetTransformInfoImplBase {
return BranchProbability(99, 100);
}
+ unsigned getBranchMispredictPenalty() const { return 0; }
+
bool hasBranchDivergence(const Function *F = nullptr) const { return false; }
bool isSourceOfDivergence(const Value *V) const { return false; }
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index c175d1737e54b..0ee9c8ee0cdf8 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -279,6 +279,10 @@ BranchProbability TargetTransformInfo::getPredictableBranchThreshold() const {
: TTIImpl->getPredictableBranchThreshold();
}
+unsigned TargetTransformInfo::getBranchMispredictPenalty() const {
+ return TTIImpl->getBranchMispredictPenalty();
+}
+
bool TargetTransformInfo::hasBranchDivergence(const Function *F) const {
return TTIImpl->hasBranchDivergence(F);
}
diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
index 32a3683355b72..984586f4ae5f6 100644
--- a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
@@ -6756,3 +6756,8 @@ InstructionCost X86TTIImpl::getScalingFactorCost(Type *Ty, GlobalValue *BaseGV,
return AM.Scale != 0;
return -1;
}
+
+unsigned X86TTIImpl::getBranchMispredictPenalty() const {
+ // TODO: Hook MispredictPenalty of SchedMachineModel into this.
+ return 14;
+}
diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.h b/llvm/lib/Target/X86/X86TargetTransformInfo.h
index 5eccb1aea308d..d2b5c093e7003 100644
--- a/llvm/lib/Target/X86/X86TargetTransformInfo.h
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.h
@@ -294,6 +294,8 @@ class X86TTIImpl : public BasicTTIImplBase<X86TTIImpl> {
bool supportsEfficientVectorElementLoadStore() const;
bool enableInterleavedAccessVectorization();
+ unsigned getBranchMispredictPenalty() const;
+
private:
bool supportsGather() const;
InstructionCost getGSVectorCost(unsigned Opcode, TTI::TargetCostKind CostKind,
diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index 3fa3c0f1f52b0..128238bba10b0 100644
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -3508,7 +3508,8 @@ static bool FoldTwoEntryPHINode(PHINode *PN, const TargetTransformInfo &TTI,
// jump to one specific 'then' block (if we have two of them).
// It isn't beneficial to speculatively execute the code
// from the block that we know is predictably not entered.
- if (!DomBI->getMetadata(LLVMContext::MD_unpredictable)) {
+ bool IsUnpredictable = DomBI->getMetadata(LLVMContext::MD_unpredictable);
+ if (!IsUnpredictable) {
uint64_t TWeight, FWeight;
if (extractBranchWeights(*DomBI, TWeight, FWeight) &&
(TWeight + FWeight) != 0) {
@@ -3549,8 +3550,10 @@ static bool FoldTwoEntryPHINode(PHINode *PN, const TargetTransformInfo &TTI,
// that need to be moved to the dominating block.
SmallPtrSet<Instruction *, 4> AggressiveInsts;
InstructionCost Cost = 0;
- InstructionCost Budget =
- TwoEntryPHINodeFoldingThreshold * TargetTransformInfo::TCC_Basic;
+ unsigned Threshold = TwoEntryPHINodeFoldingThreshold;
+ if (IsUnpredictable)
+ Threshold += TTI.getBranchMispredictPenalty();
+ InstructionCost Budget = Threshold * TargetTransformInfo::TCC_Basic;
bool Changed = false;
for (BasicBlock::iterator II = BB->begin(); isa<PHINode>(II);) {
diff --git a/llvm/test/Transforms/SimplifyCFG/two-entry-phi-fold-unpredictable.ll b/llvm/test/Transforms/SimplifyCFG/two-entry-phi-fold-unpredictable.ll
new file mode 100644
index 0000000000000..88aa8a619207d
--- /dev/null
+++ b/llvm/test/Transforms/SimplifyCFG/two-entry-phi-fold-unpredictable.ll
@@ -0,0 +1,96 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --version 5
+; Two-entry phi nodes with unpredictable conditions may get increased budget for folding.
+; RUN: opt < %s -S -passes=simplifycfg | FileCheck --check-prefix=CHECK-NOFOLD %s
+; RUN: opt -mtriple=x86_64-unknown-linux-gnu -mattr=+avx2 < %s -S -passes=simplifycfg | FileCheck --check-prefix=CHECK-FOLD %s
+
+define { <2 x float>, <2 x float> } @foo(float %speed, <2 x float> %velocity.coerce0, <2 x float> %velocity.coerce1) {
+; CHECK-NOFOLD-LABEL: define { <2 x float>, <2 x float> } @foo(
+; CHECK-NOFOLD-SAME: float [[SPEED:%.*]], <2 x float> [[VELOCITY_COERCE0:%.*]], <2 x float> [[VELOCITY_COERCE1:%.*]]) {
+; CHECK-NOFOLD-NEXT: [[ENTRY:.*]]:
+; CHECK-NOFOLD-NEXT: [[CMP:%.*]] = fcmp fast ogt float [[SPEED]], 0x3F747AE140000000
+; CHECK-NOFOLD-NEXT: br i1 [[CMP]], label %[[IF_THEN:.*]], label %[[IF_END:.*]], !unpredictable [[META0:![0-9]+]]
+; CHECK-NOFOLD: [[IF_THEN]]:
+; CHECK-NOFOLD-NEXT: [[VELOCITY_SROA_0_0_VEC_EXTRACT:%.*]] = extractelement <2 x float> [[VELOCITY_COERCE0]], i64 0
+; CHECK-NOFOLD-NEXT: [[MUL_I_I_I_I:%.*]] = fmul fast float [[VELOCITY_SROA_0_0_VEC_EXTRACT]], [[VELOCITY_SROA_0_0_VEC_EXTRACT]]
+; CHECK-NOFOLD-NEXT: [[VELOCITY_SROA_0_4_VEC_EXTRACT:%.*]] = extractelement <2 x float> [[VELOCITY_COERCE0]], i64 1
+; CHECK-NOFOLD-NEXT: [[MUL8_I_I_I_I:%.*]] = fmul fast float [[VELOCITY_SROA_0_4_VEC_EXTRACT]], [[VELOCITY_SROA_0_4_VEC_EXTRACT]]
+; CHECK-NOFOLD-NEXT: [[ADD_I_I_I_I:%.*]] = fadd fast float [[MUL8_I_I_I_I]], [[MUL_I_I_I_I]]
+; CHECK-NOFOLD-NEXT: [[VELOCITY_SROA_14_8_VEC_EXTRACT:%.*]] = extractelement <2 x float> [[VELOCITY_COERCE1]], i64 0
+; CHECK-NOFOLD-NEXT: [[MUL13_I_I_I_I:%.*]] = fmul fast float [[VELOCITY_SROA_14_8_VEC_EXTRACT]], [[VELOCITY_SROA_14_8_VEC_EXTRACT]]
+; CHECK-NOFOLD-NEXT: [[ADD14_I_I_I_I:%.*]] = fadd fast float [[ADD_I_I_I_I]], [[MUL13_I_I_I_I]]
+; CHECK-NOFOLD-NEXT: [[TMP0:%.*]] = tail call fast noundef float @llvm.sqrt.f32(float [[ADD14_I_I_I_I]])
+; CHECK-NOFOLD-NEXT: [[MUL_I_I_I:%.*]] = fdiv fast float 0x3FEFD70A40000000, [[TMP0]]
+; CHECK-NOFOLD-NEXT: [[SUB_I:%.*]] = fmul fast float [[MUL_I_I_I]], [[VELOCITY_SROA_0_0_VEC_EXTRACT]]
+; CHECK-NOFOLD-NEXT: [[TMP1:%.*]] = insertelement <2 x float> poison, float [[SUB_I]], i64 0
+; CHECK-NOFOLD-NEXT: [[SUB8_I:%.*]] = fmul fast float [[MUL_I_I_I]], [[VELOCITY_SROA_0_4_VEC_EXTRACT]]
+; CHECK-NOFOLD-NEXT: [[VELOCITY_SROA_0_4_VEC_INSERT25:%.*]] = insertelement <2 x float> [[TMP1]], float [[SUB8_I]], i64 1
+; CHECK-NOFOLD-NEXT: [[SUB13_I:%.*]] = fmul fast float [[MUL_I_I_I]], [[VELOCITY_SROA_14_8_VEC_EXTRACT]]
+; CHECK-NOFOLD-NEXT: [[VELOCITY_SROA_14_8_VEC_INSERT35:%.*]] = insertelement <2 x float> [[VELOCITY_COERCE1]], float [[SUB13_I]], i64 0
+; CHECK-NOFOLD-NEXT: br label %[[IF_END]]
+; CHECK-NOFOLD: [[IF_END]]:
+; CHECK-NOFOLD-NEXT: [[VELOCITY_SROA_0_0:%.*]] = phi nsz <2 x float> [ [[VELOCITY_SROA_0_4_VEC_INSERT25]], %[[IF_THEN]] ], [ zeroinitializer, %[[ENTRY]] ]
+; CHECK-NOFOLD-NEXT: [[VELOCITY_SROA_14_0:%.*]] = phi nsz <2 x float> [ [[VELOCITY_SROA_14_8_VEC_INSERT35]], %[[IF_THEN]] ], [ zeroinitializer, %[[ENTRY]] ]
+; CHECK-NOFOLD-NEXT: [[DOTFCA_0_INSERT:%.*]] = insertvalue { <2 x float>, <2 x float> } poison, <2 x float> [[VELOCITY_SROA_0_0]], 0
+; CHECK-NOFOLD-NEXT: [[DOTFCA_1_INSERT:%.*]] = insertvalue { <2 x float>, <2 x float> } [[DOTFCA_0_INSERT]], <2 x float> [[VELOCITY_SROA_14_0]], 1
+; CHECK-NOFOLD-NEXT: ret { <2 x float>, <2 x float> } [[DOTFCA_1_INSERT]]
+;
+; CHECK-FOLD-LABEL: define { <2 x float>, <2 x float> } @foo(
+; CHECK-FOLD-SAME: float [[SPEED:%.*]], <2 x float> [[VELOCITY_COERCE0:%.*]], <2 x float> [[VELOCITY_COERCE1:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-FOLD-NEXT: [[ENTRY:.*:]]
+; CHECK-FOLD-NEXT: [[CMP:%.*]] = fcmp fast ogt float [[SPEED]], 0x3F747AE140000000
+; CHECK-FOLD-NEXT: [[VELOCITY_SROA_0_0_VEC_EXTRACT:%.*]] = extractelement <2 x float> [[VELOCITY_COERCE0]], i64 0
+; CHECK-FOLD-NEXT: [[MUL_I_I_I_I:%.*]] = fmul fast float [[VELOCITY_SROA_0_0_VEC_EXTRACT]], [[VELOCITY_SROA_0_0_VEC_EXTRACT]]
+; CHECK-FOLD-NEXT: [[VELOCITY_SROA_0_4_VEC_EXTRACT:%.*]] = extractelement <2 x float> [[VELOCITY_COERCE0]], i64 1
+; CHECK-FOLD-NEXT: [[MUL8_I_I_I_I:%.*]] = fmul fast float [[VELOCITY_SROA_0_4_VEC_EXTRACT]], [[VELOCITY_SROA_0_4_VEC_EXTRACT]]
+; CHECK-FOLD-NEXT: [[ADD_I_I_I_I:%.*]] = fadd fast float [[MUL8_I_I_I_I]], [[MUL_I_I_I_I]]
+; CHECK-FOLD-NEXT: [[VELOCITY_SROA_14_8_VEC_EXTRACT:%.*]] = extractelement <2 x float> [[VELOCITY_COERCE1]], i64 0
+; CHECK-FOLD-NEXT: [[MUL13_I_I_I_I:%.*]] = fmul fast float [[VELOCITY_SROA_14_8_VEC_EXTRACT]], [[VELOCITY_SROA_14_8_VEC_EXTRACT]]
+; CHECK-FOLD-NEXT: [[ADD14_I_I_I_I:%.*]] = fadd fast float [[ADD_I_I_I_I]], [[MUL13_I_I_I_I]]
+; CHECK-FOLD-NEXT: [[TMP0:%.*]] = tail call fast float @llvm.sqrt.f32(float [[ADD14_I_I_I_I]])
+; CHECK-FOLD-NEXT: [[MUL_I_I_I:%.*]] = fdiv fast float 0x3FEFD70A40000000, [[TMP0]]
+; CHECK-FOLD-NEXT: [[SUB_I:%.*]] = fmul fast float [[MUL_I_I_I]], [[VELOCITY_SROA_0_0_VEC_EXTRACT]]
+; CHECK-FOLD-NEXT: [[TMP1:%.*]] = insertelement <2 x float> poison, float [[SUB_I]], i64 0
+; CHECK-FOLD-NEXT: [[SUB8_I:%.*]] = fmul fast float [[MUL_I_I_I]], [[VELOCITY_SROA_0_4_VEC_EXTRACT]]
+; CHECK-FOLD-NEXT: [[VELOCITY_SROA_0_4_VEC_INSERT25:%.*]] = insertelement <2 x float> [[TMP1]], float [[SUB8_I]], i64 1
+; CHECK-FOLD-NEXT: [[SUB13_I:%.*]] = fmul fast float [[MUL_I_I_I]], [[VELOCITY_SROA_14_8_VEC_EXTRACT]]
+; CHECK-FOLD-NEXT: [[VELOCITY_SROA_14_8_VEC_INSERT35:%.*]] = insertelement <2 x float> [[VELOCITY_COERCE1]], float [[SUB13_I]], i64 0
+; CHECK-FOLD-NEXT: [[VELOCITY_SROA_0_0:%.*]] = select nsz i1 [[CMP]], <2 x float> [[VELOCITY_SROA_0_4_VEC_INSERT25]], <2 x float> zeroinitializer, !unpredictable [[META0:![0-9]+]]
+; CHECK-FOLD-NEXT: [[VELOCITY_SROA_14_0:%.*]] = select nsz i1 [[CMP]], <2 x float> [[VELOCITY_SROA_14_8_VEC_INSERT35]], <2 x float> zeroinitializer, !unpredictable [[META0]]
+; CHECK-FOLD-NEXT: [[DOTFCA_0_INSERT:%.*]] = insertvalue { <2 x float>, <2 x float> } poison, <2 x float> [[VELOCITY_SROA_0_0]], 0
+; CHECK-FOLD-NEXT: [[DOTFCA_1_INSERT:%.*]] = insertvalue { <2 x float>, <2 x float> } [[DOTFCA_0_INSERT]], <2 x float> [[VELOCITY_SROA_14_0]], 1
+; CHECK-FOLD-NEXT: ret { <2 x float>, <2 x float> } [[DOTFCA_1_INSERT]]
+;
+entry:
+ %cmp = fcmp fast ogt float %speed, 0x3F747AE140000000
+ br i1 %cmp, label %if.then, label %if.end, !unpredictable !0
+
+if.then:
+ %velocity.sroa.0.0.vec.extract = extractelement <2 x float> %velocity.coerce0, i64 0
+ %mul.i.i.i.i = fmul fast float %velocity.sroa.0.0.vec.extract, %velocity.sroa.0.0.vec.extract
+ %velocity.sroa.0.4.vec.extract = extractelement <2 x float> %velocity.coerce0, i64 1
+ %mul8.i.i.i.i = fmul fast float %velocity.sroa.0.4.vec.extract, %velocity.sroa.0.4.vec.extract
+ %add.i.i.i.i = fadd fast float %mul8.i.i.i.i, %mul.i.i.i.i
+ %velocity.sroa.14.8.vec.extract = extractelement <2 x float> %velocity.coerce1, i64 0
+ %mul13.i.i.i.i = fmul fast float %velocity.sroa.14.8.vec.extract, %velocity.sroa.14.8.vec.extract
+ %add14.i.i.i.i = fadd fast float %add.i.i.i.i, %mul13.i.i.i.i
+ %0 = tail call fast noundef float @llvm.sqrt.f32(float %add14.i.i.i.i)
+ %mul.i.i.i = fdiv fast float 0x3FEFD70A40000000, %0
+ %sub.i = fmul fast float %mul.i.i.i, %velocity.sroa.0.0.vec.extract
+ %1 = insertelement <2 x float> poison, float %sub.i, i64 0
+ %sub8.i = fmul fast float %mul.i.i.i, %velocity.sroa.0.4.vec.extract
+ %velocity.sroa.0.4.vec.insert25 = insertelement <2 x float> %1, float %sub8.i, i64 1
+ %sub13.i = fmul fast float %mul.i.i.i, %velocity.sroa.14.8.vec.extract
+ %velocity.sroa.14.8.vec.insert35 = insertelement <2 x float> %velocity.coerce1, float %sub13.i, i64 0
+ br label %if.end
+
+if.end:
+ %velocity.sroa.0.0 = phi nsz <2 x float> [ %velocity.sroa.0.4.vec.insert25, %if.then ], [ zeroinitializer, %entry ]
+ %velocity.sroa.14.0 = phi nsz <2 x float> [ %velocity.sroa.14.8.vec.insert35, %if.then ], [ zeroinitializer, %entry ]
+ %.fca.0.insert = insertvalue { <2 x float>, <2 x float> } poison, <2 x float> %velocity.sroa.0.0, 0
+ %.fca.1.insert = insertvalue { <2 x float>, <2 x float> } %.fca.0.insert, <2 x float> %velocity.sroa.14.0, 1
+ ret { <2 x float>, <2 x float> } %.fca.1.insert
+}
+
+declare float @llvm.sqrt.f32(float)
+
+!0 = !{}
|
// Returns an integer indicating how aggressive the target wants for | ||
// eliminating unpredictable branches. A zero return value means extra | ||
// optimization applied to them should be minimal. | ||
unsigned getBranchMispredictPenalty() const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we be more specific about how this penalty (and the threshold) is used - should it be comparable to TCK_Latency costs (and should it return InstructrionCost?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually not sure about it and I'd like to have more discussion. IMO this is still very new and more use cases are needed to determine what's the best way for optimizations to leverage it. But for the time being, using it as an latency seems to be good.
Not sure about InstructionCost. With InstructionCost we need to:
- Define what Invalid represents. Perhaps just means unknown and no extra optimization.
- Define what 0 represents, and how it differs from Invalid. Maybe a (potentially hypothetical) hardware with very cheap branching, in which case we need to avoid folding branches at best effort? I can't think about a practical case yet.
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Let's wait one or two days to see if there are comments from other reviewers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if it's possible to remove the FoldTwoEntryPHINode
altogether and hand it in EarlyIfConversion.
llvm/test/Transforms/SimplifyCFG/two-entry-phi-fold-unpredictable.ll
Outdated
Show resolved
Hide resolved
is unpredictable. The `!unpredictable` metadata has been present for a long time, but it's usage in optimizations is still limited. This patch teaches `FoldTwoEntryPHINode()` to be more aggressive with an unpredictable branch to reduce mispredictions. A TTI interface `getBranchMispredictPenalty()` is added to distinguish between different hardwares to ensure we don't go too far for simpler cores. For simplicity, only a naive x86 implementation is included for the time being.
c64d637
to
7a8e32b
Compare
This is intended to be a minimum viable product for 19.x. That's the reason why it's limited to x86 target for the time being. Moving to EarlyIfConversion can be a next step.
It is still "transforms from if to select". I added a |
|
||
InstructionCost X86TTIImpl::getBranchMispredictPenalty() const { | ||
// TODO: Hook MispredictPenalty of SchedMachineModel into this. | ||
return 14; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did you arrive at the value 14? Why is the TODO a TODO and not implemented in this PR?
If I understand correctly, EarlyIfConversion already does this optimization, just on the machine level. Currently, it only allows critical path extension by half the mispredict penalty. It would seem reasonable to allow a larger extension for My own 2c on SimplifyCFG vs EarlyIfConversion is that doing it in SimplifyCFG can allow it to benefit from other select optimizations in the middle end, at the cost of losing branch based optimizations. Though the way this was implemented, with this transform only running at the very end of the middle-end pipeline, you wouldn't get the benefit of other middle-end optimizations anyway. If interaction with other middle-end optimizations is the reason for having this in SimplifyCFG, it may make sense to have a phase ordering test to demonstrate them. |
I don't think removing it entirely makes sense or is possible, because InstCombine only recognizes many patterns in select form. So it's important that we convert "small" cases to selects in the middle-end. (I don't think the same is really true for the kinds of huge pattern handled here though.) |
… is unpredictable. (#98495) The `!unpredictable` metadata has been present for a long time, but it's usage in optimizations is still limited. This patch teaches `FoldTwoEntryPHINode()` to be more aggressive with an unpredictable branch to reduce mispredictions. A TTI interface `getBranchMispredictPenalty()` is added to distinguish between different hardwares to ensure we don't go too far for simpler cores. For simplicity, only a naive x86 implementation is included for the time being.
Any other comments? :) |
Just a heads-up, I'm working on migrating it to CodeGen, though it's a little more work than expected because some of the infrastructures are not there yet (#102101). |
The
!unpredictable
metadata has been present for a long time, butit's usage in optimizations is still limited. This patch teaches
FoldTwoEntryPHINode()
to be more aggressive with an unpredictablebranch to reduce mispredictions.
A TTI interface
getBranchMispredictPenalty()
is added to distinguishbetween different hardwares to ensure we don't go too far for simpler
cores. For simplicity, only a naive x86 implementation is included for
the time being.