[AArch64][GlobalISel] Disable fixed-point iteration in all Combiners #102167
Conversation
@llvm/pr-subscribers-backend-aarch64 @llvm/pr-subscribers-llvm-globalisel

Author: Tobias Stadler (tobias-stadler)

Changes: Disable fixed-point iteration in all AArch64 Combiners after #102163.

Full diff: https://github.com/llvm/llvm-project/pull/102167.diff

10 Files Affected:
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerCombiner.cpp b/llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerCombiner.cpp
index f71fe323a6d35..28d9f4f50f388 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerCombiner.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerCombiner.cpp
@@ -566,6 +566,11 @@ bool AArch64PostLegalizerCombiner::runOnMachineFunction(MachineFunction &MF) {
CombinerInfo CInfo(/*AllowIllegalOps*/ true, /*ShouldLegalizeIllegal*/ false,
/*LegalizerInfo*/ nullptr, EnableOpt, F.hasOptSize(),
F.hasMinSize());
+ // Disable fixed-point iteration to reduce compile-time
+ CInfo.MaxIterations = 1;
+ CInfo.ObserverLvl = CombinerInfo::ObserverLevel::SinglePass;
+ // Legalizer performs DCE, so a full DCE pass is unnecessary.
+ CInfo.EnableFullDCE = false;
AArch64PostLegalizerCombinerImpl Impl(MF, CInfo, TPC, *KB, CSEInfo,
RuleConfig, ST, MDT, LI);
bool Changed = Impl.combineMachineInstrs();
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerLowering.cpp b/llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerLowering.cpp
index 4a1977ba1a00f..56e9e4d76eeed 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerLowering.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerLowering.cpp
@@ -1290,6 +1290,11 @@ bool AArch64PostLegalizerLowering::runOnMachineFunction(MachineFunction &MF) {
CombinerInfo CInfo(/*AllowIllegalOps*/ true, /*ShouldLegalizeIllegal*/ false,
/*LegalizerInfo*/ nullptr, /*OptEnabled=*/true,
F.hasOptSize(), F.hasMinSize());
+ // Disable fixed-point iteration to reduce compile-time
+ CInfo.MaxIterations = 1;
+ CInfo.ObserverLvl = CombinerInfo::ObserverLevel::SinglePass;
+ // PostLegalizerCombiner performs DCE, so a full DCE pass is unnecessary.
+ CInfo.EnableFullDCE = false;
AArch64PostLegalizerLoweringImpl Impl(MF, CInfo, TPC, /*CSEInfo*/ nullptr,
RuleConfig, ST);
return Impl.combineMachineInstrs();
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64PreLegalizerCombiner.cpp b/llvm/lib/Target/AArch64/GISel/AArch64PreLegalizerCombiner.cpp
index 8a50cb26b2c2f..6e689d743804a 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64PreLegalizerCombiner.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64PreLegalizerCombiner.cpp
@@ -861,6 +861,12 @@ bool AArch64PreLegalizerCombiner::runOnMachineFunction(MachineFunction &MF) {
CombinerInfo CInfo(/*AllowIllegalOps*/ true, /*ShouldLegalizeIllegal*/ false,
/*LegalizerInfo*/ nullptr, EnableOpt, F.hasOptSize(),
F.hasMinSize());
+ // Disable fixed-point iteration to reduce compile-time
+ CInfo.MaxIterations = 1;
+ CInfo.ObserverLvl = CombinerInfo::ObserverLevel::SinglePass;
+ // This is the first Combiner, so the input IR might contain dead
+ // instructions.
+ CInfo.EnableFullDCE = true;
AArch64PreLegalizerCombinerImpl Impl(MF, CInfo, &TPC, *KB, CSEInfo,
RuleConfig, ST, MDT, LI);
return Impl.combineMachineInstrs();
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/combine-logic-of-compare.mir b/llvm/test/CodeGen/AArch64/GlobalISel/combine-logic-of-compare.mir
index 1eb445c03efcd..d5356711585fa 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/combine-logic-of-compare.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/combine-logic-of-compare.mir
@@ -203,8 +203,8 @@ body: |
; CHECK-LABEL: name: test_icmp_and_icmp_9_2
; CHECK: liveins: $x0, $x1
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 0
- ; CHECK-NEXT: $x0 = COPY [[C]](s64)
+ ; CHECK-NEXT: %zext:_(s64) = G_CONSTANT i64 0
+ ; CHECK-NEXT: $x0 = COPY %zext(s64)
%0:_(s64) = COPY $x0
%nine:_(s64) = G_CONSTANT i64 9
%two:_(s64) = G_CONSTANT i64 2
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/combine-overflow.mir b/llvm/test/CodeGen/AArch64/GlobalISel/combine-overflow.mir
index ec66892b98fc7..bc4b5ae7c066a 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/combine-overflow.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/combine-overflow.mir
@@ -76,9 +76,9 @@ body: |
; CHECK: liveins: $w0, $w1
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $w2
- ; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+ ; CHECK-NEXT: %o_wide:_(s32) = G_CONSTANT i32 0
; CHECK-NEXT: $w0 = COPY [[COPY]](s32)
- ; CHECK-NEXT: $w1 = COPY [[C]](s32)
+ ; CHECK-NEXT: $w1 = COPY %o_wide(s32)
; CHECK-NEXT: RET_ReallyLR implicit $w0
%0:_(s32) = COPY $w0
%1:_(s32) = COPY $w1
@@ -101,10 +101,10 @@ body: |
; CHECK: liveins: $w0, $w1
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $w0
- ; CHECK-NEXT: %const:_(s32) = G_CONSTANT i32 0
+ ; CHECK-NEXT: %o_wide:_(s32) = G_CONSTANT i32 0
; CHECK-NEXT: $w0 = COPY [[COPY]](s32)
; CHECK-NEXT: $w1 = COPY [[COPY]](s32)
- ; CHECK-NEXT: $w2 = COPY %const(s32)
+ ; CHECK-NEXT: $w2 = COPY %o_wide(s32)
; CHECK-NEXT: RET_ReallyLR implicit $w0
%0:_(s32) = COPY $w0
%const:_(s32) = G_CONSTANT i32 0
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/combine-udiv.ll b/llvm/test/CodeGen/AArch64/GlobalISel/combine-udiv.ll
index d465e0237201b..b681e3b223117 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/combine-udiv.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/combine-udiv.ll
@@ -257,10 +257,10 @@ define i32 @udiv_div_by_180(i32 %x)
;
; GISEL-LABEL: udiv_div_by_180:
; GISEL: // %bb.0:
-; GISEL-NEXT: uxtb w8, w0
-; GISEL-NEXT: mov w9, #5826 // =0x16c2
-; GISEL-NEXT: movk w9, #364, lsl #16
-; GISEL-NEXT: umull x8, w8, w9
+; GISEL-NEXT: mov w8, #5826 // =0x16c2
+; GISEL-NEXT: and w9, w0, #0xff
+; GISEL-NEXT: movk w8, #364, lsl #16
+; GISEL-NEXT: umull x8, w9, w8
; GISEL-NEXT: lsr x0, x8, #32
; GISEL-NEXT: // kill: def $w0 killed $w0 killed $x0
; GISEL-NEXT: ret
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/fold-global-offsets.mir b/llvm/test/CodeGen/AArch64/GlobalISel/fold-global-offsets.mir
index 1e9ef108c99ce..f0fbfecf219fd 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/fold-global-offsets.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/fold-global-offsets.mir
@@ -151,9 +151,9 @@ body: |
; CHECK-LABEL: name: ptr_add_chain
; CHECK: liveins: $x0
; CHECK-NEXT: {{ $}}
- ; CHECK-NEXT: [[GV:%[0-9]+]]:_(p0) = G_GLOBAL_VALUE @g + 1
- ; CHECK-NEXT: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 4
- ; CHECK-NEXT: %dont_fold_me:_(p0) = G_PTR_ADD [[GV]], [[C]](s64)
+ ; CHECK-NEXT: [[GV:%[0-9]+]]:_(p0) = G_GLOBAL_VALUE @g + 4
+ ; CHECK-NEXT: %offset:_(s64) = G_CONSTANT i64 1
+ ; CHECK-NEXT: %dont_fold_me:_(p0) = G_PTR_ADD [[GV]], %offset(s64)
; CHECK-NEXT: $x0 = COPY %dont_fold_me(p0)
; CHECK-NEXT: RET_ReallyLR implicit $x0
%global:_(p0) = G_GLOBAL_VALUE @g
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/postlegalizer-combiner-sameopcode-hands-crash.mir b/llvm/test/CodeGen/AArch64/GlobalISel/postlegalizer-combiner-sameopcode-hands-crash.mir
index f8510a2bd3e71..2d286ede38d5d 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/postlegalizer-combiner-sameopcode-hands-crash.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/postlegalizer-combiner-sameopcode-hands-crash.mir
@@ -9,6 +9,8 @@ body: |
bb.1:
; CHECK-LABEL: name: crash_fn
; CHECK: [[C:%[0-9]+]]:_(p0) = G_CONSTANT i64 0
+ ; CHECK-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+ ; CHECK-NEXT: [[C2:%[0-9]+]]:_(s1) = G_CONSTANT i1 false
; CHECK-NEXT: [[ZEXTLOAD:%[0-9]+]]:_(s32) = G_ZEXTLOAD [[C]](p0) :: (load (s16), align 8)
; CHECK-NEXT: [[ZEXTLOAD1:%[0-9]+]]:_(s32) = G_ZEXTLOAD [[C]](p0) :: (load (s16))
; CHECK-NEXT: [[AND:%[0-9]+]]:_(s32) = G_AND [[ZEXTLOAD1]], [[ZEXTLOAD]]
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/postlegalizer-lowering-swap-compare-operands.mir b/llvm/test/CodeGen/AArch64/GlobalISel/postlegalizer-lowering-swap-compare-operands.mir
index ce7450c690503..b8020bd50a2b0 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/postlegalizer-lowering-swap-compare-operands.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/postlegalizer-lowering-swap-compare-operands.mir
@@ -64,21 +64,28 @@ body: |
; LOWER-NEXT: {{ $}}
; LOWER-NEXT: %reg0:_(s64) = COPY $x0
; LOWER-NEXT: %cmp_lhs:_(s64) = G_SEXT_INREG %reg0, 8
+ ; LOWER-NEXT: %reg1:_(s64) = COPY $x1
+ ; LOWER-NEXT: %cmp1:_(s32) = G_ICMP intpred(sge), %cmp_lhs(s64), %reg1
; LOWER-NEXT: %add:_(s64) = G_ADD %cmp_lhs, %reg0
; LOWER-NEXT: %cmp2:_(s32) = G_ICMP intpred(sge), %cmp_lhs(s64), %add
; LOWER-NEXT: $w0 = COPY %cmp2(s32)
- ; LOWER-NEXT: RET_ReallyLR implicit $w0
+ ; LOWER-NEXT: $w1 = COPY %cmp1(s32)
+ ; LOWER-NEXT: RET_ReallyLR implicit $w0, implicit $w1
;
; SELECT-LABEL: name: dont_swap_more_than_one_use
; SELECT: liveins: $x0, $x1
; SELECT-NEXT: {{ $}}
; SELECT-NEXT: %reg0:gpr64 = COPY $x0
; SELECT-NEXT: %cmp_lhs:gpr64 = SBFMXri %reg0, 0, 7
+ ; SELECT-NEXT: %reg1:gpr64 = COPY $x1
+ ; SELECT-NEXT: [[SUBSXrr:%[0-9]+]]:gpr64 = SUBSXrr %cmp_lhs, %reg1, implicit-def $nzcv
+ ; SELECT-NEXT: %cmp1:gpr32 = CSINCWr $wzr, $wzr, 11, implicit $nzcv
; SELECT-NEXT: %add:gpr64 = ADDXrr %cmp_lhs, %reg0
- ; SELECT-NEXT: [[SUBSXrr:%[0-9]+]]:gpr64 = SUBSXrr %cmp_lhs, %add, implicit-def $nzcv
+ ; SELECT-NEXT: [[SUBSXrr1:%[0-9]+]]:gpr64 = SUBSXrr %cmp_lhs, %add, implicit-def $nzcv
; SELECT-NEXT: %cmp2:gpr32 = CSINCWr $wzr, $wzr, 11, implicit $nzcv
; SELECT-NEXT: $w0 = COPY %cmp2
- ; SELECT-NEXT: RET_ReallyLR implicit $w0
+ ; SELECT-NEXT: $w1 = COPY %cmp1
+ ; SELECT-NEXT: RET_ReallyLR implicit $w0, implicit $w1
%reg0:_(s64) = COPY $x0
%cmp_lhs:_(s64) = G_SEXT_INREG %reg0, 8
%reg1:_(s64) = COPY $x1
@@ -88,7 +95,8 @@ body: |
%cmp2:_(s32) = G_ICMP intpred(sge), %cmp_lhs(s64), %add
$w0 = COPY %cmp2(s32)
- RET_ReallyLR implicit $w0
+ $w1 = COPY %cmp1(s32)
+ RET_ReallyLR implicit $w0, implicit $w1
...
---
diff --git a/llvm/test/CodeGen/AArch64/setcc_knownbits.ll b/llvm/test/CodeGen/AArch64/setcc_knownbits.ll
index 8be63b04d8ce5..4dd2a896cd81c 100644
--- a/llvm/test/CodeGen/AArch64/setcc_knownbits.ll
+++ b/llvm/test/CodeGen/AArch64/setcc_knownbits.ll
@@ -33,8 +33,7 @@ define noundef i1 @logger(i32 noundef %logLevel, ptr %ea, ptr %pll) {
; CHECK-GI-NEXT: b.hi .LBB1_2
; CHECK-GI-NEXT: // %bb.1: // %land.rhs
; CHECK-GI-NEXT: ldr x8, [x1]
-; CHECK-GI-NEXT: ldrb w8, [x8]
-; CHECK-GI-NEXT: and w0, w8, #0x1
+; CHECK-GI-NEXT: ldrb w0, [x8]
; CHECK-GI-NEXT: .LBB1_2: // %land.end
; CHECK-GI-NEXT: ret
entry:
CTMark O0:
CTMark O2:
; GISEL-NEXT: mov w8, #5826 // =0x16c2
; GISEL-NEXT: and w9, w0, #0xff
; GISEL-NEXT: movk w8, #364, lsl #16
; GISEL-NEXT: umull x8, w9, w8
Expanding the div generates a bunch of instructions, and the order in which the combines are applied results in different output.
; CHECK-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
; CHECK-NEXT: [[C2:%[0-9]+]]:_(s1) = G_CONSTANT i1 false
These are dead in the input IR, which doesn't happen inside the CodeGen pipeline.
@@ -88,7 +95,8 @@ body: |
%cmp2:_(s32) = G_ICMP intpred(sge), %cmp_lhs(s64), %add
$w0 = COPY %cmp2(s32)
RET_ReallyLR implicit $w0
$w1 = COPY %cmp1(s32)
cmp1 was dead in the input IR, but according to the comment this seems unintended, so I have fixed this.
@@ -33,8 +33,7 @@ define noundef i1 @logger(i32 noundef %logLevel, ptr %ea, ptr %pll) {
; CHECK-GI-NEXT: b.hi .LBB1_2
; CHECK-GI-NEXT: // %bb.1: // %land.rhs
; CHECK-GI-NEXT: ldr x8, [x1]
; CHECK-GI-NEXT: ldrb w8, [x8]
; CHECK-GI-NEXT: and w0, w8, #0x1
; CHECK-GI-NEXT: ldrb w0, [x8]
The immediate DCEing and better ordering of combines prevent the PreLegalizerCombiner from generating a bunch of useless artifacts that it can't combine away again. These artifacts are converted into G_AND by the ArtifactCombiner, which can't be combined away by the redundant_and combine, because KnownBits can't look through the implicit anyext of the G_LOAD. We are probably missing some combines that convert the load into sext/zext versions.
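The KnownBits constraint described above can be illustrated with a toy model (this is a sketch, not LLVM's actual `KnownBits` API; `andIsRedundant` is an invented name standing in for the redundant_and combine's legality check):

```cpp
#include <cassert>
#include <cstdint>

// Toy KnownBits: track which bits of a value are provably zero.
struct KnownBits {
  uint64_t zeroMask; // bit set => that bit of the value is known to be 0
};

// redundant_and-style check: `x & mask` can be folded to `x` only when every
// bit the mask would clear is already known zero in x. A zext load gives
// known-zero high bits; an anyext leaves them unknown, so the fold can't fire.
bool andIsRedundant(KnownBits x, uint64_t mask) {
  uint64_t clearedBits = ~mask;            // bits the AND would force to 0
  return (clearedBits & ~x.zeroMask) == 0; // all of them already known 0?
}
```

Under this model, a G_ZEXTLOAD of an i8 yields high bits known zero and the G_AND folds away, while the implicit anyext of a plain G_LOAD yields no known bits and the G_AND must stay — matching the test diff above.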
Is this a fundamental issue of the combiner, or do we have to revisit the situation as the combiner slowly becomes more powerful?
This is a fundamental issue of combiner-style algorithms. Fixed-point iteration just burns too much compile time for no good reason. Neither the InstCombiner nor the DAGCombiner uses fixed-point iteration anymore, for the same compile-time reasons. The heuristics I have implemented should be mostly on par with how thoroughly the other combiner implementations currently handle retrying combines. The main difference is that we can rely on the Observer to detect changes for retrying combines, whereas the other combiner implementations need to use WorkList-aware functions for performing changes on the IR. If this approach is good enough for SelectionDAG, I don't see it becoming a problem for GlobalISel.
These are some very nice improvements; thanks for working on this. None of the test output changes look to be exposing problems with this patch, so LGTM.
Continues the work for disabling fixed-point iteration in the Combiner (#94291). This introduces improved Observer-based heuristics in the GISel Combiner to retry combining defs/uses of modified instructions and for performing sparse dead code elimination.

I have experimented a lot with the heuristics and this seems to be the minimal set of heuristics that allows disabling fixed-point iteration for AArch64 CTMark O2 without regressions. Enabling this globally would pass all regression tests for all official targets (apart from small benign diffs), but I have made this fully opt-in for now, because I can't quantify the impact for other targets.

This should mostly be on par with how the WorkList-aware functions in the InstCombiner and DAGCombiner handle rescheduling instructions for recombining. For performance numbers, see my follow-up patch for AArch64 (#102167).

Pull Request: #102163
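The sparse DCE mentioned above can be sketched as follows (a toy model under invented names — `sparseDCE`, `Inst` — not the Combiner's real data structures): rather than scanning every instruction the way a full DCE pass does, only instructions the combiner touched are candidates, and deleting one makes its operands candidates in turn.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Toy SSA-like function: each instruction id maps to its operand ids.
struct Inst {
  std::vector<int> ops;     // ids of the instructions this one uses
  bool sideEffects = false; // stores/returns etc. are never deleted
};

// Sparse DCE: start only from `roots` (the instructions the combiner
// modified) and walk operands of anything deleted, since only those can
// newly become dead. Returns the set of deleted instruction ids.
std::set<int> sparseDCE(std::map<int, Inst> &f, std::vector<int> roots) {
  std::map<int, int> uses; // how many instructions use each id
  for (auto &[id, inst] : f)
    for (int op : inst.ops)
      ++uses[op];

  std::set<int> deleted;
  while (!roots.empty()) {
    int id = roots.back();
    roots.pop_back();
    auto it = f.find(id);
    if (it == f.end() || it->second.sideEffects || uses[id] > 0)
      continue; // still alive, or already erased
    for (int op : it->second.ops) {
      --uses[op];          // one fewer user of the operand
      roots.push_back(op); // operand may now be dead: retry it
    }
    f.erase(it);
    deleted.insert(id);
  }
  return deleted;
}
```

This mirrors why the pre-legalizer combiner keeps `EnableFullDCE = true` (the input IR may already contain dead instructions that no combine will ever touch) while later combiners can disable it.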
Thanks a lot for reviewing this whole stack for me!
Disable fixed-point iteration in all AArch64 Combiners after #102163.
See inline comments for justification of test changes.