
[AArch64][GlobalISel] Disable fixed-point iteration in all Combiners #102167


Conversation

tobias-stadler
Contributor

Disable fixed-point iteration in all AArch64 Combiners after #102163.
See inline comments for justification of test changes.

@llvmbot
Member

llvmbot commented Aug 6, 2024

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-llvm-globalisel

Author: Tobias Stadler (tobias-stadler)

Changes

Disable fixed-point iteration in all AArch64 Combiners after #102163.
See inline comments for justification of test changes.
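For reference, all three combiners below set the same CombinerInfo knobs before constructing their Impl. A condensed sketch of the shared pattern, lifted from the diff (the surrounding runOnMachineFunction setup is elided):

// Condensed from the diff below; the surrounding setup (TPC, KB, CSEInfo,
// RuleConfig, ...) is elided.
CombinerInfo CInfo(/*AllowIllegalOps*/ true, /*ShouldLegalizeIllegal*/ false,
                   /*LegalizerInfo*/ nullptr, EnableOpt, F.hasOptSize(),
                   F.hasMinSize());
// Disable fixed-point iteration: one sweep plus Observer-driven retries.
CInfo.MaxIterations = 1;
CInfo.ObserverLvl = CombinerInfo::ObserverLevel::SinglePass;
// Full DCE stays enabled only in the PreLegalizerCombiner, whose input IR may
// still contain dead instructions; the post-legalizer combiners disable it.
CInfo.EnableFullDCE = false;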


Full diff: https://github.com/llvm/llvm-project/pull/102167.diff

10 Files Affected:

  • (modified) llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerCombiner.cpp (+5)
  • (modified) llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerLowering.cpp (+5)
  • (modified) llvm/lib/Target/AArch64/GISel/AArch64PreLegalizerCombiner.cpp (+6)
  • (modified) llvm/test/CodeGen/AArch64/GlobalISel/combine-logic-of-compare.mir (+2-2)
  • (modified) llvm/test/CodeGen/AArch64/GlobalISel/combine-overflow.mir (+4-4)
  • (modified) llvm/test/CodeGen/AArch64/GlobalISel/combine-udiv.ll (+4-4)
  • (modified) llvm/test/CodeGen/AArch64/GlobalISel/fold-global-offsets.mir (+3-3)
  • (modified) llvm/test/CodeGen/AArch64/GlobalISel/postlegalizer-combiner-sameopcode-hands-crash.mir (+2)
  • (modified) llvm/test/CodeGen/AArch64/GlobalISel/postlegalizer-lowering-swap-compare-operands.mir (+12-4)
  • (modified) llvm/test/CodeGen/AArch64/setcc_knownbits.ll (+1-2)
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerCombiner.cpp b/llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerCombiner.cpp
index f71fe323a6d35..28d9f4f50f388 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerCombiner.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerCombiner.cpp
@@ -566,6 +566,11 @@ bool AArch64PostLegalizerCombiner::runOnMachineFunction(MachineFunction &MF) {
   CombinerInfo CInfo(/*AllowIllegalOps*/ true, /*ShouldLegalizeIllegal*/ false,
                      /*LegalizerInfo*/ nullptr, EnableOpt, F.hasOptSize(),
                      F.hasMinSize());
+  // Disable fixed-point iteration to reduce compile-time
+  CInfo.MaxIterations = 1;
+  CInfo.ObserverLvl = CombinerInfo::ObserverLevel::SinglePass;
+  // Legalizer performs DCE, so a full DCE pass is unnecessary.
+  CInfo.EnableFullDCE = false;
   AArch64PostLegalizerCombinerImpl Impl(MF, CInfo, TPC, *KB, CSEInfo,
                                         RuleConfig, ST, MDT, LI);
   bool Changed = Impl.combineMachineInstrs();
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerLowering.cpp b/llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerLowering.cpp
index 4a1977ba1a00f..56e9e4d76eeed 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerLowering.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerLowering.cpp
@@ -1290,6 +1290,11 @@ bool AArch64PostLegalizerLowering::runOnMachineFunction(MachineFunction &MF) {
   CombinerInfo CInfo(/*AllowIllegalOps*/ true, /*ShouldLegalizeIllegal*/ false,
                      /*LegalizerInfo*/ nullptr, /*OptEnabled=*/true,
                      F.hasOptSize(), F.hasMinSize());
+  // Disable fixed-point iteration to reduce compile-time
+  CInfo.MaxIterations = 1;
+  CInfo.ObserverLvl = CombinerInfo::ObserverLevel::SinglePass;
+  // PostLegalizerCombiner performs DCE, so a full DCE pass is unnecessary.
+  CInfo.EnableFullDCE = false;
   AArch64PostLegalizerLoweringImpl Impl(MF, CInfo, TPC, /*CSEInfo*/ nullptr,
                                         RuleConfig, ST);
   return Impl.combineMachineInstrs();
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64PreLegalizerCombiner.cpp b/llvm/lib/Target/AArch64/GISel/AArch64PreLegalizerCombiner.cpp
index 8a50cb26b2c2f..6e689d743804a 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64PreLegalizerCombiner.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64PreLegalizerCombiner.cpp
@@ -861,6 +861,12 @@ bool AArch64PreLegalizerCombiner::runOnMachineFunction(MachineFunction &MF) {
   CombinerInfo CInfo(/*AllowIllegalOps*/ true, /*ShouldLegalizeIllegal*/ false,
                      /*LegalizerInfo*/ nullptr, EnableOpt, F.hasOptSize(),
                      F.hasMinSize());
+  // Disable fixed-point iteration to reduce compile-time
+  CInfo.MaxIterations = 1;
+  CInfo.ObserverLvl = CombinerInfo::ObserverLevel::SinglePass;
+  // This is the first Combiner, so the input IR might contain dead
+  // instructions.
+  CInfo.EnableFullDCE = true;
   AArch64PreLegalizerCombinerImpl Impl(MF, CInfo, &TPC, *KB, CSEInfo,
                                        RuleConfig, ST, MDT, LI);
   return Impl.combineMachineInstrs();
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/combine-logic-of-compare.mir b/llvm/test/CodeGen/AArch64/GlobalISel/combine-logic-of-compare.mir
index 1eb445c03efcd..d5356711585fa 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/combine-logic-of-compare.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/combine-logic-of-compare.mir
@@ -203,8 +203,8 @@ body:             |
     ; CHECK-LABEL: name: test_icmp_and_icmp_9_2
     ; CHECK: liveins: $x0, $x1
     ; CHECK-NEXT: {{  $}}
-    ; CHECK-NEXT: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 0
-    ; CHECK-NEXT: $x0 = COPY [[C]](s64)
+    ; CHECK-NEXT: %zext:_(s64) = G_CONSTANT i64 0
+    ; CHECK-NEXT: $x0 = COPY %zext(s64)
     %0:_(s64) = COPY $x0
     %nine:_(s64) = G_CONSTANT i64 9
     %two:_(s64) = G_CONSTANT i64 2
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/combine-overflow.mir b/llvm/test/CodeGen/AArch64/GlobalISel/combine-overflow.mir
index ec66892b98fc7..bc4b5ae7c066a 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/combine-overflow.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/combine-overflow.mir
@@ -76,9 +76,9 @@ body:             |
     ; CHECK: liveins: $w0, $w1
     ; CHECK-NEXT: {{  $}}
     ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $w2
-    ; CHECK-NEXT: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+    ; CHECK-NEXT: %o_wide:_(s32) = G_CONSTANT i32 0
     ; CHECK-NEXT: $w0 = COPY [[COPY]](s32)
-    ; CHECK-NEXT: $w1 = COPY [[C]](s32)
+    ; CHECK-NEXT: $w1 = COPY %o_wide(s32)
     ; CHECK-NEXT: RET_ReallyLR implicit $w0
     %0:_(s32) = COPY $w0
     %1:_(s32) = COPY $w1
@@ -101,10 +101,10 @@ body:             |
     ; CHECK: liveins: $w0, $w1
     ; CHECK-NEXT: {{  $}}
     ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $w0
-    ; CHECK-NEXT: %const:_(s32) = G_CONSTANT i32 0
+    ; CHECK-NEXT: %o_wide:_(s32) = G_CONSTANT i32 0
     ; CHECK-NEXT: $w0 = COPY [[COPY]](s32)
     ; CHECK-NEXT: $w1 = COPY [[COPY]](s32)
-    ; CHECK-NEXT: $w2 = COPY %const(s32)
+    ; CHECK-NEXT: $w2 = COPY %o_wide(s32)
     ; CHECK-NEXT: RET_ReallyLR implicit $w0
     %0:_(s32) = COPY $w0
     %const:_(s32) = G_CONSTANT i32 0
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/combine-udiv.ll b/llvm/test/CodeGen/AArch64/GlobalISel/combine-udiv.ll
index d465e0237201b..b681e3b223117 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/combine-udiv.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/combine-udiv.ll
@@ -257,10 +257,10 @@ define i32 @udiv_div_by_180(i32 %x)
 ;
 ; GISEL-LABEL: udiv_div_by_180:
 ; GISEL:       // %bb.0:
-; GISEL-NEXT:    uxtb w8, w0
-; GISEL-NEXT:    mov w9, #5826 // =0x16c2
-; GISEL-NEXT:    movk w9, #364, lsl #16
-; GISEL-NEXT:    umull x8, w8, w9
+; GISEL-NEXT:    mov w8, #5826 // =0x16c2
+; GISEL-NEXT:    and w9, w0, #0xff
+; GISEL-NEXT:    movk w8, #364, lsl #16
+; GISEL-NEXT:    umull x8, w9, w8
 ; GISEL-NEXT:    lsr x0, x8, #32
 ; GISEL-NEXT:    // kill: def $w0 killed $w0 killed $x0
 ; GISEL-NEXT:    ret
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/fold-global-offsets.mir b/llvm/test/CodeGen/AArch64/GlobalISel/fold-global-offsets.mir
index 1e9ef108c99ce..f0fbfecf219fd 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/fold-global-offsets.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/fold-global-offsets.mir
@@ -151,9 +151,9 @@ body:             |
     ; CHECK-LABEL: name: ptr_add_chain
     ; CHECK: liveins: $x0
     ; CHECK-NEXT: {{  $}}
-    ; CHECK-NEXT: [[GV:%[0-9]+]]:_(p0) = G_GLOBAL_VALUE @g + 1
-    ; CHECK-NEXT: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 4
-    ; CHECK-NEXT: %dont_fold_me:_(p0) = G_PTR_ADD [[GV]], [[C]](s64)
+    ; CHECK-NEXT: [[GV:%[0-9]+]]:_(p0) = G_GLOBAL_VALUE @g + 4
+    ; CHECK-NEXT: %offset:_(s64) = G_CONSTANT i64 1
+    ; CHECK-NEXT: %dont_fold_me:_(p0) = G_PTR_ADD [[GV]], %offset(s64)
     ; CHECK-NEXT: $x0 = COPY %dont_fold_me(p0)
     ; CHECK-NEXT: RET_ReallyLR implicit $x0
     %global:_(p0) = G_GLOBAL_VALUE @g
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/postlegalizer-combiner-sameopcode-hands-crash.mir b/llvm/test/CodeGen/AArch64/GlobalISel/postlegalizer-combiner-sameopcode-hands-crash.mir
index f8510a2bd3e71..2d286ede38d5d 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/postlegalizer-combiner-sameopcode-hands-crash.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/postlegalizer-combiner-sameopcode-hands-crash.mir
@@ -9,6 +9,8 @@ body:             |
   bb.1:
     ; CHECK-LABEL: name: crash_fn
     ; CHECK: [[C:%[0-9]+]]:_(p0) = G_CONSTANT i64 0
+    ; CHECK-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+    ; CHECK-NEXT: [[C2:%[0-9]+]]:_(s1) = G_CONSTANT i1 false
     ; CHECK-NEXT: [[ZEXTLOAD:%[0-9]+]]:_(s32) = G_ZEXTLOAD [[C]](p0) :: (load (s16), align 8)
     ; CHECK-NEXT: [[ZEXTLOAD1:%[0-9]+]]:_(s32) = G_ZEXTLOAD [[C]](p0) :: (load (s16))
     ; CHECK-NEXT: [[AND:%[0-9]+]]:_(s32) = G_AND [[ZEXTLOAD1]], [[ZEXTLOAD]]
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/postlegalizer-lowering-swap-compare-operands.mir b/llvm/test/CodeGen/AArch64/GlobalISel/postlegalizer-lowering-swap-compare-operands.mir
index ce7450c690503..b8020bd50a2b0 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/postlegalizer-lowering-swap-compare-operands.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/postlegalizer-lowering-swap-compare-operands.mir
@@ -64,21 +64,28 @@ body:             |
     ; LOWER-NEXT: {{  $}}
     ; LOWER-NEXT: %reg0:_(s64) = COPY $x0
     ; LOWER-NEXT: %cmp_lhs:_(s64) = G_SEXT_INREG %reg0, 8
+    ; LOWER-NEXT: %reg1:_(s64) = COPY $x1
+    ; LOWER-NEXT: %cmp1:_(s32) = G_ICMP intpred(sge), %cmp_lhs(s64), %reg1
     ; LOWER-NEXT: %add:_(s64) = G_ADD %cmp_lhs, %reg0
     ; LOWER-NEXT: %cmp2:_(s32) = G_ICMP intpred(sge), %cmp_lhs(s64), %add
     ; LOWER-NEXT: $w0 = COPY %cmp2(s32)
-    ; LOWER-NEXT: RET_ReallyLR implicit $w0
+    ; LOWER-NEXT: $w1 = COPY %cmp1(s32)
+    ; LOWER-NEXT: RET_ReallyLR implicit $w0, implicit $w1
     ;
     ; SELECT-LABEL: name: dont_swap_more_than_one_use
     ; SELECT: liveins: $x0, $x1
     ; SELECT-NEXT: {{  $}}
     ; SELECT-NEXT: %reg0:gpr64 = COPY $x0
     ; SELECT-NEXT: %cmp_lhs:gpr64 = SBFMXri %reg0, 0, 7
+    ; SELECT-NEXT: %reg1:gpr64 = COPY $x1
+    ; SELECT-NEXT: [[SUBSXrr:%[0-9]+]]:gpr64 = SUBSXrr %cmp_lhs, %reg1, implicit-def $nzcv
+    ; SELECT-NEXT: %cmp1:gpr32 = CSINCWr $wzr, $wzr, 11, implicit $nzcv
     ; SELECT-NEXT: %add:gpr64 = ADDXrr %cmp_lhs, %reg0
-    ; SELECT-NEXT: [[SUBSXrr:%[0-9]+]]:gpr64 = SUBSXrr %cmp_lhs, %add, implicit-def $nzcv
+    ; SELECT-NEXT: [[SUBSXrr1:%[0-9]+]]:gpr64 = SUBSXrr %cmp_lhs, %add, implicit-def $nzcv
     ; SELECT-NEXT: %cmp2:gpr32 = CSINCWr $wzr, $wzr, 11, implicit $nzcv
     ; SELECT-NEXT: $w0 = COPY %cmp2
-    ; SELECT-NEXT: RET_ReallyLR implicit $w0
+    ; SELECT-NEXT: $w1 = COPY %cmp1
+    ; SELECT-NEXT: RET_ReallyLR implicit $w0, implicit $w1
     %reg0:_(s64) = COPY $x0
     %cmp_lhs:_(s64) = G_SEXT_INREG %reg0, 8
     %reg1:_(s64) = COPY $x1
@@ -88,7 +95,8 @@ body:             |
     %cmp2:_(s32) = G_ICMP intpred(sge), %cmp_lhs(s64), %add
 
     $w0 = COPY %cmp2(s32)
-    RET_ReallyLR implicit $w0
+    $w1 = COPY %cmp1(s32)
+    RET_ReallyLR implicit $w0, implicit $w1
 
 ...
 ---
diff --git a/llvm/test/CodeGen/AArch64/setcc_knownbits.ll b/llvm/test/CodeGen/AArch64/setcc_knownbits.ll
index 8be63b04d8ce5..4dd2a896cd81c 100644
--- a/llvm/test/CodeGen/AArch64/setcc_knownbits.ll
+++ b/llvm/test/CodeGen/AArch64/setcc_knownbits.ll
@@ -33,8 +33,7 @@ define noundef i1 @logger(i32 noundef %logLevel, ptr %ea, ptr %pll) {
 ; CHECK-GI-NEXT:    b.hi .LBB1_2
 ; CHECK-GI-NEXT:  // %bb.1: // %land.rhs
 ; CHECK-GI-NEXT:    ldr x8, [x1]
-; CHECK-GI-NEXT:    ldrb w8, [x8]
-; CHECK-GI-NEXT:    and w0, w8, #0x1
+; CHECK-GI-NEXT:    ldrb w0, [x8]
 ; CHECK-GI-NEXT:  .LBB1_2: // %land.end
 ; CHECK-GI-NEXT:    ret
 entry:

@tobias-stadler
Contributor Author

CTMark O0:

Program                                       compile_instructions                        size..text
                                              base-O0              patch-O0        diff   base-O0    patch-O0   diff
7zip/7zip-benchmark                           141676688007.00      141409424579.00 -0.19% 1004756.00 1004756.00 0.00%
Bullet/bullet                                  61212623785.00       61017628199.00 -0.32%  873356.00  873356.00 0.00%
SPASS/SPASS                                    14400540825.00       14348447793.00 -0.36%  645136.00  645136.00 0.00%
tramp3d-v4/tramp3d-v4                          18450535356.00       18380583241.00 -0.38%  894656.00  894656.00 0.00%
kimwitu++/kc                                   22538775759.00       22426150589.00 -0.50%  813492.00  813492.00 0.00%
ClamAV/clamscan                                14536543546.00       14378298132.00 -1.09%  702832.00  702832.00 0.00%
lencod/lencod                                  12679661465.00       12525031047.00 -1.22%  820564.00  820564.00 0.00%
mafft/pairlocalalign                            6940116013.00        6851372697.00 -1.28%  423000.00  423000.00 0.00%
consumer-typeset/consumer-typeset              12193354451.00       12003246725.00 -1.56%  908844.00  908844.00 0.00%
sqlite3/sqlite3                                 4596588654.00        4512822506.00 -1.82%  459904.00  459904.00 0.00%
                           Geomean difference                                      -0.87%                       0.00%

CTMark O2:

Program                                       compile_instructions                        size..text
                                              base-O2              patch-O2        diff   base-O2    patch-O2  diff
7zip/7zip-benchmark                           207008981549.00      205988150072.00 -0.49% 824088.00  824088.00 0.00%
Bullet/bullet                                  99712646588.00       99179976524.00 -0.53% 515864.00  515864.00 0.00%
SPASS/SPASS                                    43192888774.00       42858161952.00 -0.77% 443056.00  443056.00 0.00%
kimwitu++/kc                                   41563609655.00       41217223921.00 -0.83% 461284.00  461284.00 0.00%
tramp3d-v4/tramp3d-v4                          65778761906.00       65208111169.00 -0.87% 570560.00  570560.00 0.00%
lencod/lencod                                  57111009799.00       56497616800.00 -1.07% 541960.00  541960.00 0.00%
ClamAV/clamscan                                53935475581.00       53350116553.00 -1.09% 456352.00  456352.00 0.00%
mafft/pairlocalalign                           32799063008.00       32439200934.00 -1.10% 320888.00  320888.00 0.00%
consumer-typeset/consumer-typeset              35357798970.00       34865643450.00 -1.39% 419436.00  419436.00 0.00%
sqlite3/sqlite3                                35219936375.00       34678565893.00 -1.54% 436160.00  436160.00 0.00%
                           Geomean difference                                      -0.97%                      0.00%

Comment on lines +260 to +263
; GISEL-NEXT: mov w8, #5826 // =0x16c2
; GISEL-NEXT: and w9, w0, #0xff
; GISEL-NEXT: movk w8, #364, lsl #16
; GISEL-NEXT: umull x8, w9, w8
Contributor Author

tobias-stadler commented Aug 6, 2024


Expanding the div generates a bunch of instructions, and the order in which the combines are applied results in different output.

Comment on lines +12 to +13
; CHECK-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
; CHECK-NEXT: [[C2:%[0-9]+]]:_(s1) = G_CONSTANT i1 false
Contributor Author


These are dead in the input IR, which doesn't happen inside the CodeGen pipeline.

@@ -88,7 +95,8 @@ body: |
%cmp2:_(s32) = G_ICMP intpred(sge), %cmp_lhs(s64), %add

$w0 = COPY %cmp2(s32)
RET_ReallyLR implicit $w0
$w1 = COPY %cmp1(s32)
Contributor Author

tobias-stadler commented Aug 6, 2024


cmp1 was dead in the input IR, but according to the comment this seems unintended, so I have fixed this.

@@ -33,8 +33,7 @@ define noundef i1 @logger(i32 noundef %logLevel, ptr %ea, ptr %pll) {
; CHECK-GI-NEXT: b.hi .LBB1_2
; CHECK-GI-NEXT: // %bb.1: // %land.rhs
; CHECK-GI-NEXT: ldr x8, [x1]
; CHECK-GI-NEXT: ldrb w8, [x8]
; CHECK-GI-NEXT: and w0, w8, #0x1
; CHECK-GI-NEXT: ldrb w0, [x8]
Contributor Author


The immediate DCEing and the better ordering of combines prevent the PreLegalizerCombiner from generating a bunch of useless artifacts that it can't combine away again. These artifacts are converted into a G_AND by the ArtifactCombiner, and that G_AND can't be removed by the redundant_and combine, because KnownBits can't look through the implicit anyext of the G_LOAD. We are probably missing some combines that convert the load into sext/zext versions.

@tschuett

tschuett commented Aug 6, 2024

Is this a fundamental issue of the combiner, or do we have to revisit the situation as the combiner slowly becomes more powerful?

@tobias-stadler
Contributor Author

Is this a fundamental issue of the combiner, or do we have to revisit the situation as the combiner slowly becomes more powerful?

This is a fundamental issue of combiner-style algorithms: fixed-point iteration simply burns too much compile time for no good reason. Neither the InstCombiner nor the DAGCombiner uses fixed-point iteration anymore, for the same compile-time reasons. The heuristics I have implemented should be mostly on par with how thoroughly the other combiner implementations currently handle retrying combines. The main difference is that we can rely on the Observer to detect changes and schedule retries, whereas the other combiner implementations have to use WorkList-aware functions when modifying the IR. If this approach is good enough for SelectionDAG, I don't see it becoming a problem for GlobalISel.
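
For intuition, here is a schematic sketch of the two driver strategies being compared. It is not the actual GlobalISel Combiner code; tryCombineAll, instructionsOf and RetryObserver are hypothetical stand-ins for the real traversal, rule-application and Observer interfaces:

// Schematic sketch only. tryCombineAll, instructionsOf and RetryObserver are
// hypothetical stand-ins, not the real GlobalISel interfaces.

// Fixed-point iteration: re-sweep the whole function until no rule fires.
bool combineFixedPoint(MachineFunction &MF) {
  bool Changed = false;
  bool LocalChanged;
  do {
    LocalChanged = false;
    for (MachineInstr &MI : instructionsOf(MF))
      LocalChanged |= tryCombineAll(MI);
    Changed |= LocalChanged;
  } while (LocalChanged); // every extra sweep revisits every instruction
  return Changed;
}

// Single pass: one sweep, then retry only the instructions whose defs/uses
// were touched by a successful combine, as reported by the Observer.
bool combineSinglePass(MachineFunction &MF, RetryObserver &Observer) {
  bool Changed = false;
  for (MachineInstr &MI : instructionsOf(MF))
    Changed |= tryCombineAll(MI);
  while (MachineInstr *MI = Observer.popRetryCandidate())
    Changed |= tryCombineAll(*MI);
  return Changed;
}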

Contributor

@aemerson left a comment


These are some very nice improvements, thanks for working on this. None of the test output changes look to be exposing problems with this patch, so LGTM.

tobias-stadler added a commit that referenced this pull request Aug 15, 2024
Continues the work for disabling fixed-point iteration in the Combiner
(#94291).

This introduces improved Observer-based heuristics in the GISel Combiner
to retry combining the defs/uses of modified instructions and to perform
sparse dead code elimination.

I have experimented a lot with the heuristics and this seems to be the
minimal set of heuristics that allows disabling fixed-point iteration
for AArch64 CTMark O2 without regressions.
Enabling this globally would pass all regression tests for all official
targets (apart from small benign diffs), but I have made this fully
opt-in for now, because I can't quantify the impact for other targets.

This should mostly be on-par with how the WorkList-aware functions
in the InstCombiner and DAGCombiner handle rescheduling instructions
for recombining.

For performance numbers see my follow-up patch for AArch64 (#102167)

Pull Request: #102163
@tobias-stadler changed the base branch from users/tobias-stadler/spr/main.aarch64globalisel-disable-fixed-point-iteration-in-all-combiners to main on August 16, 2024 at 13:15
@tobias-stadler merged commit abdc2b1 into main on Aug 16, 2024
9 of 11 checks passed
@tobias-stadler deleted the users/tobias-stadler/spr/aarch64globalisel-disable-fixed-point-iteration-in-all-combiners branch on August 16, 2024 at 13:15
@tobias-stadler
Contributor Author

These are some very nice improvements, thanks for working on this. None of the test output changes look to be exposing problems with this patch, so LGTM.

Thanks a lot for reviewing this whole stack for me!
