Skip to content

RegisterCoalescer: Fix implicit operand handling during rematerialize #75271

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Dec 13, 2023

Conversation

arsenm
Copy link
Contributor

@arsenm arsenm commented Dec 13, 2023

If the rematerialize was placing a subregister into a super register,
and implicit operands referenced the original register, we need to add
undef flags to the now-subregister indexed implicit operands.

Depends #75152

If the copy being hoisted was undef, we have the same problems that
eliminateUndefCopy needs to solve. We would effectively be introducing a
new live out implicit_def. We need to add an undef flag to avoid artificially
introducing a live through undef value. Previously, the verifier would fail
due to the dead def inside the loop providing the live in value for the %1 use.
If the rematerialize was placing a subregister into a super register,
and implicit operands referenced the original register, we need to add
undef flags to the now-subregister indexed implicit operands.
@llvmbot
Copy link
Member

llvmbot commented Dec 13, 2023

@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-llvm-regalloc

Author: Matt Arsenault (arsenm)

Changes

If the rematerialize was placing a subregister into a super register,
and implicit operands referenced the original register, we need to add
undef flags to the now-subregister indexed implicit operands.

Depends #75152


Full diff: https://github.com/llvm/llvm-project/pull/75271.diff

3 Files Affected:

  • (modified) llvm/lib/CodeGen/RegisterCoalescer.cpp (+16-2)
  • (added) llvm/test/CodeGen/X86/coalescer-partial-redundancy-clear-dead-flag-undef-copy.mir (+47)
  • (added) llvm/test/CodeGen/X86/coalescer-remat-with-undef-implicit-def-operand.mir (+123)
diff --git a/llvm/lib/CodeGen/RegisterCoalescer.cpp b/llvm/lib/CodeGen/RegisterCoalescer.cpp
index c067d87a9fd810..c1af37c8510ff5 100644
--- a/llvm/lib/CodeGen/RegisterCoalescer.cpp
+++ b/llvm/lib/CodeGen/RegisterCoalescer.cpp
@@ -1201,6 +1201,8 @@ bool RegisterCoalescer::removePartialRedundancy(const CoalescerPair &CP,
                       << printMBBReference(MBB) << '\t' << CopyMI);
   }
 
+  const bool IsUndefCopy = CopyMI.getOperand(1).isUndef();
+
   // Remove CopyMI.
   // Note: This is fine to remove the copy before updating the live-ranges.
   // While updating the live-ranges, we only look at slot indices and
@@ -1214,6 +1216,19 @@ bool RegisterCoalescer::removePartialRedundancy(const CoalescerPair &CP,
   LIS->pruneValue(*static_cast<LiveRange *>(&IntB), CopyIdx.getRegSlot(),
                   &EndPoints);
   BValNo->markUnused();
+
+  if (IsUndefCopy) {
+    // We're introducing an undef phi def, and need to set undef on any users of
+    // the previously local def to avoid artifically extending the lifetime
+    // through the block.
+    for (MachineOperand &MO : MRI->use_nodbg_operands(IntB.reg())) {
+      const MachineInstr &MI = *MO.getParent();
+      SlotIndex UseIdx = LIS->getInstructionIndex(MI);
+      if (!IntB.liveAt(UseIdx))
+        MO.setIsUndef(true);
+    }
+  }
+
   // Extend IntB to the EndPoints of its original live interval.
   LIS->extendToIndices(IntB, EndPoints);
 
@@ -1596,8 +1611,7 @@ bool RegisterCoalescer::reMaterializeTrivialDef(const CoalescerPair &CP,
         LR->createDeadDef(NewMIIdx.getRegSlot(), LIS->getVNInfoAllocator());
   }
 
-  if (NewMI.getOperand(0).getSubReg())
-    NewMI.getOperand(0).setIsUndef();
+  NewMI.setRegisterDefReadUndef(NewMI.getOperand(0).getReg());
 
   // Transfer over implicit operands to the rematerialized instruction.
   for (MachineOperand &MO : ImplicitOps)
diff --git a/llvm/test/CodeGen/X86/coalescer-partial-redundancy-clear-dead-flag-undef-copy.mir b/llvm/test/CodeGen/X86/coalescer-partial-redundancy-clear-dead-flag-undef-copy.mir
new file mode 100644
index 00000000000000..5f33be0bc15559
--- /dev/null
+++ b/llvm/test/CodeGen/X86/coalescer-partial-redundancy-clear-dead-flag-undef-copy.mir
@@ -0,0 +1,47 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 4
+# RUN: llc -mtriple=x86_64-pc-linux-gnu -run-pass=register-coalescer -verify-coalescing -o - %s | FileCheck %s
+
+# Check for "Live range continues after dead def flag".
+
+# There are 2 copies of undef, but the registers also appear to be
+# live due to block live outs, and thus were not deleted as
+# eliminateUndefCopy only considered the live range, and not the undef
+# flag.
+#
+# removePartialRedundancy would move the COPY undef %0 in bb.1 to
+# bb.0.  The live range of %1 would then be extended to be live out of
+# %bb.1 for the backedge phi. This would then fail the verifier, since
+# the dead flag was no longer valid. This was fixed by directly
+# considering the undef flag to avoid considering this special case.
+
+---
+name: partial_redundancy_coalesce_undef_copy_live_out
+tracksRegLiveness: true
+body:             |
+  ; CHECK-LABEL: name: partial_redundancy_coalesce_undef_copy_live_out
+  ; CHECK: bb.0:
+  ; CHECK-NEXT:   successors: %bb.1(0x80000000)
+  ; CHECK-NEXT:   liveins: $rdi
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:gr32 = COPY $rdi
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.1:
+  ; CHECK-NEXT:   successors: %bb.1(0x80000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   dead [[XOR32ri:%[0-9]+]]:gr32 = XOR32ri undef [[XOR32ri]], 1, implicit-def dead $eflags
+  ; CHECK-NEXT:   dead [[MOV32rr:%[0-9]+]]:gr32 = MOV32rr [[COPY]]
+  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:gr32 = IMPLICIT_DEF
+  ; CHECK-NEXT:   JMP_1 %bb.1
+  bb.0:
+    liveins: $rdi
+
+    %0:gr32 = COPY $rdi
+
+  bb.1:
+    %1:gr32 = COPY undef %0
+    dead %1:gr32 = XOR32ri %1, 1, implicit-def dead $eflags
+    dead %2:gr32 = MOV32rr killed %0
+    %0:gr32 = COPY killed undef %1
+    JMP_1 %bb.1
+
+...
diff --git a/llvm/test/CodeGen/X86/coalescer-remat-with-undef-implicit-def-operand.mir b/llvm/test/CodeGen/X86/coalescer-remat-with-undef-implicit-def-operand.mir
new file mode 100644
index 00000000000000..42d17130412b61
--- /dev/null
+++ b/llvm/test/CodeGen/X86/coalescer-remat-with-undef-implicit-def-operand.mir
@@ -0,0 +1,123 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 4
+# RUN: llc -mtriple=x86_64-pc-linux-gnu -verify-coalescing -run-pass=register-coalescer -o - %s | FileCheck %s
+
+# The %1 = MOV32r0 is rematerialized as a subregister of %2. The
+# implicit-def %1 operand needs to have an undef added, just like the
+# main result operand.
+
+---
+name:           remat_into_subregister_set_undef_implicit_operand_subregisters
+tracksRegLiveness: true
+body:             |
+  ; CHECK-LABEL: name: remat_into_subregister_set_undef_implicit_operand_subregisters
+  ; CHECK: bb.0:
+  ; CHECK-NEXT:   successors: %bb.1(0x80000000)
+  ; CHECK-NEXT:   liveins: $rdi
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   undef [[MOV32r0_:%[0-9]+]].sub_32bit:gr64_with_sub_8bit = MOV32r0 implicit-def dead $eflags, implicit-def [[MOV32r0_]]
+  ; CHECK-NEXT:   [[MOV32r0_1:%[0-9]+]]:gr32 = MOV32r0 implicit-def dead $eflags, implicit-def [[MOV32r0_1]]
+  ; CHECK-NEXT:   undef [[MOV32r0_2:%[0-9]+]].sub_32bit:gr64_with_sub_8bit = MOV32r0 implicit-def dead $eflags, implicit-def undef [[MOV32r0_2]].sub_32bit, implicit-def [[MOV32r0_2]]
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.1:
+  ; CHECK-NEXT:   successors: %bb.2(0x80000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[MOV32r0_2:%[0-9]+]].sub_32bit:gr64_with_sub_8bit = XOR32ri [[MOV32r0_2]].sub_32bit, 1, implicit-def dead $eflags
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.2:
+  ; CHECK-NEXT:   successors: %bb.4(0x40000000), %bb.3(0x40000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   JCC_1 %bb.4, 5, implicit killed undef $eflags
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.3:
+  ; CHECK-NEXT:   successors: %bb.4(0x80000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.4:
+  ; CHECK-NEXT:   successors: %bb.1(0x80000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   dead [[MOV32rr:%[0-9]+]]:gr32 = MOV32rr [[MOV32r0_1]]
+  ; CHECK-NEXT:   dead [[SHL64ri:%[0-9]+]]:gr64_nosp = SHL64ri [[MOV32r0_]], 4, implicit-def dead $eflags
+  ; CHECK-NEXT:   [[MOV32r0_1:%[0-9]+]]:gr32 = COPY [[MOV32r0_2]].sub_32bit
+  ; CHECK-NEXT:   JMP_1 %bb.1
+  bb.0:
+    liveins: $rdi
+
+    undef %0.sub_32bit:gr64_with_sub_8bit = MOV32r0 implicit-def dead $eflags, implicit-def %0
+    %1:gr32 = MOV32r0 implicit-def dead $eflags, implicit-def %1
+    undef %2.sub_32bit:gr64_with_sub_8bit = COPY %1, implicit-def %2
+
+  bb.1:
+    %2.sub_32bit:gr64_with_sub_8bit = XOR32ri %2.sub_32bit, 1, implicit-def dead $eflags
+
+  bb.2:
+    JCC_1 %bb.4, 5, implicit killed undef $eflags
+
+  bb.3:
+
+  bb.4:
+    dead %3:gr32 = MOV32rr %1
+    dead %4:gr64_nosp = SHL64ri %0, 4, implicit-def dead $eflags
+    %1:gr32 = COPY %2.sub_32bit
+    JMP_1 %bb.1
+
+...
+
+# Same, except the implicit-def on the original instruction already
+# has a subregister index.
+
+---
+name:           remat_into_subregister_set_undef_implicit_operand_subregisters_with_subreg
+tracksRegLiveness: true
+body:             |
+  ; CHECK-LABEL: name: remat_into_subregister_set_undef_implicit_operand_subregisters_with_subreg
+  ; CHECK: bb.0:
+  ; CHECK-NEXT:   successors: %bb.1(0x80000000)
+  ; CHECK-NEXT:   liveins: $rdi
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   undef [[MOV32r0_:%[0-9]+]].sub_32bit:gr64_with_sub_8bit = MOV32r0 implicit-def dead $eflags, implicit-def [[MOV32r0_]]
+  ; CHECK-NEXT:   [[MOV32r0_1:%[0-9]+]]:gr32 = MOV32r0 implicit-def dead $eflags, implicit-def undef [[MOV32r0_1]].sub_8bit
+  ; CHECK-NEXT:   undef [[MOV32r0_2:%[0-9]+]].sub_32bit:gr64_with_sub_8bit = MOV32r0 implicit-def dead $eflags, implicit-def undef [[MOV32r0_2]].sub_8bit, implicit-def [[MOV32r0_2]]
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.1:
+  ; CHECK-NEXT:   successors: %bb.2(0x80000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[MOV32r0_2:%[0-9]+]].sub_32bit:gr64_with_sub_8bit = XOR32ri [[MOV32r0_2]].sub_32bit, 1, implicit-def dead $eflags
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.2:
+  ; CHECK-NEXT:   successors: %bb.4(0x40000000), %bb.3(0x40000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   JCC_1 %bb.4, 5, implicit killed undef $eflags
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.3:
+  ; CHECK-NEXT:   successors: %bb.4(0x80000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.4:
+  ; CHECK-NEXT:   successors: %bb.1(0x80000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   dead [[MOV32rr:%[0-9]+]]:gr32 = MOV32rr [[MOV32r0_1]]
+  ; CHECK-NEXT:   dead [[SHL64ri:%[0-9]+]]:gr64_nosp = SHL64ri [[MOV32r0_]], 4, implicit-def dead $eflags
+  ; CHECK-NEXT:   [[MOV32r0_1:%[0-9]+]]:gr32 = COPY [[MOV32r0_2]].sub_32bit
+  ; CHECK-NEXT:   JMP_1 %bb.1
+  bb.0:
+    liveins: $rdi
+
+    undef %0.sub_32bit:gr64_with_sub_8bit = MOV32r0 implicit-def dead $eflags, implicit-def %0
+    %1:gr32 = MOV32r0 implicit-def dead $eflags, undef implicit-def %1.sub_8bit
+    undef %2.sub_32bit:gr64_with_sub_8bit = COPY %1, implicit-def %2
+
+  bb.1:
+    %2.sub_32bit:gr64_with_sub_8bit = XOR32ri %2.sub_32bit, 1, implicit-def dead $eflags
+
+  bb.2:
+    JCC_1 %bb.4, 5, implicit killed undef $eflags
+
+  bb.3:
+
+  bb.4:
+    dead %3:gr32 = MOV32rr %1
+    dead %4:gr64_nosp = SHL64ri %0, 4, implicit-def dead $eflags
+    %1:gr32 = COPY %2.sub_32bit
+    JMP_1 %bb.1
+
+...

@arsenm arsenm merged commit 300a550 into llvm:main Dec 13, 2023
@arsenm arsenm deleted the coalescer-set-register-def-read-undef branch December 13, 2023 08:15
bzEq pushed a commit that referenced this pull request Dec 18, 2023
…fault (#75772)

If MI is not PPC specific instructions, let base implementation decide
if MI is rematerizable.
This can fix failure in #75570 after #75271 .
bzEq pushed a commit to bzEq/llvm-project that referenced this pull request Dec 25, 2023
Demonstrate `IMPLICIT_DEF implicit-def ...` can be generated after
coalescing on PPC.
bzEq pushed a commit that referenced this pull request Dec 25, 2023
Demonstrate `IMPLICIT_DEF implicit-def ...` can be generated after
coalescing on PPC.

The case is reduced from failure in #75570. The failure is triggered
after #75271 .
aeubanks added a commit that referenced this pull request Jan 3, 2024
This reverts commit 5cfc7b3.

This depends on 0e46b49 which is being reverted.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants