
[CodeGen] Correctly handle non-standard cases in RemoveLoadsIntoFakeUses #111551

Merged: 10 commits merged on Jan 28, 2025
70 changes: 42 additions & 28 deletions llvm/lib/CodeGen/RemoveLoadsIntoFakeUses.cpp
@@ -32,6 +32,7 @@
#include "llvm/IR/Function.h"
#include "llvm/InitializePasses.h"
#include "llvm/Support/Debug.h"
#include "llvm/Target/TargetMachine.h"

using namespace llvm;

@@ -74,6 +75,10 @@ INITIALIZE_PASS_END(RemoveLoadsIntoFakeUses, DEBUG_TYPE,
"Remove Loads Into Fake Uses", false, false)

bool RemoveLoadsIntoFakeUses::runOnMachineFunction(MachineFunction &MF) {
// Skip this pass if we would use VarLoc-based LDV, as there may be DBG_VALUE
// instructions of the restored values that would become invalid.
if (!MF.useDebugInstrRef())
return false;
// Only run this for functions that have fake uses.
if (!MF.hasFakeUses() || skipFunction(MF.getFunction()))
return false;
@@ -86,20 +91,20 @@ bool RemoveLoadsIntoFakeUses::runOnMachineFunction(MachineFunction &MF) {
const TargetInstrInfo *TII = ST.getInstrInfo();
const TargetRegisterInfo *TRI = ST.getRegisterInfo();

SmallDenseMap<Register, SmallVector<MachineInstr *>> RegFakeUses;
SmallVector<MachineInstr *> RegFakeUses;
LivePhysRegs.init(*TRI);
SmallVector<MachineInstr *, 16> Statepoints;
for (MachineBasicBlock *MBB : post_order(&MF)) {
RegFakeUses.clear();
LivePhysRegs.addLiveOuts(*MBB);

for (MachineInstr &MI : make_early_inc_range(reverse(*MBB))) {
if (MI.isFakeUse()) {
for (const MachineOperand &MO : MI.operands()) {
// Track the Fake Uses that use this register so that we can delete
// them if we delete the corresponding load.
if (MO.isReg())
RegFakeUses[MO.getReg()].push_back(&MI);
}
if (MI.getNumOperands() == 0 || !MI.getOperand(0).isReg())
continue;
// Track the Fake Uses that use these register units so that we can
// delete them if we delete the corresponding load.
RegFakeUses.push_back(&MI);
// Do not record FAKE_USE uses in LivePhysRegs so that we can recognize
// otherwise-unused loads.
continue;
@@ -109,31 +114,38 @@ bool RemoveLoadsIntoFakeUses::runOnMachineFunction(MachineFunction &MF) {
// reload of a spilled register.
if (MI.getRestoreSize(TII)) {
Register Reg = MI.getOperand(0).getReg();
assert(Reg.isPhysical() && "VReg seen in function with NoVRegs set?");
// Don't delete live physreg defs, or any reserved register defs.
if (!LivePhysRegs.available(Reg) || MRI->isReserved(Reg))
continue;
// There should be an exact match between the loaded register and the
// FAKE_USE use. If not, this is a load that is unused by anything? It
// should probably be deleted, but that's outside of this pass' scope.
if (RegFakeUses.contains(Reg)) {
// There should typically be an exact match between the loaded register
// and the FAKE_USE, but sometimes regalloc will choose to load a larger
Contributor:

Try to use MachineInstr::readsRegister and modifiesRegister queries?
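
For illustration, a minimal sketch (not part of the patch) of how a register-unit-aware query such as readsRegister could be used to match a reload against the FAKE_USE instructions recorded by this pass. The helper name collectFakeUsesOf and its ArrayRef parameter are invented for the example; Reg, RegFakeUses, and TRI mirror the names used in the diff below.

#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/Register.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"

using namespace llvm;

// Sketch: gather every recorded FAKE_USE that reads any unit of Reg.
// readsRegister(Reg, TRI) is sub-register aware, so a FAKE_USE of $eax
// matches a reload that defines $rax, the non-standard case discussed here.
static SmallVector<MachineInstr *>
collectFakeUsesOf(Register Reg, ArrayRef<MachineInstr *> RegFakeUses,
                  const TargetRegisterInfo *TRI) {
  SmallVector<MachineInstr *> Uses;
  for (MachineInstr *FakeUse : RegFakeUses)
    if (FakeUse->readsRegister(Reg, TRI))
      Uses.push_back(FakeUse);
  return Uses;
}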

// value than is needed. Therefore, as long as the load isn't used by
Comment on lines +120 to +122
Contributor:

This sounds like it's just an optimization bug. Do you have an example?

@SLTozer (Contributor, Author), Oct 8, 2024:

I can post a full file if you want, but to summarize with snippets:
Pre-regalloc we have:

bb.5.for.end16:
  %2:gr32 = COPY killed %19:gr32
  FAKE_USE killed %2:gr32
  ...

After some merging, including %2 merging with %19, we get:

544B	bb.5.for.end16:
576B	  FAKE_USE %20.sub_32bit:gr64_with_sub_8bit

Then, after the following update:

selectOrSplit GR64_with_sub_8bit:%20 [120r,432B:0)[472r,544B:1)[544B,576r:2) 0@120r 1@472r 2@544B-phi  weight:5.118239e-02 w=5.118239e-02
RS_Spill Cascade 0
Inline spilling GR64_with_sub_8bit:%20 [120r,432B:0)[472r,544B:1)[544B,576r:2) 0@120r 1@472r 2@544B-phi  weight:5.118239e-02
From original %14
	cannot remat for 576e	FAKE_USE %20.sub_32bit:gr64_with_sub_8bit
Merged spilled regs: SS#0 [120r,432B:0)[472r,576r:0) 0@x  weight:0.000000e+00
spillAroundUses %20
	merged orig valno 1: SS#0 [112B,432B:0)[472r,576r:0) 0@x  weight:0.000000e+00
Checking redundant spills for 1@112B in %21 [96r,112B:0)[112B,120r:1)[592r,672B:2) 0@96r 1@112B-phi 2@592r  weight:1.348007e-01
Merged to stack int: SS#0 [112B,432B:0)[472r,576r:0) 0@x  weight:0.000000e+00
Checking redundant spills for 0@120r in %20 [120r,432B:0)[472r,544B:1)[544B,576r:2) 0@120r 1@472r 2@544B-phi  weight:5.118239e-02
	hoisted: 112B	MOV64mr %stack.0, 1, $noreg, 0, $noreg, %21:gr64_with_sub_8bit :: (store (s64) into %stack.0)
	folded:   472r	MOV64mi32 %stack.0, 1, $noreg, 0, $noreg, 0 :: (store (s64) into %stack.0)
	reload:   552r	%23:gr64_with_sub_8bit = MOV64rm %stack.0, 1, $noreg, 0, $noreg :: (load (s64) from %stack.0)
	rewrite: 576r	FAKE_USE killed %23.sub_32bit:gr64_with_sub_8bit

We end up with:

544B	bb.5.for.end16:
552B	  %23:gr64 = MOV64rm %stack.0, 1, $noreg, 0, $noreg :: (load (s64) from %stack.0)
576B	  FAKE_USE %23.sub_32bit:gr64

Which becomes:

bb.5.for.end16:
  renamable $rax = MOV64rm %stack.0, 1, $noreg, 0, $noreg :: (load (s64) from %stack.0)
  FAKE_USE renamable $eax, implicit killed $rax

Where $rax is entirely unused after the MOV other than the FAKE_USE of $eax.

Contributor:

Oh right, the spilling and splitting code does not know about sub registers. This is a severe problem for AMDGPU (and is near the top of my issues I would like to fix)

// anything except at least one FAKE_USE, we will delete it. If it isn't
// used by any fake uses, it should still be safe to delete but we
// choose to ignore it so that this pass has no side effects unrelated
// to fake uses.
SmallDenseSet<MachineInstr *> FakeUsesToDelete;
SmallVector<MachineInstr *> RemainingFakeUses;
for (MachineInstr *&FakeUse : reverse(RegFakeUses)) {
if (FakeUse->readsRegister(Reg, TRI)) {
FakeUsesToDelete.insert(FakeUse);
RegFakeUses.erase(&FakeUse);
}
}
if (!FakeUsesToDelete.empty()) {
LLVM_DEBUG(dbgs() << "RemoveLoadsIntoFakeUses: DELETING: " << MI);
// It is possible that some DBG_VALUE instructions refer to this
// instruction. They will be deleted in the live debug variable
// analysis.
// Since this load only exists to restore a spilled register and we
// haven't run LiveDebugValues yet, there shouldn't be any DBG_VALUEs
// for this load; otherwise, deleting this would be incorrect.
Comment on lines +137 to +139
Member:

I have the nasty feeling that this is only true with instruction-referencing LiveDebugValues, where we're able to track the location of values through stack spills and restores. IIRC, in the location-based model LiveDebugVariables will issue DBG_VALUEs when values get re-loaded? (98% sure, we have to insert or not-insert DW_OP_deref in that model when such transitions occur). The failure mode in the location-based model would be setting a variable location to a register that is now no longer defined by a restore, i.e. pointing it at some garbage value.

Is there anything we can easily do about this? Only supporting fake-uses with instruction-referencing debug-info is an easy workaround with technical justification, but we should put reasonable effort into making this more general. There are non-x86 folks out there who're using fake-uses downstream who might be inconvenienced -- they're not immune from this failure-mode though, and would have the workaround of disabling this pass.

Contributor Author:

I think the easiest way would be to disable this specific pass for non-instruction-referencing builds - it's a relatively small optimization, and not necessary for correctness. The alternative would be to track uses by DBG_VALUES to check that the loaded value isn't used by one; any preferences?
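
For reference, a rough sketch of that second option, tracking DBG_VALUE uses of the loaded value, which is not what the patch does (the patch takes the useDebugInstrRef() early return instead). The helper hasDebugValueUser is invented for the example and, as written, only scans the rest of the block; a real version would also need to consider later blocks.

#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/Register.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"
#include <iterator>

using namespace llvm;

// Sketch of the alternative only: keep the reload if a later DBG_VALUE in the
// same block still describes the loaded register (the location-based
// LiveDebugValues case discussed above).
static bool hasDebugValueUser(MachineInstr &Load, Register Reg,
                              const TargetRegisterInfo *TRI) {
  MachineBasicBlock &MBB = *Load.getParent();
  for (auto I = std::next(MachineBasicBlock::iterator(Load)), E = MBB.end();
       I != E; ++I) {
    // Once the register is redefined, any following DBG_VALUE describes a
    // different value than the one produced by this reload.
    if (!I->isDebugInstr() && I->modifiesRegister(Reg, TRI))
      return false;
    if (I->isDebugValue() && I->readsRegister(Reg, TRI))
      return true;
  }
  return false;
}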

MI.eraseFromParent();
AnyChanges = true;
++NumLoadsDeleted;
// Each FAKE_USE now appears to be a fake use of the previous value
// of the loaded register; delete them to avoid incorrectly
// interpreting them as such.
for (MachineInstr *FakeUse : RegFakeUses[Reg]) {
for (MachineInstr *FakeUse : FakeUsesToDelete) {
LLVM_DEBUG(dbgs()
<< "RemoveLoadsIntoFakeUses: DELETING: " << *FakeUse);
FakeUse->eraseFromParent();
}
NumFakeUsesDeleted += RegFakeUses[Reg].size();
RegFakeUses[Reg].clear();
NumFakeUsesDeleted += FakeUsesToDelete.size();
}
continue;
}
@@ -143,13 +155,15 @@ bool RemoveLoadsIntoFakeUses::runOnMachineFunction(MachineFunction &MF) {
// that register.
if (!RegFakeUses.empty()) {
for (const MachineOperand &MO : MI.operands()) {
if (MO.isReg() && MO.isDef()) {
Register Reg = MO.getReg();
assert(Reg.isPhysical() &&
"VReg seen in function with NoVRegs set?");
for (MCRegUnit Unit : TRI->regunits(Reg))
RegFakeUses.erase(Unit);
}
if (!MO.isReg())
continue;
Register Reg = MO.getReg();
// We clear RegFakeUses for this register and all subregisters,
// because any such FAKE_USE encountered prior is no longer relevant
// for later encountered loads.
for (MachineInstr *&FakeUse : reverse(RegFakeUses))
if (FakeUse->readsRegister(Reg, TRI))
RegFakeUses.erase(&FakeUse);
}
}
LivePhysRegs.stepBackward(MI);
171 changes: 171 additions & 0 deletions llvm/test/CodeGen/X86/fake-use-remove-loads.mir
@@ -0,0 +1,171 @@
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
# Ensure that loads into FAKE_USEs are correctly removed by the
# remove-loads-into-fake-uses pass, and that if the function does not use
# instruction referencing then no changes are made.
# RUN: llc %s -run-pass remove-loads-into-fake-uses -mtriple=x86_64-unknown-linux -debug-only=remove-loads-into-fake-uses 2>&1 -o - | FileCheck %s
# REQUIRES: asserts
#
## We verify that:
## - The load into the FAKE_USE is removed, along with the FAKE_USE itself,
## even when the FAKE_USE is of a subregister of the loaded register.
## - We correctly handle situations where FAKE_USE has additional `killed`
## operands added by other passes.
## - The store to the stack slot still exists.
## - When the register has a use between the restore and the FAKE_USE, we do
## not delete the load or fake use.


---
Contributor:

Simplify this down to one block?

name: enabled
alignment: 16
tracksRegLiveness: true
noPhis: true
noVRegs: true
hasFakeUses: true
tracksDebugUserValues: true
debugInstrRef: true
liveins:
- { reg: '$rdi', virtual-reg: '' }
- { reg: '$esi', virtual-reg: '' }
- { reg: '$rdx', virtual-reg: '' }
frameInfo:
isCalleeSavedInfoValid: true
stack:
- { id: 0, name: '', type: spill-slot, offset: -8, size: 8, alignment: 8,
stack-id: default, callee-saved-register: '', callee-saved-restored: true,
debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
- { id: 1, name: '', type: spill-slot, offset: -16, size: 8, alignment: 8,
stack-id: default, callee-saved-register: '', callee-saved-restored: true,
debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
body: |
bb.0:
liveins: $esi, $rdi, $rdx, $r15, $r14, $r13, $r12, $r11, $rbx

; CHECK-LABEL: name: enabled
; CHECK: liveins: $esi, $rdi, $rdx, $r15, $r14, $r13, $r12, $r11, $rbx
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: $rbx = MOV64rr $rdx
; CHECK-NEXT: $r14d = MOV32rr $esi
; CHECK-NEXT: $r15 = MOV64rr $rdi
; CHECK-NEXT: renamable $r12d = XOR32rr undef $r12d, undef $r12d, implicit-def dead $eflags, implicit-def $r12
; CHECK-NEXT: renamable $eax = XOR32rr undef $eax, undef $eax, implicit-def dead $eflags, implicit-def $rax

;; The store to the stack slot is still present.
; CHECK-NEXT: MOV64mr $rbp, 1, $noreg, -48, $noreg, killed renamable $rax :: (store (s64) into %stack.0)

; CHECK-NEXT: MOV64mr $rbp, 1, $noreg, -40, $noreg, killed renamable $r11 :: (store (s64) into %stack.1)
; CHECK-NEXT: renamable $eax = XOR32rr undef $eax, undef $eax, implicit-def dead $eflags
; CHECK-NEXT: $r13d = MOV32rr killed $eax
; CHECK-NEXT: $rdi = MOV64rr $r15
; CHECK-NEXT: CALL64r renamable $r12, csr_64, implicit $rsp, implicit $ssp, implicit $rdi, implicit-def $rsp, implicit-def $ssp
; CHECK-NEXT: dead renamable $eax = MOV32rm renamable $rbx, 1, $noreg, 0, $noreg
; CHECK-NEXT: renamable $eax = MOV32ri 1
; CHECK-NEXT: TEST8ri renamable $r14b, 1, implicit-def $eflags

;; First FAKE_USE and its corresponding load are removed; second FAKE_USE of
;; a restored value that is also used is preserved.
; CHECK-NEXT: renamable $r11 = MOV64rm $rbp, 1, $noreg, -40, $noreg :: (load (s64) from %stack.1)
; CHECK-NEXT: renamable $r12d = XOR32rr $r12d, $r11d, implicit-def dead $eflags
; CHECK-NEXT: FAKE_USE killed renamable $r11d

; CHECK-NEXT: TEST32rr killed renamable $r13d, renamable $r13d, implicit-def $eflags
; CHECK-NEXT: RET64

$rbx = MOV64rr $rdx
$r14d = MOV32rr $esi
$r15 = MOV64rr $rdi
renamable $r12d = XOR32rr undef $r12d, undef $r12d, implicit-def dead $eflags, implicit-def $r12
renamable $eax = XOR32rr undef $eax, undef $eax, implicit-def dead $eflags, implicit-def $rax
MOV64mr $rbp, 1, $noreg, -48, $noreg, killed renamable $rax :: (store (s64) into %stack.0)
MOV64mr $rbp, 1, $noreg, -40, $noreg, killed renamable $r11 :: (store (s64) into %stack.1)
renamable $eax = XOR32rr undef $eax, undef $eax, implicit-def dead $eflags
$r13d = MOV32rr killed $eax
$rdi = MOV64rr $r15
CALL64r renamable $r12, csr_64, implicit $rsp, implicit $ssp, implicit $rdi, implicit-def $rsp, implicit-def $ssp
dead renamable $eax = MOV32rm renamable $rbx, 1, $noreg, 0, $noreg
renamable $eax = MOV32ri 1
TEST8ri renamable $r14b, 1, implicit-def $eflags
renamable $rax = MOV64rm $rbp, 1, $noreg, -48, $noreg :: (load (s64) from %stack.0)
FAKE_USE renamable $eax, implicit killed $rax
renamable $r11 = MOV64rm $rbp, 1, $noreg, -40, $noreg :: (load (s64) from %stack.1)
renamable $r12d = XOR32rr $r12d, $r11d, implicit-def dead $eflags
FAKE_USE killed renamable $r11d
TEST32rr killed renamable $r13d, renamable $r13d, implicit-def $eflags
RET64

...
---
name: disabled
alignment: 16
tracksRegLiveness: true
noPhis: true
noVRegs: true
hasFakeUses: true
tracksDebugUserValues: true
debugInstrRef: false
liveins:
- { reg: '$rdi', virtual-reg: '' }
- { reg: '$esi', virtual-reg: '' }
- { reg: '$rdx', virtual-reg: '' }
frameInfo:
isCalleeSavedInfoValid: true
stack:
- { id: 0, name: '', type: spill-slot, offset: -8, size: 8, alignment: 8,
stack-id: default, callee-saved-register: '', callee-saved-restored: true,
debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
- { id: 1, name: '', type: spill-slot, offset: -16, size: 8, alignment: 8,
stack-id: default, callee-saved-register: '', callee-saved-restored: true,
debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
body: |
bb.0:
liveins: $esi, $rdi, $rdx, $r15, $r14, $r13, $r12, $r11, $rbx

; CHECK-LABEL: name: disabled
; CHECK: liveins: $esi, $rdi, $rdx, $r15, $r14, $r13, $r12, $r11, $rbx
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: $rbx = MOV64rr $rdx
; CHECK-NEXT: $r14d = MOV32rr $esi
; CHECK-NEXT: $r15 = MOV64rr $rdi
; CHECK-NEXT: renamable $r12d = XOR32rr undef $r12d, undef $r12d, implicit-def dead $eflags, implicit-def $r12
; CHECK-NEXT: renamable $eax = XOR32rr undef $eax, undef $eax, implicit-def dead $eflags, implicit-def $rax
; CHECK-NEXT: MOV64mr $rbp, 1, $noreg, -48, $noreg, killed renamable $rax :: (store (s64) into %stack.0)
; CHECK-NEXT: MOV64mr $rbp, 1, $noreg, -40, $noreg, killed renamable $r11 :: (store (s64) into %stack.1)
; CHECK-NEXT: renamable $eax = XOR32rr undef $eax, undef $eax, implicit-def dead $eflags
; CHECK-NEXT: $r13d = MOV32rr killed $eax
; CHECK-NEXT: $rdi = MOV64rr $r15
; CHECK-NEXT: CALL64r renamable $r12, csr_64, implicit $rsp, implicit $ssp, implicit $rdi, implicit-def $rsp, implicit-def $ssp
; CHECK-NEXT: dead renamable $eax = MOV32rm renamable $rbx, 1, $noreg, 0, $noreg
; CHECK-NEXT: renamable $eax = MOV32ri 1
; CHECK-NEXT: TEST8ri renamable $r14b, 1, implicit-def $eflags

;; Verify that when instr-ref is disabled, we do not remove fake uses.
; CHECK-NEXT: renamable $rax = MOV64rm $rbp, 1, $noreg, -48, $noreg :: (load (s64) from %stack.0)
; CHECK-NEXT: FAKE_USE renamable $eax, implicit killed $rax
; CHECK-NEXT: renamable $r11 = MOV64rm $rbp, 1, $noreg, -40, $noreg :: (load (s64) from %stack.1)
; CHECK-NEXT: renamable $r12d = XOR32rr $r12d, $r11d, implicit-def dead $eflags
; CHECK-NEXT: FAKE_USE killed renamable $r11d
; CHECK-NEXT: TEST32rr killed renamable $r13d, renamable $r13d, implicit-def $eflags
; CHECK-NEXT: RET64
$rbx = MOV64rr $rdx
$r14d = MOV32rr $esi
$r15 = MOV64rr $rdi
renamable $r12d = XOR32rr undef $r12d, undef $r12d, implicit-def dead $eflags, implicit-def $r12
renamable $eax = XOR32rr undef $eax, undef $eax, implicit-def dead $eflags, implicit-def $rax
MOV64mr $rbp, 1, $noreg, -48, $noreg, killed renamable $rax :: (store (s64) into %stack.0)
MOV64mr $rbp, 1, $noreg, -40, $noreg, killed renamable $r11 :: (store (s64) into %stack.1)
renamable $eax = XOR32rr undef $eax, undef $eax, implicit-def dead $eflags
$r13d = MOV32rr killed $eax
$rdi = MOV64rr $r15
CALL64r renamable $r12, csr_64, implicit $rsp, implicit $ssp, implicit $rdi, implicit-def $rsp, implicit-def $ssp
dead renamable $eax = MOV32rm renamable $rbx, 1, $noreg, 0, $noreg
renamable $eax = MOV32ri 1
TEST8ri renamable $r14b, 1, implicit-def $eflags
renamable $rax = MOV64rm $rbp, 1, $noreg, -48, $noreg :: (load (s64) from %stack.0)
FAKE_USE renamable $eax, implicit killed $rax
renamable $r11 = MOV64rm $rbp, 1, $noreg, -40, $noreg :: (load (s64) from %stack.1)
renamable $r12d = XOR32rr $r12d, $r11d, implicit-def dead $eflags
FAKE_USE killed renamable $r11d
TEST32rr killed renamable $r13d, renamable $r13d, implicit-def $eflags
RET64

...
Contributor:

Negative test with an intervening use of other sub registers
