[AArch64][GlobalISel] Add support for pre-indexed loads/stores. #70185


Merged: 3 commits, Oct 26, 2023
23 changes: 20 additions & 3 deletions llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
@@ -1141,11 +1141,28 @@ bool CombinerHelper::findPreIndexCandidate(GLoadStore &LdSt, Register &Addr,
return false;
}

// Avoid increasing cross-block register pressure.
for (auto &AddrUse : MRI.use_nodbg_instructions(Addr))
if (AddrUse.getParent() != LdSt.getParent())
return false;

// FIXME: check whether all uses of the base pointer are constant PtrAdds.
// That might allow us to end base's liveness here by adjusting the constant.

return all_of(MRI.use_nodbg_instructions(Addr),
[&](MachineInstr &UseMI) { return dominates(LdSt, UseMI); });
bool RealUse = false;
for (auto &AddrUse : MRI.use_nodbg_instructions(Addr)) {
if (!dominates(LdSt, AddrUse))
Review comment (Contributor):
Defer the dominance check until after you know the mode is foldable?

Reply (Contributor Author):
The foldable check isn't an early exit, so we still need to do this for every use.
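To make the trade-off concrete, here is a rough sketch of the deferred variant (hypothetical, not part of this patch; it assumes the same surrounding function and helpers as the committed code: MRI, Addr, LdSt, TLI, dominates and canFoldInAddressingMode). Because the profitability scan cannot exit early, deferring the legality check only turns one walk over the use list into two:

    // Hypothetical restructuring, shown for illustration only.
    bool RealUse = false;
    for (auto &AddrUse : MRI.use_nodbg_instructions(Addr)) {
      // Profitability scan: no early exit is possible, since a later use may
      // still turn out to be a "real" (non-foldable) use of the pointer.
      if (auto *UseLdSt = dyn_cast<GLoadStore>(&AddrUse)) {
        if (!canFoldInAddressingMode(UseLdSt, TLI, MRI))
          RealUse = true;
      } else {
        RealUse = true;
      }
    }
    if (!RealUse)
      return false;
    // Legality: every use must still be dominated by the load/store, so the
    // use list has to be walked a second time.
    for (auto &AddrUse : MRI.use_nodbg_instructions(Addr))
      if (!dominates(LdSt, AddrUse))
        return false;
    return true;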
return false; // All uses must be dominated by the load/store.

// If Ptr may be folded in addressing mode of other use, then it's
// not profitable to do this transformation.
if (auto *UseLdSt = dyn_cast<GLoadStore>(&AddrUse)) {
if (!canFoldInAddressingMode(UseLdSt, TLI, MRI))
RealUse = true;
} else {
RealUse = true;
}
}
return RealUse;
}

bool CombinerHelper::matchCombineIndexedLoadStore(
4 changes: 0 additions & 4 deletions llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -23699,10 +23699,6 @@ bool AArch64TargetLowering::mayBeEmittedAsTailCall(const CallInst *CI) const {
bool AArch64TargetLowering::isIndexingLegal(MachineInstr &MI, Register Base,
Register Offset, bool IsPre,
MachineRegisterInfo &MRI) const {
// HACK
if (IsPre)
return false; // Until we implement.

auto CstOffset = getIConstantVRegVal(Offset, MRI);
if (!CstOffset || CstOffset->isZero())
return false;
83 changes: 52 additions & 31 deletions llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
@@ -5659,24 +5659,34 @@ bool AArch64InstructionSelector::selectIndexedLoad(MachineInstr &MI,
Register WriteBack = Ld.getWritebackReg();
Register Base = Ld.getBaseReg();
Register Offset = Ld.getOffsetReg();

if (Ld.isPre())
return false; // TODO: add pre-inc support

unsigned Opc = 0;
static constexpr unsigned GPROpcodes[] = {
AArch64::LDRBBpost, AArch64::LDRHHpost, AArch64::LDRWpost,
AArch64::LDRXpost};
static constexpr unsigned FPROpcodes[] = {
AArch64::LDRBpost, AArch64::LDRHpost, AArch64::LDRSpost,
AArch64::LDRDpost, AArch64::LDRQpost};

LLT Ty = MRI.getType(Dst);
assert(Ty.getSizeInBits() <= 128 && "Unexpected type for indexed load");
unsigned MemSize = Ld.getMMO().getMemoryType().getSizeInBytes();
Review comment (Contributor):
dumb question: are we guaranteed that Ty.getSizeInBits() == MemSize, or could the size of the memory operand be wider than what's actually loaded?

Reply (Contributor Author):
Good question. We have separate opcodes for extending loads, G_INDEXED_ZEXTLOAD/G_INDEXED_SEXTLOAD. If the load's destination type is > memory type here it's an any extending load, so we just use the memory type to decide which instruction to pick.
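As a concrete illustration (hypothetical values, assuming the same context as selectIndexedLoad): a schematic any-extending pre-indexed load such as

    %dst:_(s32), %wb:_(p0) = G_INDEXED_LOAD %ptr, %offset, 1 :: (load (s8))

has a 32-bit destination but only a 1-byte memory operand, and the memory size alone picks the row in the opcode tables:

    unsigned MemSize = Ld.getMMO().getMemoryType().getSizeInBytes(); // == 1 in this example
    unsigned Row = Log2_32(MemSize); // == 0, selecting LDRBBpre on the GPR bank; the
                                     // upper bits of %dst are left undefined, which is
                                     // exactly what an any-extending load permits.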

if (RBI.getRegBank(Dst, MRI, TRI)->getID() == AArch64::FPRRegBankID)
Opc = FPROpcodes[Log2_32(MemSize)];
else
Opc = GPROpcodes[Log2_32(MemSize)];

unsigned Opc = 0;
if (Ld.isPre()) {
static constexpr unsigned GPROpcodes[] = {
AArch64::LDRBBpre, AArch64::LDRHHpre, AArch64::LDRWpre,
AArch64::LDRXpre};
static constexpr unsigned FPROpcodes[] = {
AArch64::LDRBpre, AArch64::LDRHpre, AArch64::LDRSpre, AArch64::LDRDpre,
AArch64::LDRQpre};
if (RBI.getRegBank(Dst, MRI, TRI)->getID() == AArch64::FPRRegBankID)
Opc = FPROpcodes[Log2_32(MemSize)];
else
Opc = GPROpcodes[Log2_32(MemSize)];
} else {
static constexpr unsigned GPROpcodes[] = {
AArch64::LDRBBpost, AArch64::LDRHHpost, AArch64::LDRWpost,
AArch64::LDRXpost};
static constexpr unsigned FPROpcodes[] = {
AArch64::LDRBpost, AArch64::LDRHpost, AArch64::LDRSpost,
AArch64::LDRDpost, AArch64::LDRQpost};
if (RBI.getRegBank(Dst, MRI, TRI)->getID() == AArch64::FPRRegBankID)
Opc = FPROpcodes[Log2_32(MemSize)];
else
Opc = GPROpcodes[Log2_32(MemSize)];
}
auto Cst = getIConstantVRegVal(Offset, MRI);
if (!Cst)
return false; // Shouldn't happen, but just in case.
@@ -5695,23 +5705,34 @@ bool AArch64InstructionSelector::selectIndexedStore(GIndexedStore &I,
Register Base = I.getBaseReg();
Register Offset = I.getOffsetReg();
LLT ValTy = MRI.getType(Val);

if (I.isPre())
return false; // TODO: add pre-inc support
assert(ValTy.getSizeInBits() <= 128 && "Unexpected type for indexed store");

unsigned Opc = 0;
static constexpr unsigned GPROpcodes[] = {
AArch64::STRBBpost, AArch64::STRHHpost, AArch64::STRWpost,
AArch64::STRXpost};
static constexpr unsigned FPROpcodes[] = {
AArch64::STRBpost, AArch64::STRHpost, AArch64::STRSpost,
AArch64::STRDpost, AArch64::STRQpost};

assert(ValTy.getSizeInBits() <= 128);
if (RBI.getRegBank(Val, MRI, TRI)->getID() == AArch64::FPRRegBankID)
Opc = FPROpcodes[Log2_32(ValTy.getSizeInBytes())];
else
Opc = GPROpcodes[Log2_32(ValTy.getSizeInBytes())];
if (I.isPre()) {
static constexpr unsigned GPROpcodes[] = {
AArch64::STRBBpre, AArch64::STRHHpre, AArch64::STRWpre,
AArch64::STRXpre};
static constexpr unsigned FPROpcodes[] = {
AArch64::STRBpre, AArch64::STRHpre, AArch64::STRSpre, AArch64::STRDpre,
AArch64::STRQpre};

if (RBI.getRegBank(Val, MRI, TRI)->getID() == AArch64::FPRRegBankID)
Opc = FPROpcodes[Log2_32(ValTy.getSizeInBytes())];
else
Opc = GPROpcodes[Log2_32(ValTy.getSizeInBytes())];
} else {
static constexpr unsigned GPROpcodes[] = {
AArch64::STRBBpost, AArch64::STRHHpost, AArch64::STRWpost,
AArch64::STRXpost};
static constexpr unsigned FPROpcodes[] = {
AArch64::STRBpost, AArch64::STRHpost, AArch64::STRSpost,
AArch64::STRDpost, AArch64::STRQpost};

if (RBI.getRegBank(Val, MRI, TRI)->getID() == AArch64::FPRRegBankID)
Opc = FPROpcodes[Log2_32(ValTy.getSizeInBytes())];
else
Opc = GPROpcodes[Log2_32(ValTy.getSizeInBytes())];
}

auto Cst = getIConstantVRegVal(Offset, MRI);
if (!Cst)
@@ -87,3 +87,25 @@ body: |
$q0 = COPY %dst
RET_ReallyLR implicit $x0, implicit $q0
...
---
name: pre_store_s64
body: |
bb.0:
liveins: $x0, $x1

; CHECK-LABEL: name: pre_store_s64
; CHECK: liveins: $x0, $x1
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: %ptr:_(p0) = COPY $x0
; CHECK-NEXT: %val:_(s64) = COPY $x1
; CHECK-NEXT: %offset:_(s64) = G_CONSTANT i64 8
; CHECK-NEXT: %writeback:_(p0) = G_INDEXED_STORE %val(s64), %ptr, %offset(s64), 1 :: (store (s64))
; CHECK-NEXT: $x0 = COPY %writeback(p0)
; CHECK-NEXT: RET_ReallyLR implicit $x0
%ptr:_(p0) = COPY $x0
%val:_(s64) = COPY $x1
%offset:_(s64) = G_CONSTANT i64 8
%writeback:_(p0) = G_INDEXED_STORE %val, %ptr, %offset, 1 :: (store (s64), align 8)
$x0 = COPY %writeback
RET_ReallyLR implicit $x0
...
4 changes: 2 additions & 2 deletions llvm/test/CodeGen/AArch64/GlobalISel/store-merging.ll
@@ -83,8 +83,8 @@ define void @test_simple_vector(ptr %ptr) {
; CHECK-NEXT: mov w8, #5 ; =0x5
; CHECK-NEXT: strh w9, [x0, #2]
; CHECK-NEXT: mov w9, #8 ; =0x8
; CHECK-NEXT: strh w8, [x0, #4]
; CHECK-NEXT: strh w9, [x0, #6]
; CHECK-NEXT: strh w8, [x0, #4]!
; CHECK-NEXT: strh w9, [x0, #2]
; CHECK-NEXT: ret
store <2 x i16> <i16 4, i16 7>, ptr %ptr
%addr2 = getelementptr <2 x i16>, ptr %ptr, i64 1
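Note: the updated checks are semantically equivalent to the old ones. strh w8, [x0, #4]! is a pre-indexed store that writes x0 + 4 back into x0, so the following strh w9, [x0, #2] still stores to the original x0 + 6; the test change simply reflects that the combiner now emits the first store in pre-indexed form.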