Skip to content

[GlobalIsel] Avoid aligning alloca size. #132064

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
44 changes: 31 additions & 13 deletions llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3111,21 +3111,39 @@ bool IRTranslator::translateAlloca(const User &U,
getOrCreateVReg(*ConstantInt::get(IntPtrIRTy, DL->getTypeAllocSize(Ty)));
MIRBuilder.buildMul(AllocSize, NumElts, TySize);

// Round the size of the allocation up to the stack alignment size
// by add SA-1 to the size. This doesn't overflow because we're computing
// an address inside an alloca.
Align StackAlign = MF->getSubtarget().getFrameLowering()->getStackAlign();
auto SAMinusOne = MIRBuilder.buildConstant(IntPtrTy, StackAlign.value() - 1);
auto AllocAdd = MIRBuilder.buildAdd(IntPtrTy, AllocSize, SAMinusOne,
MachineInstr::NoUWrap);
auto AlignCst =
MIRBuilder.buildConstant(IntPtrTy, ~(uint64_t)(StackAlign.value() - 1));
auto AlignedAlloc = MIRBuilder.buildAnd(IntPtrTy, AllocAdd, AlignCst);

const TargetFrameLowering *TFI = MF->getSubtarget().getFrameLowering();
Align StackAlign = TFI->getStackAlign();
Align Alignment = std::max(AI.getAlign(), DL->getPrefTypeAlign(Ty));
if (Alignment <= StackAlign)

// If the stack alignment is stricter than the alloca's alignment, ignore the
// alloca's alignment. We will align the size of the alloca to the stack
// alignment, which will guarantee that the alloca's alignment is satisfied.
bool IsUnderAligned = Alignment <= StackAlign;
if (IsUnderAligned)
Alignment = Align(1);
MIRBuilder.buildDynStackAlloc(getOrCreateVReg(AI), AlignedAlloc, Alignment);

// If the stack grows up, adding the alloca's size to SP without padding may
// leave SP not aligned (to the stack alignment) after the alloca because we
// align SP (to the stack align or alloca align) *before* adding the alloca
// size. On the other hand, if the stack grows down, we will align SP *after*
// decrementing it, so there is no need to pad the size.
if (TFI->getStackGrowthDirection() == TargetFrameLowering::StackGrowsUp ||
IsUnderAligned) {
// Round the size of the allocation up to the stack alignment size
// by add SA-1 to the size. This doesn't overflow because we're computing
// an address inside an alloca.
auto SAMinusOne =
MIRBuilder.buildConstant(IntPtrTy, StackAlign.value() - 1);
auto AllocAdd = MIRBuilder.buildAdd(IntPtrTy, AllocSize, SAMinusOne,
MachineInstr::NoUWrap);
auto AlignCst =
MIRBuilder.buildConstant(IntPtrTy, ~(uint64_t)(StackAlign.value() - 1));
auto AlignedAlloc = MIRBuilder.buildAnd(IntPtrTy, AllocAdd, AlignCst);

MIRBuilder.buildDynStackAlloc(getOrCreateVReg(AI), AlignedAlloc, Alignment);
} else {
MIRBuilder.buildDynStackAlloc(getOrCreateVReg(AI), AllocSize, Alignment);
}

MF->getFrameInfo().CreateVariableSizedObject(Alignment, &AI);
assert(MF->getFrameInfo().hasVarSizedObjects());
Expand Down
43 changes: 27 additions & 16 deletions llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4447,24 +4447,35 @@ void SelectionDAGBuilder::visitAlloca(const AllocaInst &I) {
DAG.getZExtOrTrunc(TySizeValue, dl, IntPtr));
}

// Handle alignment. If the requested alignment is less than or equal to
// the stack alignment, ignore it. If the size is greater than or equal to
// the stack alignment, we note this in the DYNAMIC_STACKALLOC node.
Align StackAlign = DAG.getSubtarget().getFrameLowering()->getStackAlign();
if (*Alignment <= StackAlign)
// Handle alignment. If the requested alignment is less than or equal to the
// stack alignment, ignore it since we will align the size. If the size is
// greater than or equal to the stack alignment, we note this in the
// DYNAMIC_STACKALLOC node.
const TargetFrameLowering *TFI = DAG.getSubtarget().getFrameLowering();
Align StackAlign = TFI->getStackAlign();
bool IsUnderAligned = *Alignment <= StackAlign;
if (IsUnderAligned)
Alignment = std::nullopt;

const uint64_t StackAlignMask = StackAlign.value() - 1U;
// Round the size of the allocation up to the stack alignment size
// by add SA-1 to the size. This doesn't overflow because we're computing
// an address inside an alloca.
AllocSize = DAG.getNode(ISD::ADD, dl, AllocSize.getValueType(), AllocSize,
DAG.getConstant(StackAlignMask, dl, IntPtr),
SDNodeFlags::NoUnsignedWrap);

// Mask out the low bits for alignment purposes.
AllocSize = DAG.getNode(ISD::AND, dl, AllocSize.getValueType(), AllocSize,
DAG.getSignedConstant(~StackAlignMask, dl, IntPtr));
// If the stack grows up, adding the alloca's size to SP without padding may
// leave SP not aligned (to the stack alignment) after the alloca because we
// align SP (to the stack align or alloca align) *before* adding the alloca
// size. On the other hand, if the stack grows down, we will align SP *after*
// decrementing it, so there is no need to align the size.
if (TFI->getStackGrowthDirection() == TargetFrameLowering::StackGrowsUp ||
IsUnderAligned) {
const uint64_t StackAlignMask = StackAlign.value() - 1U;
// Round the size of the allocation up to the stack alignment size
// by add SA-1 to the size. This doesn't overflow because we're computing
// an address inside an alloca.
AllocSize = DAG.getNode(ISD::ADD, dl, AllocSize.getValueType(), AllocSize,
DAG.getConstant(StackAlignMask, dl, IntPtr),
SDNodeFlags::NoUnsignedWrap);

// Mask out the low bits for alignment purposes.
AllocSize = DAG.getNode(ISD::AND, dl, AllocSize.getValueType(), AllocSize,
DAG.getSignedConstant(~StackAlignMask, dl, IntPtr));
}

SDValue Ops[] = {
getRoot(), AllocSize,
Expand Down
6 changes: 1 addition & 5 deletions llvm/test/CodeGen/AArch64/GlobalISel/dynamic-alloca.ll
Original file line number Diff line number Diff line change
Expand Up @@ -28,11 +28,7 @@ define ptr @test_aligned_alloca(i32 %numelts) {
; CHECK: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 1
; CHECK: [[ZEXT:%[0-9]+]]:_(s64) = G_ZEXT [[COPY]](s32)
; CHECK: [[MUL:%[0-9]+]]:_(s64) = G_MUL [[ZEXT]], [[C]]
; CHECK: [[C1:%[0-9]+]]:_(s64) = G_CONSTANT i64 15
; CHECK: [[ADD:%[0-9]+]]:_(s64) = nuw G_ADD [[MUL]], [[C1]]
; CHECK: [[C2:%[0-9]+]]:_(s64) = G_CONSTANT i64 -16
; CHECK: [[AND:%[0-9]+]]:_(s64) = G_AND [[ADD]], [[C2]]
; CHECK: [[DYN_STACKALLOC:%[0-9]+]]:_(p0) = G_DYN_STACKALLOC [[AND]](s64), 32
; CHECK: [[DYN_STACKALLOC:%[0-9]+]]:_(p0) = G_DYN_STACKALLOC [[MUL]](s64), 32
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aemerson does this change seem reasonable to you?

; CHECK: $x0 = COPY [[DYN_STACKALLOC]](p0)
; CHECK: RET_ReallyLR implicit $x0
%addr = alloca i8, i32 %numelts, align 32
Expand Down
6 changes: 1 addition & 5 deletions llvm/test/CodeGen/AArch64/sme-framelower-use-bp.ll
Original file line number Diff line number Diff line change
Expand Up @@ -160,11 +160,7 @@ define void @quux() #1 {
; CHECK-NEXT: mov x9, sp
; CHECK-NEXT: subs x9, x9, #16
; CHECK-NEXT: mov sp, x9
; CHECK-NEXT: addvl x9, x8, #2
; CHECK-NEXT: mov w0, w9
; CHECK-NEXT: // implicit-def: $x9
; CHECK-NEXT: mov w9, w0
; CHECK-NEXT: and x9, x9, #0x7f0
; CHECK-NEXT: rdvl x9, #2
; CHECK-NEXT: mov x10, sp
; CHECK-NEXT: subs x10, x10, x9
; CHECK-NEXT: and x10, x10, #0xffffffffffffffe0
Expand Down
14 changes: 4 additions & 10 deletions llvm/test/CodeGen/AArch64/stack-probing-dynamic.ll
Original file line number Diff line number Diff line change
Expand Up @@ -107,13 +107,10 @@ define void @dynamic_align_64(i64 %size, ptr %out) #0 {
; CHECK-NEXT: .cfi_offset w29, -32
; CHECK-NEXT: sub x9, sp, #32
; CHECK-NEXT: and sp, x9, #0xffffffffffffffc0
; CHECK-NEXT: add x9, x0, #15
; CHECK-NEXT: mov x8, sp
; CHECK-DAG: str xzr, [sp]
; CHECK-DAG: and x9, x9, #0xfffffffffffffff0
; CHECK-NOT: INVALID_TO_BREAK_UP_CHECK_DAG
; CHECK-NEXT: str xzr, [sp]
; CHECK-DAG: mov x19, sp
; CHECK-DAG: sub x8, x8, x9
; CHECK-DAG: sub x8, x8, x0
; CHECK-NEXT: and x8, x8, #0xffffffffffffffc0
; CHECK-NEXT: .LBB2_1: // =>This Inner Loop Header: Depth=1
; CHECK-NEXT: sub sp, sp, #1, lsl #12 // =4096
Expand Down Expand Up @@ -167,13 +164,10 @@ define void @dynamic_align_8192(i64 %size, ptr %out) #0 {
; CHECK-NEXT: b .LBB3_1
; CHECK-NEXT: .LBB3_3:
; CHECK-NEXT: mov sp, x9
; CHECK-NEXT: add x9, x0, #15
; CHECK-NEXT: mov x8, sp
; CHECK-DAG: ldr xzr, [sp]
; CHECK-DAG: and x9, x9, #0xfffffffffffffff0
; CHECK-NOT: INVALID_TO_BREAK_UP_CHECK_DAG
; CHECK-NEXT: ldr xzr, [sp]
; CHECK-DAG: mov x19, sp
; CHECK-DAG: sub x8, x8, x9
; CHECK-DAG: sub x8, x8, x0
; CHECK-NEXT: and x8, x8, #0xffffffffffffe000
; CHECK-NEXT: .LBB3_4: // =>This Inner Loop Header: Depth=1
; CHECK-NEXT: sub sp, sp, #1, lsl #12 // =4096
Expand Down
4 changes: 1 addition & 3 deletions llvm/test/CodeGen/AArch64/sve-alloca.ll
Original file line number Diff line number Diff line change
Expand Up @@ -54,10 +54,8 @@ define void @foo(<vscale x 4 x i64> %dst, i1 %cond) {
; CHECK-NEXT: .cfi_escape 0x10, 0x4d, 0x0a, 0x11, 0x60, 0x22, 0x11, 0x50, 0x92, 0x2e, 0x00, 0x1e, 0x22 // $d13 @ cfa - 32 - 48 * VG
; CHECK-NEXT: .cfi_escape 0x10, 0x4e, 0x0a, 0x11, 0x60, 0x22, 0x11, 0x48, 0x92, 0x2e, 0x00, 0x1e, 0x22 // $d14 @ cfa - 32 - 56 * VG
; CHECK-NEXT: .cfi_escape 0x10, 0x4f, 0x0a, 0x11, 0x60, 0x22, 0x11, 0x40, 0x92, 0x2e, 0x00, 0x1e, 0x22 // $d15 @ cfa - 32 - 64 * VG
; CHECK-NEXT: rdvl x9, #2
; CHECK-NEXT: mov x8, sp
; CHECK-NEXT: add x9, x9, #15
; CHECK-NEXT: and x9, x9, #0xfffffffffffffff0
; CHECK-NEXT: cnth x9, all, mul #4
; CHECK-NEXT: sub x8, x8, x9
; CHECK-NEXT: and x0, x8, #0xffffffffffffffe0
; CHECK-NEXT: mov sp, x0
Expand Down
9 changes: 2 additions & 7 deletions llvm/test/CodeGen/PowerPC/aix-framepointer-save-restore.ll
Original file line number Diff line number Diff line change
Expand Up @@ -82,10 +82,8 @@ define dso_local void @frameptr_realigned(i32 %n) {
; AIX32-NEXT: slwi 3, 3, 2
; AIX32-NEXT: lwz 4, 0(1)
; AIX32-NEXT: li 5, -64
; AIX32-NEXT: addi 3, 3, 15
; AIX32-NEXT: mr 31, 1
; AIX32-NEXT: rlwinm 3, 3, 0, 0, 27
; AIX32-NEXT: neg 3, 3
; AIX32-NEXT: mr 31, 1
; AIX32-NEXT: and 5, 3, 5
; AIX32-NEXT: stwux 4, 1, 5
; AIX32-NEXT: addi 3, 1, 64
Expand All @@ -111,11 +109,8 @@ define dso_local void @frameptr_realigned(i32 %n) {
; AIX64-NEXT: rldic 3, 3, 2, 30
; AIX64-NEXT: ld 4, 0(1)
; AIX64-NEXT: li 5, -64
; AIX64-NEXT: addi 3, 3, 15
; AIX64-NEXT: mr 31, 1
; AIX64-NEXT: rldicl 3, 3, 60, 4
; AIX64-NEXT: rldicl 3, 3, 4, 29
; AIX64-NEXT: neg 3, 3
; AIX64-NEXT: mr 31, 1
; AIX64-NEXT: and 5, 3, 5
; AIX64-NEXT: stdux 4, 1, 5
; AIX64-NEXT: addi 3, 1, 128
Expand Down
3 changes: 0 additions & 3 deletions llvm/test/CodeGen/PowerPC/pr46759.ll
Original file line number Diff line number Diff line change
Expand Up @@ -33,9 +33,6 @@ define void @foo(i32 %vla_size) #0 {
; CHECK-LE-NEXT: li r4, -2048
; CHECK-LE-NEXT: li r6, -4096
; CHECK-LE-NEXT: mr r31, r1
; CHECK-LE-NEXT: addi r3, r3, 15
; CHECK-LE-NEXT: rldicl r3, r3, 60, 4
; CHECK-LE-NEXT: rldicl r3, r3, 4, 31
; CHECK-LE-NEXT: neg r5, r3
; CHECK-LE-NEXT: ld r3, 0(r1)
; CHECK-LE-NEXT: and r4, r5, r4
Expand Down
16 changes: 4 additions & 12 deletions llvm/test/CodeGen/PowerPC/stack-clash-prologue.ll
Original file line number Diff line number Diff line change
Expand Up @@ -835,20 +835,17 @@ define void @f11(i32 %vla_size, i64 %i) #0 {
; CHECK-LE-NEXT: .cfi_def_cfa_register r30
; CHECK-LE-NEXT: .cfi_offset r31, -8
; CHECK-LE-NEXT: .cfi_offset r30, -16
; CHECK-LE-NEXT: clrldi r3, r3, 32
; CHECK-LE-NEXT: lis r5, 1
; CHECK-LE-NEXT: mr r31, r1
; CHECK-LE-NEXT: li r6, 1
; CHECK-LE-NEXT: sldi r4, r4, 2
; CHECK-LE-NEXT: addi r3, r3, 15
; CHECK-LE-NEXT: li r6, 1
; CHECK-LE-NEXT: clrldi r3, r3, 32
; CHECK-LE-NEXT: ori r5, r5, 0
; CHECK-LE-NEXT: rldicl r3, r3, 60, 4
; CHECK-LE-NEXT: add r5, r31, r5
; CHECK-LE-NEXT: rldicl r3, r3, 4, 31
; CHECK-LE-NEXT: stwx r6, r5, r4
; CHECK-LE-NEXT: neg r5, r3
; CHECK-LE-NEXT: li r4, -32768
; CHECK-LE-NEXT: li r6, -4096
; CHECK-LE-NEXT: neg r5, r3
; CHECK-LE-NEXT: ld r3, 0(r1)
; CHECK-LE-NEXT: and r4, r5, r4
; CHECK-LE-NEXT: mr r5, r4
Expand Down Expand Up @@ -896,16 +893,13 @@ define void @f11(i32 %vla_size, i64 %i) #0 {
; CHECK-BE-NEXT: .cfi_def_cfa_register r30
; CHECK-BE-NEXT: .cfi_offset r31, -8
; CHECK-BE-NEXT: .cfi_offset r30, -16
; CHECK-BE-NEXT: clrldi r3, r3, 32
; CHECK-BE-NEXT: lis r5, 1
; CHECK-BE-NEXT: addi r3, r3, 15
; CHECK-BE-NEXT: mr r31, r1
; CHECK-BE-NEXT: ori r5, r5, 0
; CHECK-BE-NEXT: rldicl r3, r3, 60, 4
; CHECK-BE-NEXT: add r5, r31, r5
; CHECK-BE-NEXT: sldi r4, r4, 2
; CHECK-BE-NEXT: li r6, 1
; CHECK-BE-NEXT: rldicl r3, r3, 4, 31
; CHECK-BE-NEXT: clrldi r3, r3, 32
; CHECK-BE-NEXT: stwx r6, r5, r4
; CHECK-BE-NEXT: neg r7, r3
; CHECK-BE-NEXT: li r4, -32768
Expand Down Expand Up @@ -964,11 +958,9 @@ define void @f11(i32 %vla_size, i64 %i) #0 {
; CHECK-32-NEXT: lis r4, 1
; CHECK-32-NEXT: mr r31, r1
; CHECK-32-NEXT: ori r4, r4, 0
; CHECK-32-NEXT: addi r3, r3, 15
; CHECK-32-NEXT: add r4, r31, r4
; CHECK-32-NEXT: li r5, 1
; CHECK-32-NEXT: slwi r6, r6, 2
; CHECK-32-NEXT: rlwinm r3, r3, 0, 0, 27
; CHECK-32-NEXT: neg r7, r3
; CHECK-32-NEXT: stwx r5, r4, r6
; CHECK-32-NEXT: li r4, -32768
Expand Down
12 changes: 2 additions & 10 deletions llvm/test/CodeGen/RISCV/rvv/stack-probing-dynamic.ll
Original file line number Diff line number Diff line change
Expand Up @@ -180,8 +180,6 @@ define void @dynamic_align_64(i64 %size, ptr %out) #0 {
; RV64I-NEXT: .cfi_def_cfa s0, 0
; RV64I-NEXT: andi sp, sp, -64
; RV64I-NEXT: mv s1, sp
; RV64I-NEXT: addi a0, a0, 15
; RV64I-NEXT: andi a0, a0, -16
; RV64I-NEXT: sub a0, sp, a0
; RV64I-NEXT: andi a0, a0, -64
; RV64I-NEXT: lui a2, 1
Expand Down Expand Up @@ -219,8 +217,6 @@ define void @dynamic_align_64(i64 %size, ptr %out) #0 {
; RV32I-NEXT: .cfi_def_cfa s0, 0
; RV32I-NEXT: andi sp, sp, -64
; RV32I-NEXT: mv s1, sp
; RV32I-NEXT: addi a0, a0, 15
; RV32I-NEXT: andi a0, a0, -16
; RV32I-NEXT: sub a0, sp, a0
; RV32I-NEXT: andi a0, a0, -64
; RV32I-NEXT: lui a1, 1
Expand Down Expand Up @@ -278,10 +274,8 @@ define void @dynamic_align_8192(i64 %size, ptr %out) #0 {
; RV64I-NEXT: srli a2, sp, 13
; RV64I-NEXT: slli sp, a2, 13
; RV64I-NEXT: mv s1, sp
; RV64I-NEXT: addi a0, a0, 15
; RV64I-NEXT: lui a2, 1048574
; RV64I-NEXT: andi a0, a0, -16
; RV64I-NEXT: sub a0, sp, a0
; RV64I-NEXT: lui a2, 1048574
; RV64I-NEXT: and a0, a0, a2
; RV64I-NEXT: lui a2, 1
; RV64I-NEXT: .LBB3_1: # =>This Inner Loop Header: Depth=1
Expand Down Expand Up @@ -329,10 +323,8 @@ define void @dynamic_align_8192(i64 %size, ptr %out) #0 {
; RV32I-NEXT: srli a1, sp, 13
; RV32I-NEXT: slli sp, a1, 13
; RV32I-NEXT: mv s1, sp
; RV32I-NEXT: addi a0, a0, 15
; RV32I-NEXT: lui a1, 1048574
; RV32I-NEXT: andi a0, a0, -16
; RV32I-NEXT: sub a0, sp, a0
; RV32I-NEXT: lui a1, 1048574
; RV32I-NEXT: and a0, a0, a1
; RV32I-NEXT: lui a1, 1
; RV32I-NEXT: .LBB3_1: # =>This Inner Loop Header: Depth=1
Expand Down
6 changes: 1 addition & 5 deletions llvm/test/CodeGen/RISCV/stack-clash-prologue.ll
Original file line number Diff line number Diff line change
Expand Up @@ -642,8 +642,6 @@ define void @f11(i32 %vla_size, i64 %i) #0 {
; RV64I-NEXT: slli a0, a0, 32
; RV64I-NEXT: srli a0, a0, 32
; RV64I-NEXT: sw a2, 0(a1)
; RV64I-NEXT: addi a0, a0, 15
; RV64I-NEXT: andi a0, a0, -16
; RV64I-NEXT: sub a0, sp, a0
; RV64I-NEXT: andi a0, a0, -2048
; RV64I-NEXT: lui a1, 1
Expand Down Expand Up @@ -698,11 +696,9 @@ define void @f11(i32 %vla_size, i64 %i) #0 {
; RV32I-NEXT: add a2, s1, a2
; RV32I-NEXT: add a1, a2, a1
; RV32I-NEXT: li a2, 1
; RV32I-NEXT: addi a0, a0, 15
; RV32I-NEXT: andi a0, a0, -16
; RV32I-NEXT: sw a2, 0(a1)
; RV32I-NEXT: sub a0, sp, a0
; RV32I-NEXT: andi a0, a0, -2048
; RV32I-NEXT: sw a2, 0(a1)
; RV32I-NEXT: lui a1, 1
; RV32I-NEXT: .LBB11_3: # =>This Inner Loop Header: Depth=1
; RV32I-NEXT: sub sp, sp, a1
Expand Down
14 changes: 2 additions & 12 deletions llvm/test/CodeGen/SPARC/alloca-align.ll
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ define void @variable_alloca_with_overalignment(i32 %num) nounwind {
; CHECK32-LABEL: variable_alloca_with_overalignment:
; CHECK32: ! %bb.0:
; CHECK32-NEXT: save %sp, -96, %sp
; CHECK32-NEXT: add %sp, 80, %i1
; CHECK32-NEXT: add %sp, 84, %i1
; CHECK32-NEXT: and %i1, -64, %o0
; CHECK32-NEXT: add %o0, -96, %sp
; CHECK32-NEXT: add %i0, 7, %i0
Expand All @@ -21,7 +21,7 @@ define void @variable_alloca_with_overalignment(i32 %num) nounwind {
; CHECK64-LABEL: variable_alloca_with_overalignment:
; CHECK64: ! %bb.0:
; CHECK64-NEXT: save %sp, -128, %sp
; CHECK64-NEXT: add %sp, 2159, %i1
; CHECK64-NEXT: add %sp, 2171, %i1
; CHECK64-NEXT: and %i1, -64, %o0
; CHECK64-NEXT: add %o0, -2175, %sp
; CHECK64-NEXT: srl %i0, 0, %i0
Expand Down Expand Up @@ -52,8 +52,6 @@ define void @variable_alloca_with_overalignment_2(i32 %num) nounwind {
; CHECK32-LABEL: variable_alloca_with_overalignment_2:
; CHECK32: ! %bb.0:
; CHECK32-NEXT: save %sp, -96, %sp
; CHECK32-NEXT: add %i0, 7, %i0
; CHECK32-NEXT: and %i0, -8, %i0
; CHECK32-NEXT: sub %sp, %i0, %i0
; CHECK32-NEXT: add %i0, 88, %i0
; CHECK32-NEXT: and %i0, -64, %o1
Expand All @@ -67,14 +65,6 @@ define void @variable_alloca_with_overalignment_2(i32 %num) nounwind {
; CHECK64: ! %bb.0:
; CHECK64-NEXT: save %sp, -128, %sp
; CHECK64-NEXT: srl %i0, 0, %i0
; CHECK64-NEXT: add %i0, 15, %i0
; CHECK64-NEXT: sethi 4194303, %i1
; CHECK64-NEXT: or %i1, 1008, %i1
; CHECK64-NEXT: sethi 0, %i2
; CHECK64-NEXT: or %i2, 1, %i2
; CHECK64-NEXT: sllx %i2, 32, %i2
; CHECK64-NEXT: or %i2, %i1, %i1
; CHECK64-NEXT: and %i0, %i1, %i0
; CHECK64-NEXT: sub %sp, %i0, %i0
; CHECK64-NEXT: add %i0, 2175, %i0
; CHECK64-NEXT: and %i0, -64, %o1
Expand Down
4 changes: 2 additions & 2 deletions llvm/test/CodeGen/SPARC/stack-align.ll
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ define void @stack_realign(i32 %a, i32 %b, i32 %c, i32 %d, i32 %e, i32 %f, i32 %
; CHECK32: ! %bb.0: ! %entry
; CHECK32-NEXT: save %sp, -96, %sp
; CHECK32-NEXT: ld [%fp+92], %o0
; CHECK32-NEXT: add %sp, 80, %i0
; CHECK32-NEXT: add %sp, 84, %i0
; CHECK32-NEXT: and %i0, -64, %o1
; CHECK32-NEXT: call stack_realign_helper
; CHECK32-NEXT: add %o1, -96, %sp
Expand All @@ -23,7 +23,7 @@ define void @stack_realign(i32 %a, i32 %b, i32 %c, i32 %d, i32 %e, i32 %f, i32 %
; CHECK64-LABEL: stack_realign:
; CHECK64: ! %bb.0: ! %entry
; CHECK64-NEXT: save %sp, -128, %sp
; CHECK64-NEXT: add %sp, 2159, %i0
; CHECK64-NEXT: add %sp, 2171, %i0
; CHECK64-NEXT: and %i0, -64, %o1
; CHECK64-NEXT: add %o1, -2175, %sp
; CHECK64-NEXT: add %sp, -48, %sp
Expand Down
2 changes: 1 addition & 1 deletion llvm/test/CodeGen/SystemZ/alloca-03.ll
Original file line number Diff line number Diff line change
Expand Up @@ -114,7 +114,7 @@ define void @f5() {
; CHECK-NEXT: lgr %r11, %r15
; CHECK-NEXT: .cfi_def_cfa_register %r11
; CHECK-NEXT: lgr %r1, %r15
; CHECK-NEXT: aghi %r1, -128
; CHECK-NEXT: aghi %r1, -124
; CHECK-NEXT: la %r2, 280(%r1)
; CHECK-NEXT: nill %r2, 65408
; CHECK-NEXT: lgr %r15, %r1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks clearly wrong. The sequence lgr %r1, %r15 ; aghi %r1, -124 ; lgr %r15, %r1 subtracts 124 from the stack pointer, with the result that the stack pointer is now no longer properly aligned as required by the ABI.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off hand I am surprised my change has kicked in for this test since it is a static alloca:

void SelectionDAGBuilder::visitAlloca(const AllocaInst &I) {
  // If this is a fixed sized alloca in the entry block of the function,
  // allocate it statically on the stack.
  if (FuncInfo.StaticAllocaMap.count(&I))
    return;   // getValue will auto-populate this.

I expected this return to be reached but it is not. I will investigate further and get back once I understand the issue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isStackRealignable() is false on SystemZ, so it's treated as a dynamic allocation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that SystemZTargetLowering::lowerDYNAMIC_STACKALLOC_ELF computes the amount subtracted from the stack pointer, called NeededSpace like this:

NeededSpace = AllocaSize + (RequiredAlign - StackAlign)

With my change, we get 4 + (128 - 8) which is 124. It makes sense that we would want AllocaSize to be a multiple of StackAlign to ensure that NeededSpace is also a multiple of StackAlign.

Do you think it is reasonable to add the code rounding the alloca size up to the nearest StackAlign to lowerDYNAMIC_STACKALLOC_ELF? Another option is to disable my change for targets that don't have a realignable stack.

I think the SPARC backend has the same behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced that isStackRealignable() is relevant here. That only changes the behavior for overaligned allocas with compile-time constant sizes. For a non-constant size, we'd still have the same problem even if isStackRealignable() were true.

I think the core issue may rather be the assumption whether (or not) the result of a DYNAMIC_STACKALLOC is equal to the stack pointer or not. On some targets they are the same, and therefore this one value simply needs to be aligned to the maximum of the ABI stack alignment and the alloca alignment.

But on other targets, like s390x, these are two different values - on s390x at the very bottom of the stack there is an ABI mandated area for outgoing arguments and a register save area, which is preserved even across dynamic allocations. In this case the dynamic allocation result in two new values: a new stack pointer (which must respect the stack alignment) and a result pointer (which must respect the alloca alignment requirement).

I don't have a strong opinion on whether the alignment code should generated by common code or the target back-end. However, the interface should be clearly defined - specifically, what if any assumptions the back-end implementation of lowering DYNAMIC_STACKALLOC can make about the alignment of its size argument must be clearly documented. That documentation currently reads:

  /// DYNAMIC_STACKALLOC - Allocate some number of bytes on the stack aligned
  /// to a specified boundary.  This node always has two return values: a new
  /// stack pointer value and a chain. The first operand is the token chain,
  /// the second is the number of bytes to allocate, and the third is the
  /// alignment boundary.  The size is guaranteed to be a multiple of the
  /// stack alignment, and the alignment is guaranteed to be bigger than the
  /// stack alignment (if required) or 0 to get standard stack alignment.

If this is no longer correct, the documentation needs to be updated (and existing code that may have relied on that documentation should be reviewed).

Copy link
Author

@jcogan-nv jcogan-nv Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced that isStackRealignable() is relevant here.

Agreed I think the property I am looking for is something like: whether the target uses the stack pointer as the result of DYNAMIC_STACKALLOC.

But on other targets, like s390x...

Makes sense, thanks for the explanation.

If this is no longer correct, the documentation needs to be updated...

Agreed I would need to update this documentation and audit existing uses. That may be too much effort for this change. GlobalISel uses an opcode called G_DYN_STACKALLOC. Is that assumed to have the same interface/semantics as DYNAMIC_STACKALLOC? I don't see any reference to size alignment in the documentation for G_DYN_STACKALLOC, here is what I found:

G_DYN_STACKALLOC
^^^^^^^^^^^^^^^^

Dynamically realigns the stack pointer to the specified size and alignment.
An alignment value of `0` or `1` means no specific alignment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really familiar with the GlobalISel side of things, sorry - we don't support this on SystemZ at the moment.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the change to SelectionDAGBuilder with the latest commit due to the documentation you linked.

Expand Down
Loading