Skip to content

[lld][LoongArch] Relax call36/tail36: R_LARCH_CALL36 #123576

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Mar 9, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
41 changes: 41 additions & 0 deletions lld/ELF/Arch/LoongArch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,8 @@ enum Op {
LD_W = 0x28800000,
LD_D = 0x28c00000,
JIRL = 0x4c000000,
B = 0x50000000,
BL = 0x54000000,
};

enum Reg {
Expand Down Expand Up @@ -830,6 +832,37 @@ static void relaxPCHi20Lo12(Ctx &ctx, const InputSection &sec, size_t i,
remove = 4;
}

// Relax code sequence.
// From:
// pcaddu18i $ra, %call36(foo)
// jirl $ra, $ra, 0
// To:
// b/bl foo
static void relaxCall36(Ctx &ctx, const InputSection &sec, size_t i,
uint64_t loc, Relocation &r, uint32_t &remove) {
const uint64_t dest =
(r.expr == R_PLT_PC ? r.sym->getPltVA(ctx) : r.sym->getVA(ctx)) +
r.addend;

const int64_t displace = dest - loc;
// Check if the displace aligns 4 bytes or exceeds the range of b[l].
if ((displace & 0x3) != 0 || !isInt<28>(displace))
return;

const uint32_t nextInsn = read32le(sec.content().data() + r.offset + 4);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To match ld.bfd behavior we should check isJirl(nextInsn) before continuing. In #127312 (comment) I commented that we should probably add stronger checks for ld.bfd too, but let's make the current behavior aligned at first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is not necessary. Here are the reasons:

  1. Existing compilers, whether GCC or Clang, always generate the correct instruction sequence corresponding to the R_LARCH_CALL36 relocation when it is produced.
  2. For handwritten assembly, correct use of pseudo-instructions will not lead to errors. For maliciously written incorrect assembly cases, we do not need to ensure their correctness, as such erroneous cases can always be constructed in most architectures.
  3. Instruction checking should be avoided as much as possible in lld. (https://github.com/llvm/llvm-project/pull/127312/files#r1957198091)

It would be greatly appreciated if you could provide an example of an illegal case or present more substantial reasons.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is not necessary. Here are the reasons:

1. Existing compilers, whether GCC or Clang, always generate the correct instruction sequence corresponding to the R_LARCH_CALL36 relocation when it is produced.

2. For handwritten assembly, correct use of pseudo-instructions will not lead to errors. For maliciously written incorrect assembly cases, we do not need to ensure their correctness, as such erroneous cases can always be constructed in most architectures.

3. Instruction checking should be avoided as much as possible in lld. (https://github.com/llvm/llvm-project/pull/127312/files#r1957198091)

It would be greatly appreciated if you could provide an example of an illegal case or present more substantial reasons.

I think it's probably okay to not add more checks, but for ld.bfd interoperability the check on jirl is necessary. If you think this should not be done either, the ld.bfd implementation likely needs change as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After further consideration and revisiting the manual again(https://github.com/loongson/la-abi-specs/blob/release/laelf.adoc),
image
the generated pcaddu18i and jirl by GCC and Clang must be adjacent.

As for the interoperability you mentioned, lld handles instruction sequences generated by gas. And gas should not generate non-contiguous instruction sequences either. Therefore, we think that adding a check in lld is unnecessary.

Additionally, if jirl check is added here, other relaxations (such as R_LARCH_{PCALA,GOT_PC}_{HI20,LO12}, etc) would also need to check the instructions, which contradicts the principle of avoiding instruction checks unless absolutely necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After further consideration and revisiting the manual again(https://github.com/loongson/la-abi-specs/blob/release/laelf.adoc), the generated pcaddu18i and jirl by GCC and Clang must be adjacent.

As for the interoperability you mentioned, lld handles instruction sequences generated by gas. And gas should not generate non-contiguous instruction sequences either. Therefore, we think that adding a check in lld is unnecessary.

Additionally, if jirl check is added here, other relaxations (such as R_LARCH_{PCALA,GOT_PC}_{HI20,LO12}, etc) would also need to check the instructions, which contradicts the principle of avoiding instruction checks unless absolutely necessary.

I'm not objecting to the current revision per se, but rather behavior consistency, no matter whether the input is well-formed or not -- the so-called "bug-for-bug compatibility". This means:

  • either we add a check for jirl here in LLD and exactly replicate ld.bfd behavior, or
  • we leave the code as-is, and remove the check at ld.bfd side, with your comment here as justification,

so eventually we're going to have 2 consistent implementations of LoongArch ELF psABI and happy users. What do you think here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed with developers of ld's loongarch port, but they think the check is necessary in ld and lld don't need to align with it.

How about adding an assertion here as a compromise?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I have added an assertion to check jirl. 52bec2b

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reply -- I didn't realize "assertion" is just assert before.

But now debug builds of lld will error out on malformed R_LARCH_CALL36 underlying data -- this doesn't align with ld.bfd either because it will simply skip relaxing this piece instead of dying, so I fear the assertion may do more harm than good 🤷

Copy link
Contributor Author

@ylzsx ylzsx Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed with developers of ld's loongarch port, but they think the check is necessary in ld and lld don't need to align with it.

@MaskRay Would you agree to adding a check for jirl in this situation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still prefer that we remove the checks. We should avoid assert, when it is actually reachable.

If BFD wants to be guard against malformed input, that's ok. It's fine to diverge from it.

if (getD5(nextInsn) == R_RA) {
// convert jirl to bl
sec.relaxAux->relocTypes[i] = R_LARCH_B26;
sec.relaxAux->writes.push_back(insn(BL, 0, 0, 0));
remove = 4;
} else if (getD5(nextInsn) == R_ZERO) {
// convert jirl to b
sec.relaxAux->relocTypes[i] = R_LARCH_B26;
sec.relaxAux->writes.push_back(insn(B, 0, 0, 0));
remove = 4;
}
}

static bool relax(Ctx &ctx, InputSection &sec) {
const uint64_t secAddr = sec.getVA();
const MutableArrayRef<Relocation> relocs = sec.relocs();
Expand Down Expand Up @@ -874,6 +907,10 @@ static bool relax(Ctx &ctx, InputSection &sec) {
if (isPairRelaxable(relocs, i))
relaxPCHi20Lo12(ctx, sec, i, loc, r, relocs[i + 2], remove);
break;
case R_LARCH_CALL36:
if (relaxable(relocs, i))
relaxCall36(ctx, sec, i, loc, r, remove);
break;
}

// For all anchors whose offsets are <= r.offset, they are preceded by
Expand Down Expand Up @@ -977,6 +1014,10 @@ void LoongArch::finalizeRelax(int passes) const {
// RelExpr is needed for relocating.
r.expr = r.sym->hasFlag(NEEDS_PLT) ? R_PLT_PC : R_PC;
break;
case R_LARCH_B26:
skip = 4;
write32le(p, aux.writes[writesIdx++]);
break;
default:
llvm_unreachable("unsupported type");
}
Expand Down
65 changes: 65 additions & 0 deletions lld/test/ELF/loongarch-relax-call36-2.s
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
# REQUIRES: loongarch
## Relax R_LARCH_CALL36. This test tests boundary cases and some special symbols.

# RUN: rm -rf %t && split-file %s %t && cd %t
# RUN: llvm-mc -filetype=obj -triple=loongarch64 -mattr=+relax a.s -o a.o

# RUN: ld.lld -T lds a.o -o a
# RUN: llvm-objdump -d --no-show-raw-insn a | FileCheck %s --check-prefixes=RELAX,RELAX-MID

## Unsure whether this needs a diagnostic. GNU ld allows this.
# RUN: ld.lld -T lds -pie a.o -o a.pie
# RUN: llvm-objdump -d --no-show-raw-insn a.pie | FileCheck %s --check-prefixes=RELAX,RELAX-MID

# RUN: ld.lld -T lds -pie -z notext -z ifunc-noplt a.o -o a.ifunc-noplt
# RUN: llvm-objdump -d --no-show-raw-insn a.ifunc-noplt | FileCheck %s --check-prefixes=RELAX,NORELAX-MID

# RELAX-LABEL: <_start>:
## offset = 0x10000000 - 0x8000000 = 0x8000000(134217728), hi=512, lo18=0
# RELAX-NEXT: 8000000: pcaddu18i $ra, 512
# RELAX-NEXT: jirl $ra, $ra, 0
# RELAX-NEXT: bl 134217720
# RELAX-NEXT: bl -134217728
## offset = 12 - 0x8000010 = -0x8000004(-134217732), hi=512, lo18=-4
# RELAX-NEXT: 8000010: pcaddu18i $ra, -512
# RELAX-NEXT: jirl $ra, $ra, -4
# RELAX-EMPTY:

# RELAX-MID-LABEL: <.mid>:
## offset = 0x8010000 - 0x8008000 = 32768
# RELAX-MID-NEXT: 8008000: bl 32768
# RELAX-MID-NEXT: b 32764
# RELAX-MID-EMPTY:

# NORELAX-MID-LABEL: <.mid>:
# NORELAX-MID-NEXT: 8008000: pcaddu18i $ra, 0
# NORELAX-MID-NEXT: jirl $ra, $ra, 0
# NORELAX-MID-NEXT: pcaddu18i $t0, 0
# NORELAX-MID-NEXT: jr $t0
# NORELAX-MID-EMPTY:

#--- a.s
.global _start, ifunc
_start:
call36 pos # exceed positive range (.text+0x7fffffc), not relaxed
call36 pos # relaxed
call36 neg # relaxed
call36 neg # exceed negative range (.text+16-0x8000000), not relaxed

.section .mid,"ax",@progbits
.balign 16
call36 ifunc@plt # enable ifunc, not relaxed
tail36 $t0, ifunc@plt # enable ifunc, not relaxed

.type ifunc, @gnu_indirect_function
ifunc:
ret

#--- lds
SECTIONS {
.text 0x8000000 : { *(.text) }
.mid 0x8008000 : { *(.mid) }
.iplt 0x8010000 : { *(.iplt) }
}
neg = 12;
pos = 0x10000000;
136 changes: 136 additions & 0 deletions lld/test/ELF/loongarch-relax-call36.s
Original file line number Diff line number Diff line change
@@ -0,0 +1,136 @@
# REQUIRES: loongarch
## Relax R_LARCH_CALL36, which involves the macro instructions call36/tail36.

# RUN: rm -rf %t && split-file %s %t && cd %t

# RUN: llvm-mc -filetype=obj -triple=loongarch64 -mattr=+relax a.s -o a.64.o
# RUN: llvm-mc -filetype=obj -triple=loongarch64 -mattr=+relax b.s -o b.64.o
# RUN: ld.lld -shared -soname=b.so b.64.o -o b.64.so
# RUN: ld.lld -T lds a.64.o b.64.so -o 64
# RUN: llvm-objdump -td --no-show-raw-insn 64 | FileCheck %s --check-prefix=RELAX

## --no-relax disables relaxation.
# RUN: ld.lld -T lds a.64.o b.64.so --no-relax -o 64.norelax
# RUN: llvm-objdump -td --no-show-raw-insn 64.norelax | FileCheck %s --check-prefix=NORELAX

# RELAX: {{0*}}00010000 g .text {{0*}}0000001c _start
# RELAX: {{0*}}0001001c g .text {{0*}}00000000 _start_end
# RELAX: {{0*}}00010808 g .mid {{0*}}00000000 mid_end
# RELAX: {{0*}}10010010 g .high {{0*}}00000000 high_end

# RELAX-LABEL: <_start>:
## offset = 0x10018 - 0x10000 = 24
# RELAX-NEXT: 10000: bl 24 <a>
# RELAX-NEXT: b 20 <a>
# RELAX-NEXT: nop
# RELAX-NEXT: nop
## offset = .plt(0x10400)+32 - 0x10010 = 1040
# RELAX-NEXT: 10010: bl 1040 <bar+0x10420>
# RELAX-NEXT: b 1036 <bar+0x10420>
# RELAX-EMPTY:
# RELAX-NEXT: <a>:
# RELAX-NEXT: 10018: ret
# RELAX-EMPTY:

# RELAX-LABEL: <.mid>:
## offset = 0x10000 - 0x10800 = -2048
# RELAX-NEXT: 10800: bl -2048 <_start>
# RELAX-NEXT: b -2052 <_start>
# RELAX-EMPTY:

# RELAX-LABEL: <.mid2>:
## offset = 0x10000 - 0x1010000 = -16777216
# RELAX-NEXT: 1010000: bl -16777216 <_start>
# RELAX-NEXT: b -16777220 <_start>
# RELAX-EMPTY:

# RELAX-LABEL: <.high>:
## offset = 0x10000 - 0x10010000 = -0x10000000, hi=-1024, lo18=0
# RELAX-NEXT: 10010000: pcaddu18i $ra, -1024
# RELAX-NEXT: jirl $ra, $ra, 0
# RELAX-NEXT: pcaddu18i $t0, -1024
# RELAX-NEXT: jirl $zero, $t0, -8
# RELAX-EMPTY:


# NORELAX-LABEL: <_start>:
## offset = 0x10020 - 0x10000 = 0x20, hi=0, lo18=32
# NORELAX-NEXT: 10000: pcaddu18i $ra, 0
# NORELAX-NEXT: jirl $ra, $ra, 32
## offset = 0x10020 - 0x10008 = 0x18, hi=0, lo18=24
# NORELAX-NEXT: 10008: pcaddu18i $t0, 0
# NORELAX-NEXT: jirl $zero, $t0, 24
## offset = .plt(0x10400)+32 - 0x10010 = 0x410, hi=0, lo18=1040
# NORELAX-NEXT: 10010: pcaddu18i $ra, 0
# NORELAX-NEXT: jirl $ra, $ra, 1040
## offset = .plt(0x10400)+32 - 0x10018 = 0x408, hi=0, lo18=1032
# NORELAX-NEXT: 10018: pcaddu18i $t0, 0
# NORELAX-NEXT: jirl $zero, $t0, 1032
# NORELAX-EMPTY:
# NORELAX-NEXT: <a>:
# NORELAX-NEXT: 10020: ret
# NORELAX-EMPTY:

# NORELAX-LABEL: <.mid>:
## offset = 0x10000 - 0x10800 = -0x800, hi=0, lo18=-2048
# NORELAX-NEXT: 10800: pcaddu18i $ra, 0
# NORELAX-NEXT: jirl $ra, $ra, -2048
# NORELAX-NEXT: pcaddu18i $t0, 0
# NORELAX-NEXT: jirl $zero, $t0, -2056
# NORELAX-EMPTY:

# NORELAX-LABEL: <.mid2>:
## offset = 0x10000 - 0x1010000 = -0x1000000, hi=-64, lo18=0
# NORELAX-NEXT: 1010000: pcaddu18i $ra, -64
# NORELAX-NEXT: jirl $ra, $ra, 0
# NORELAX-NEXT: pcaddu18i $t0, -64
# NORELAX-NEXT: jirl $zero, $t0, -8
# NORELAX-EMPTY:

# NORELAX-LABEL: <.high>:
## offset = 0x10000 - 0x10010000 = -0x10000000, hi=-1024, lo18=0
# NORELAX-NEXT: 10010000: pcaddu18i $ra, -1024
# NORELAX-NEXT: jirl $ra, $ra, 0
# NORELAX-NEXT: pcaddu18i $t0, -1024
# NORELAX-NEXT: jirl $zero, $t0, -8
# NORELAX-EMPTY:

#--- a.s
.global _start, _start_end
_start:
call36 a # relaxed. la64: bl
tail36 $t0, a@plt # relaxed. la64: b
.balign 16
call36 bar # PLT call36 can be relaxed. la64: bl
tail36 $t0, bar # PLT tail36 can be relaxed. la64: bl

a:
ret
.size _start, . - _start
_start_end:

.section .mid,"ax",@progbits
call36 _start@plt # relaxed. la64: bl
tail36 $t0, _start@plt # relaxed. la64: b

.section .mid2,"ax",@progbits
call36 _start@plt # relaxed. la64: bl
tail36 $t0, _start@plt # relaxed. la64: b

.section .high,"ax",@progbits
call36 _start@plt # exceed range, not relaxed
tail36 $t0,_start@plt # exceed range, not relaxed

#--- b.s
.globl bar
bar:
ret

#--- lds
SECTIONS {
.text 0x10000 : { *(.text) }
.plt 0x10400 : { *(.plt) }
.mid 0x10800 : { *(.mid); mid_end = .; }
.mid2 0x1010000 : { *(.mid2) }
.high 0x10010000 : { *(.high); high_end = .; }
}
44 changes: 37 additions & 7 deletions lld/test/ELF/loongarch-relax-emit-relocs.s
Original file line number Diff line number Diff line change
Expand Up @@ -2,11 +2,11 @@
## Test that we can handle --emit-relocs while relaxing.

# RUN: llvm-mc --filetype=obj --triple=loongarch32 --mattr=+relax %s -o %t.32.o
# RUN: llvm-mc --filetype=obj --triple=loongarch64 --mattr=+relax %s -o %t.64.o
# RUN: llvm-mc --filetype=obj --triple=loongarch64 --mattr=+relax --defsym ELF64=1 %s -o %t.64.o
# RUN: ld.lld -Ttext=0x10000 -section-start=.got=0x20000 --emit-relocs %t.32.o -o %t.32
# RUN: ld.lld -Ttext=0x10000 -section-start=.got=0x20000 --emit-relocs %t.64.o -o %t.64
# RUN: llvm-objdump -dr %t.32 | FileCheck %s --check-prefix=RELAX
# RUN: llvm-objdump -dr %t.64 | FileCheck %s --check-prefix=RELAX
# RUN: llvm-objdump -dr %t.32 | FileCheck %s --check-prefixes=RELAX,RELAX32
# RUN: llvm-objdump -dr %t.64 | FileCheck %s --check-prefixes=RELAX,RELAX64

## -r should keep original relocations.
# RUN: ld.lld -r %t.64.o -o %t.64.r
Expand All @@ -27,10 +27,19 @@
# RELAX-NEXT: R_LARCH_RELAX *ABS*
# RELAX-NEXT: R_LARCH_PCREL20_S2 _start
# RELAX-NEXT: R_LARCH_RELAX *ABS*
# RELAX-NEXT: nop
# RELAX-NEXT: R_LARCH_ALIGN *ABS*+0xc
# RELAX-NEXT: nop
# RELAX-NEXT: ret
# RELAX32-NEXT: nop
# RELAX32-NEXT: R_LARCH_ALIGN *ABS*+0xc
# RELAX32-NEXT: nop
# RELAX32-NEXT: ret

# RELAX64-NEXT: bl -8
# RELAX64-NEXT: R_LARCH_B26 _start
# RELAX64-NEXT: R_LARCH_RELAX *ABS*
# RELAX64-NEXT: b -12
# RELAX64-NEXT: R_LARCH_B26 _start
# RELAX64-NEXT: R_LARCH_RELAX *ABS*
# RELAX64-NEXT: ret
# RELAX64-NEXT: R_LARCH_ALIGN *ABS*+0xc

# NORELAX: <_start>:
# NORELAX-NEXT: pcalau12i $a0, 0
Expand All @@ -45,6 +54,14 @@
# NORELAX-NEXT: ld.d $a0, $a0, 0
# NORELAX-NEXT: R_LARCH_GOT_PC_LO12 _start
# NORELAX-NEXT: R_LARCH_RELAX *ABS*
# NORELAX-NEXT: pcaddu18i $ra, 0
# NORELAX-NEXT: R_LARCH_CALL36 _start
# NORELAX-NEXT: R_LARCH_RELAX *ABS*
# NORELAX-NEXT: jirl $ra, $ra, -16
# NORELAX-NEXT: pcaddu18i $a0, 0
# NORELAX-NEXT: R_LARCH_CALL36 _start
# NORELAX-NEXT: R_LARCH_RELAX *ABS*
# NORELAX-NEXT: jirl $zero, $a0, -24
# NORELAX-NEXT: ret
# NORELAX-NEXT: R_LARCH_ALIGN *ABS*+0xc

Expand All @@ -61,6 +78,14 @@
# CHECKR-NEXT: ld.d $a0, $a0, 0
# CHECKR-NEXT: R_LARCH_GOT_PC_LO12 _start
# CHECKR-NEXT: R_LARCH_RELAX *ABS*
# CHECKR-NEXT: pcaddu18i $ra, 0
# CHECKR-NEXT: R_LARCH_CALL36 _start
# CHECKR-NEXT: R_LARCH_RELAX *ABS*
# CHECKR-NEXT: jirl $ra, $ra, 0
# CHECKR-NEXT: pcaddu18i $a0, 0
# CHECKR-NEXT: R_LARCH_CALL36 _start
# CHECKR-NEXT: R_LARCH_RELAX *ABS*
# CHECKR-NEXT: jr $a0
# CHECKR-NEXT: nop
# CHECKR-NEXT: R_LARCH_ALIGN *ABS*+0xc
# CHECKR-NEXT: nop
Expand All @@ -71,5 +96,10 @@
_start:
la.pcrel $a0, _start
la.got $a0, _start

.ifdef ELF64
call36 _start
tail36 $a0, _start
.endif
.p2align 4
ret
Loading