[AArch64] Avoid NEON ORR when NEON and SVE are unavailable #93940
Conversation
For streaming-compatible functions with only +sme, we can't use a NEON ORR (aliased as 'mov') for copies of Q-registers, so we need to use a spill/fill instead. This also fixes the fill, which should use the post-incrementing addressing mode.
@llvm/pr-subscribers-backend-aarch64

Author: Sander de Smalen (sdesmalen-arm)

Changes

For streaming-compatible functions with only +sme, we can't use a NEON ORR (aliased as 'mov') for copies of Q-registers, so we need to use a spill/fill instead.

This also fixes the fill, which should use the post-incrementing addressing mode.

Full diff: https://github.com/llvm/llvm-project/pull/93940.diff

4 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
index aa0b7c93f8661..ce4b7e68fd625 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
@@ -4659,7 +4659,7 @@ void AArch64InstrInfo::copyPhysReg(MachineBasicBlock &MBB,
.addReg(AArch64::Z0 + (DestReg - AArch64::Q0), RegState::Define)
.addReg(AArch64::Z0 + (SrcReg - AArch64::Q0))
.addReg(AArch64::Z0 + (SrcReg - AArch64::Q0));
- else if (Subtarget.hasNEON())
+ else if (Subtarget.isNeonAvailable())
BuildMI(MBB, I, DL, get(AArch64::ORRv16i8), DestReg)
.addReg(SrcReg)
.addReg(SrcReg, getKillRegState(KillSrc));
@@ -4669,7 +4669,7 @@ void AArch64InstrInfo::copyPhysReg(MachineBasicBlock &MBB,
.addReg(SrcReg, getKillRegState(KillSrc))
.addReg(AArch64::SP)
.addImm(-16);
- BuildMI(MBB, I, DL, get(AArch64::LDRQpre))
+ BuildMI(MBB, I, DL, get(AArch64::LDRQpost))
.addReg(AArch64::SP, RegState::Define)
.addReg(DestReg, RegState::Define)
.addReg(AArch64::SP)
diff --git a/llvm/test/CodeGen/AArch64/arm64-reg-copy-noneon.ll b/llvm/test/CodeGen/AArch64/arm64-reg-copy-noneon.ll
index 29255ef187c1c..69cd295c309d1 100644
--- a/llvm/test/CodeGen/AArch64/arm64-reg-copy-noneon.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-reg-copy-noneon.ll
@@ -1,20 +1,27 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: llc -mtriple=arm64-none-linux-gnu -mattr=-neon < %s | FileCheck %s
define float @copy_FPR32(float %a, float %b) {
-;CHECK-LABEL: copy_FPR32:
-;CHECK: fmov s0, s1
+; CHECK-LABEL: copy_FPR32:
+; CHECK: // %bb.0:
+; CHECK-NEXT: fmov s0, s1
+; CHECK-NEXT: ret
ret float %b;
}
-
+
define double @copy_FPR64(double %a, double %b) {
-;CHECK-LABEL: copy_FPR64:
-;CHECK: fmov d0, d1
+; CHECK-LABEL: copy_FPR64:
+; CHECK: // %bb.0:
+; CHECK-NEXT: fmov d0, d1
+; CHECK-NEXT: ret
ret double %b;
}
-
+
define fp128 @copy_FPR128(fp128 %a, fp128 %b) {
-;CHECK-LABEL: copy_FPR128:
-;CHECK: str q1, [sp, #-16]!
-;CHECK-NEXT: ldr q0, [sp, #16]!
+; CHECK-LABEL: copy_FPR128:
+; CHECK: // %bb.0:
+; CHECK-NEXT: str q1, [sp, #-16]!
+; CHECK-NEXT: ldr q0, [sp], #16
+; CHECK-NEXT: ret
ret fp128 %b;
}
diff --git a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-load.ll b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-load.ll
index be335c697707d..a689a539b0082 100644
--- a/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-load.ll
+++ b/llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-load.ll
@@ -980,7 +980,8 @@ define <32 x i8> @masked_load_v32i8(ptr %src, <32 x i1> %mask) {
; NONEON-NOSVE-NEXT: tbnz w8, #1, .LBB3_3
; NONEON-NOSVE-NEXT: b .LBB3_4
; NONEON-NOSVE-NEXT: .LBB3_2:
-; NONEON-NOSVE-NEXT: mov v0.16b, v1.16b
+; NONEON-NOSVE-NEXT: str q1, [sp, #-16]!
+; NONEON-NOSVE-NEXT: ldr q0, [sp], #16
; NONEON-NOSVE-NEXT: tbz w8, #1, .LBB3_4
; NONEON-NOSVE-NEXT: .LBB3_3: // %cond.load1
; NONEON-NOSVE-NEXT: ldrb w10, [x0, #1]
@@ -2095,7 +2096,8 @@ define <16 x half> @masked_load_v16f16(ptr %src, <16 x i1> %mask) {
; NONEON-NOSVE-NEXT: tbnz w8, #1, .LBB7_3
; NONEON-NOSVE-NEXT: b .LBB7_4
; NONEON-NOSVE-NEXT: .LBB7_2:
-; NONEON-NOSVE-NEXT: mov v0.16b, v1.16b
+; NONEON-NOSVE-NEXT: str q1, [sp, #-16]!
+; NONEON-NOSVE-NEXT: ldr q0, [sp], #16
; NONEON-NOSVE-NEXT: tbz w8, #1, .LBB7_4
; NONEON-NOSVE-NEXT: .LBB7_3: // %cond.load1
; NONEON-NOSVE-NEXT: ldr h2, [x0, #2]
@@ -2616,7 +2618,8 @@ define <8 x float> @masked_load_v8f32(ptr %src, <8 x i1> %mask) {
; NONEON-NOSVE-NEXT: tbnz w8, #1, .LBB10_3
; NONEON-NOSVE-NEXT: b .LBB10_4
; NONEON-NOSVE-NEXT: .LBB10_2:
-; NONEON-NOSVE-NEXT: mov v0.16b, v1.16b
+; NONEON-NOSVE-NEXT: str q1, [sp, #-16]!
+; NONEON-NOSVE-NEXT: ldr q0, [sp], #16
; NONEON-NOSVE-NEXT: tbz w8, #1, .LBB10_4
; NONEON-NOSVE-NEXT: .LBB10_3: // %cond.load1
; NONEON-NOSVE-NEXT: ldr s2, [x0, #4]
@@ -2839,7 +2842,8 @@ define <4 x double> @masked_load_v4f64(ptr %src, <4 x i1> %mask) {
; NONEON-NOSVE-NEXT: tbnz w8, #1, .LBB12_3
; NONEON-NOSVE-NEXT: b .LBB12_4
; NONEON-NOSVE-NEXT: .LBB12_2:
-; NONEON-NOSVE-NEXT: mov v0.16b, v1.16b
+; NONEON-NOSVE-NEXT: str q1, [sp, #-16]!
+; NONEON-NOSVE-NEXT: ldr q0, [sp], #16
; NONEON-NOSVE-NEXT: tbz w8, #1, .LBB12_4
; NONEON-NOSVE-NEXT: .LBB12_3: // %cond.load1
; NONEON-NOSVE-NEXT: ldr d2, [x0, #8]
diff --git a/llvm/test/CodeGen/AArch64/sve-streaming-mode-test-register-mov.ll b/llvm/test/CodeGen/AArch64/sve-streaming-mode-test-register-mov.ll
index 67cdde718e391..23adb1a4bc092 100644
--- a/llvm/test/CodeGen/AArch64/sve-streaming-mode-test-register-mov.ll
+++ b/llvm/test/CodeGen/AArch64/sve-streaming-mode-test-register-mov.ll
@@ -15,7 +15,8 @@ define fp128 @test_streaming_compatible_register_mov(fp128 %q0, fp128 %q1) {
;
; NONEON-NOSVE-LABEL: test_streaming_compatible_register_mov:
; NONEON-NOSVE: // %bb.0:
-; NONEON-NOSVE-NEXT: mov v0.16b, v1.16b
+; NONEON-NOSVE-NEXT: str q1, [sp, #-16]!
+; NONEON-NOSVE-NEXT: ldr q0, [sp], #16
; NONEON-NOSVE-NEXT: ret
ret fp128 %q1
}
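The behavioral change is easiest to see in the distilled fp128 case below. This is a minimal sketch assembled from the updated arm64-reg-copy-noneon.ll in the diff above; the RUN line and expected assembly are taken verbatim from that test.

; RUN: llc -mtriple=arm64-none-linux-gnu -mattr=-neon < %s | FileCheck %s

; Returning %b needs a Q-register to Q-register copy. Without NEON there is no
; 'mov v0.16b, v1.16b' (ORRv16i8), so copyPhysReg spills the source and refills
; the destination. The post-indexed fill reloads from the slot the store just
; wrote and then pops it, whereas the old pre-indexed 'ldr q0, [sp, #16]!'
; read from 16 bytes above that slot.
define fp128 @copy_FPR128(fp128 %a, fp128 %b) {
; CHECK-LABEL: copy_FPR128:
; CHECK:       // %bb.0:
; CHECK-NEXT:    str q1, [sp, #-16]!
; CHECK-NEXT:    ldr q0, [sp], #16
; CHECK-NEXT:    ret
  ret fp128 %b
}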
The code sequence here assumes there's no red zone; is that actually enforced somewhere? Can we allocate a stack slot instead of modifying the stack pointer?
As an alternative, it might be possible to use an integer register here, but I guess allocating that might be a bit complicated.
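To make the concern concrete, here is a sketch of the scenario (hypothetical function, and hand-written assembly in the comments rather than actual llc output, assuming a configuration where the red zone is enabled):

; With a red zone, a leaf function may keep locals below sp without adjusting
; sp, and the spill/fill copy would then clobber them, e.g.:
define fp128 @red_zone_hazard(fp128 %a, fp128 %b, i64 %x) nounwind {
  ; If the slot for %slot were placed in the red zone, codegen could plausibly be:
  ;   str x0, [sp, #-8]       ; %x stored in the red zone, sp left unchanged
  ;   str q1, [sp, #-16]!     ; fp128 copy: writes [sp-16, sp), clobbering the slot
  ;   ldr q0, [sp], #16
  ;   ldr x8, [sp, #-8]       ; the reload now reads the clobbered value
  %slot = alloca i64
  store volatile i64 %x, ptr %slot
  %reload = load volatile i64, ptr %slot
  ret fp128 %b
}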
Ping
Good spot! From what I can see, it seems better to just disable the red zone entirely for now. Fixing the problem with a stop-gap (disabling the red zone) could be a good first step, and if this turns out to be a genuine performance issue, we can come up with a better mechanism later. Does that make sense?
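For what it's worth, there is already a per-function way to express this in IR: the noredzone function attribute tells the code generator not to use the red zone. A sketch only; the actual stop-gap may end up being implemented differently (e.g. keyed off the streaming/SME configuration rather than a per-function attribute):

; Sketch: with noredzone, nothing live is kept below sp, so the
; 'str q1, [sp, #-16]!' / 'ldr q0, [sp], #16' copy sequence is safe here.
define fp128 @copy_FPR128_noredzone(fp128 %a, fp128 %b) noredzone {
  ret fp128 %b
}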
This was pointed out in PR llvm#93940.
That makes sense, sure.
… (llvm#94962) This was pointed out in PR llvm#93940.