Commit c16f09c

[AArch64][SVE] Reduce MaxInterleaveFactor for A510 and A520
The default MaxInterleaveFactor for AArch64 targets is 2. This produces inefficient codegen on at least two in-order cores: Cortex-A510 and Cortex-A520. For example, a simple vector add

```
void foo(float *a, float *b, float *dst, unsigned n) {
  for (unsigned i = 0; i < n; ++i)
    dst[i] = a[i] + b[i];
}
```

vectorizes the inner loop into the following interleaved sequence of instructions

```
add x12, x1, x10
ld1b { z0.b }, p0/z, [x1, x10]
add x13, x2, x10
ld1b { z1.b }, p0/z, [x2, x10]
ldr z2, [x12, #1, mul vl]
ldr z3, [x13, #1, mul vl]
dech x11
add x12, x0, x10
fadd z0.s, z1.s, z0.s
fadd z1.s, z3.s, z2.s
st1b { z0.b }, p0, [x0, x10]
addvl x10, x10, #2
str z1, [x12, #1, mul vl]
```

while reducing MaxInterleaveFactor to 1 gives the following

```
.LBB0_13: // %vector.body
// =>This Inner Loop Header: Depth=1
ld1w { z0.s }, p0/z, [x1, x10, lsl #2]
ld1w { z1.s }, p0/z, [x2, x10, lsl #2]
fadd z0.s, z1.s, z0.s
st1w { z0.s }, p0, [x0, x10, lsl #2]
incw x10
```

This patch also introduces IR tests to showcase this.

Change-Id: Ie1e862f6a1db851182a95534b3b987feb670d7ca
1 parent 64555e3 commit c16f09c

2 files changed: +361 -0 lines changed

llvm/lib/Target/AArch64/AArch64Subtarget.cpp

+1
@@ -181,6 +181,7 @@ void AArch64Subtarget::initializeProperties(bool HasMinSize) {
     VScaleForTuning = 1;
     PrefLoopAlignment = Align(16);
     MaxBytesForLoopAlignment = 8;
+    MaxInterleaveFactor = 1;
     break;
   case CortexA710:
   case CortexA715:
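
The effect of the one-line change above can be modelled with a small standalone sketch. It is a simplified, hypothetical stand-in and not the actual LLVM classes: `ProcFamily`, `SubtargetModel`, and `chooseInterleaveCount` are invented for illustration. It assumes the modified case block is the one shared by Cortex-A510 and Cortex-A520, as the commit title indicates, and takes the default of 2 from the commit message.

```cpp
#include <algorithm>

// Hypothetical subset of processor families, for illustration only.
enum class ProcFamily { Others, CortexA510, CortexA520, CortexA710 };

// Simplified model of a subtarget that carries per-CPU tuning properties.
struct SubtargetModel {
  // Default taken from the commit message: 2 for AArch64 targets.
  unsigned MaxInterleaveFactor = 2;

  explicit SubtargetModel(ProcFamily Family) {
    switch (Family) {
    case ProcFamily::CortexA510:
    case ProcFamily::CortexA520:
      // In-order cores: cap interleaving at 1, as this patch does.
      MaxInterleaveFactor = 1;
      break;
    case ProcFamily::CortexA710:
    case ProcFamily::Others:
      break; // keep the default
    }
  }

  unsigned getMaxInterleaveFactor() const { return MaxInterleaveFactor; }
};

// Hypothetical helper: clamp a desired interleave count to the target's cap.
unsigned chooseInterleaveCount(const SubtargetModel &ST, unsigned Desired) {
  return std::min(Desired, ST.getMaxInterleaveFactor());
}
```

In this model a vectorizer that would otherwise interleave by 2 is limited to 1 on the two in-order cores, while other families keep the default.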
