After #70452 I noticed a miscompile on 502.gcc_r from SPEC CPU 2017 when compiling for rv64gv with -mrvv-vector-bits=zvl. A minimal reproducer involves a memset in a function with an exact vscale_range:
define void @foo(ptr %p) vscale_range(2,2) {
%q = getelementptr inbounds i8, ptr %p, i64 84
%x = load i32, ptr %q
call void @llvm.memset.p0.i64(ptr %p, i8 0, i64 96, i1 false)
store i32 %x, ptr %q
ret void
}
The previous codegen has a lw and sw pair:
foo:
lw a1, 84(a0)
addi a2, a0, 80
vsetivli zero, 16, e8, m1, ta, ma
vmv.v.i v8, 0
vs1r.v v8, (a2)
vsetvli a2, zero, e8, m4, ta, ma
vmv.v.i v12, 0
vs4r.v v12, (a0)
addi a2, a0, 64
vs1r.v v8, (a2)
sw a1, 84(a0)
ret
After #70452, the lw and sw seem to be mistakenly detected as dead and are omitted:
foo:
vsetvli a1, zero, e8, m4, ta, ma
vmv.v.i v8, 0
vs4r.v v8, (a0)
addi a1, a0, 80
vsetivli zero, 16, e8, m1, ta, ma
vmv.v.i v8, 0
vs1r.v v8, (a1)
addi a0, a0, 64
vs1r.v v8, (a0)
ret
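To make the observable difference concrete, here is a small Python sketch that models the memory effects of the reproducer and of the second listing. It is only an illustration: the bytearray model and the names `foo_expected`, `foo_miscompiled`, and `ranges_overlap` are hypothetical, and it assumes a little-endian i32 and VLEN = 128 (which is what vscale_range(2,2) implies, so vs1r.v writes 16 bytes and vs4r.v writes 64).

```python
import struct

Q_OFFSET = 84    # %q = %p + 84
MEMSET_LEN = 96  # memset(%p, 0, 96)


def foo_expected(mem: bytearray) -> None:
    """Semantics of the IR: load from %q, memset, store the value back."""
    (x,) = struct.unpack_from("<i", mem, Q_OFFSET)  # load i32, ptr %q
    mem[0:MEMSET_LEN] = bytes(MEMSET_LEN)           # llvm.memset
    struct.pack_into("<i", mem, Q_OFFSET, x)        # store i32 %x, ptr %q


def foo_miscompiled(mem: bytearray) -> None:
    """Effects of the second listing: only the three zeroing stores remain."""
    mem[0:64] = bytes(64)   # vs4r.v at p + 0  (assumed 64 bytes, VLEN = 128)
    mem[80:96] = bytes(16)  # vs1r.v at p + 80 (assumed 16 bytes)
    mem[64:80] = bytes(16)  # vs1r.v at p + 64 (assumed 16 bytes)


def ranges_overlap(a_start: int, a_len: int, b_start: int, b_len: int) -> bool:
    """True if the half-open byte ranges [start, start + len) intersect."""
    return a_start < b_start + b_len and b_start < a_start + a_len
```

Running both functions on the same non-zero buffer shows the difference: `foo_expected` leaves bytes 84 to 87 unchanged, while `foo_miscompiled` zeroes them. `ranges_overlap(80, 16, Q_OFFSET, 4)` is true, so of the three vector stores only the one at p + 80 touches the bytes at %q, and it has to stay ordered between the lw and the sw.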
The offending combine seems to happen in the post-legalize DAG combine, as the following diff of the SelectionDAG dumps shows:
Legalized selection DAG: %bb.0 'foo:'
SelectionDAG has 28 nodes:
t0: ch,glue = EntryToken
@@ -9,32 +8,29 @@
t58: v16i8 = extract_subvector t57, Constant:i64<0>
t49: nxv8i8 = insert_subvector undef:nxv8i8, t58, Constant:i64<0>
t26: i64 = add t2, Constant:i64<80>
- t50: ch = store<(store (s128) into %ir.p + 80, align 1)> t44:1, t49, t26, undef:i64
+ t50: ch = store<(store (<vscale x 1 x s128>) into %ir.p + 80, align 1)> t44:1, t49, t26, undef:i64
t46: ch = store<(store (s32) into %ir.q), trunc to i32> t50, t44, t4, undef:i64
t60: nxv32i8 = RISCVISD::VMV_V_X_VL undef:nxv32i8, OpaqueConstant:i64<0>, Register:i64 $x0
t61: v64i8 = extract_subvector t60, Constant:i64<0>
t53: nxv32i8 = insert_subvector undef:nxv32i8, t61, Constant:i64<0>
- t54: ch = store<(store (s512) into %ir.p, align 1)> t0, t53, t2, undef:i64
+ t54: ch = store<(store (<vscale x 1 x s512>) into %ir.p, align 1)> t0, t53, t2, undef:i64
t23: i64 = add t2, Constant:i64<64>
- t51: ch = store<(store (s128) into %ir.p + 64, align 1)> t0, t49, t23, undef:i64
+ t51: ch = store<(store (<vscale x 1 x s128>) into %ir.p + 64, align 1)> t0, t49, t23, undef:i64
t43: ch = TokenFactor t46, t54, t51
t30: ch = RISCVISD::RET_GLUE t43
Optimized legalized selection DAG: %bb.0 'foo:'
-SelectionDAG has 23 nodes:
+SelectionDAG has 19 nodes:
t0: ch,glue = EntryToken
t2: i64,ch = CopyFromReg t0, Register:i64 %0
- t4: i64 = add nuw t2, Constant:i64<84>
- t44: i64,ch = load<(load (s32) from %ir.q), sext from i32> t0, t4, undef:i64
t57: nxv8i8 = RISCVISD::VMV_V_X_VL undef:nxv8i8, OpaqueConstant:i64<0>, Register:i64 $x0
- t26: i64 = add t2, Constant:i64<80>
- t50: ch = store<(store (s128) into %ir.p + 80, align 1)> t44:1, t57, t26, undef:i64
- t46: ch = store<(store (s32) into %ir.q), trunc to i32> t50, t44, t4, undef:i64
t60: nxv32i8 = RISCVISD::VMV_V_X_VL undef:nxv32i8, OpaqueConstant:i64<0>, Register:i64 $x0
- t54: ch = store<(store (s512) into %ir.p, align 1)> t0, t60, t2, undef:i64
+ t54: ch = store<(store (<vscale x 1 x s512>) into %ir.p, align 1)> t0, t60, t2, undef:i64
t23: i64 = add t2, Constant:i64<64>
- t51: ch = store<(store (s128) into %ir.p + 64, align 1)> t0, t57, t23, undef:i64
- t43: ch = TokenFactor t46, t54, t51
- t30: ch = RISCVISD::RET_GLUE t43
+ t51: ch = store<(store (<vscale x 1 x s128>) into %ir.p + 64, align 1)> t0, t57, t23, undef:i64
+ t26: i64 = add t2, Constant:i64<80>
+ t62: ch = store<(store (<vscale x 1 x s128>) into %ir.p + 80, align 1)> t0, t57, t26, undef:i64
+ t68: ch = TokenFactor t54, t51, t62
+ t30: ch = RISCVISD::RET_GLUE t68
As an aside, the generated code for the memset is a bit strange; we should be able to do it with one LMUL=8 vse8.v with a VL of 96.
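The arithmetic behind that suggestion can be checked with a short Python sketch. The helper names `vlmax` and `fits_in_one_store` are hypothetical. The sketch assumes VLEN = vscale * 64 bits (LLVM's RVVBitsPerBlock on RISC-V) and the standard relation VLMAX = LMUL * VLEN / SEW.

```python
RVV_BITS_PER_BLOCK = 64  # assumed: bits per vscale unit on RISC-V in LLVM


def vlmax(vlen_bits: int, sew_bits: int, lmul: int) -> int:
    """Maximum number of elements for a given VLEN, SEW and integer LMUL."""
    return vlen_bits * lmul // sew_bits


def fits_in_one_store(num_bytes: int, vlen_bits: int, lmul: int) -> bool:
    """Can a single e8 unit-stride store with this LMUL cover num_bytes?"""
    return num_bytes <= vlmax(vlen_bits, 8, lmul)


# vscale_range(2,2) pins vscale to 2, so VLEN is 128 bits.
VLEN = 2 * RVV_BITS_PER_BLOCK
```

With these assumptions, `vlmax(VLEN, 8, 8)` is 128 elements, so the 96-byte memset fits in one e8, LMUL=8 store, whereas LMUL=4 gives only 64 elements and would need more than one store.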