
[RISCV] Miscompile with exact VLEN/vscale and memset #90559

Closed
@lukel97

Description


After #70452 I noticed a miscompile in 502.gcc_r from SPEC CPU 2017 when compiling for rv64gv with -mrvv-vector-bits=zvl. A minimal reproducer is a memset in a function with an exact vscale_range:

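; vscale_range(2,2) pins vscale to exactly 2, i.e. VLEN = 128 on RISC-V.
define void @foo(ptr %p) vscale_range(2,2) {
  %q = getelementptr inbounds i8, ptr %p, i64 84
  %x = load i32, ptr %q
  call void @llvm.memset.p0.i64(ptr %p, i8 0, i64 96, i1 false)
  store i32 %x, ptr %q
  ret void
}

declare void @llvm.memset.p0.i64(ptr, i8, i64, i1)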
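To see where the store sizes in the codegen below come from: LLVM's RISC-V backend treats one unit of vscale as 64 bits of VLEN, so vscale_range(2,2) fixes VLEN at 128 bits. The following Python sketch (helper names are hypothetical, not LLVM APIs) works through that arithmetic and shows why the 96-byte memset splits into one 64-byte LMUL=4 store plus two 16-byte LMUL=1 stores.

```python
# Sketch of the vscale/VLEN arithmetic for vscale_range(2,2).
# Assumption: on RISC-V, LLVM uses vscale = VLEN / 64 (64 bits per "RVV block").
# All helper names are hypothetical and only illustrate the arithmetic.
RVV_BITS_PER_BLOCK = 64


def vlen_from_vscale(vscale: int) -> int:
    """VLEN in bits implied by an exact vscale_range(vscale, vscale)."""
    return vscale * RVV_BITS_PER_BLOCK


def whole_reg_bytes(vlen: int, lmul: int) -> int:
    """Bytes written by a whole-register store such as vs1r.v (lmul=1) or vs4r.v (lmul=4)."""
    return vlen * lmul // 8


def scalable_bytes(min_bits: int, vscale: int) -> int:
    """Byte size of a scalable type whose minimum (vscale = 1) size is min_bits."""
    return min_bits * vscale // 8


VSCALE = 2
VLEN = vlen_from_vscale(VSCALE)

print(VLEN)                             # 128
print(whole_reg_bytes(VLEN, 4))         # 64: the vs4r.v covering %p+0..63
print(whole_reg_bytes(VLEN, 1))         # 16: each vs1r.v at %p+64 and %p+80
print(scalable_bytes(8 * 8, VSCALE))    # 16: nxv8i8 in the DAG
print(scalable_bytes(32 * 8, VSCALE))   # 64: nxv32i8 in the DAG
print(whole_reg_bytes(VLEN, 4) + 2 * whole_reg_bytes(VLEN, 1))  # 96, the memset length
```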

Previously, the generated code kept the lw/sw pair that saves the i32 at %p+84 before the memset and restores it afterwards:

foo:
	lw	a1, 84(a0)
	addi	a2, a0, 80
	vsetivli	zero, 16, e8, m1, ta, ma
	vmv.v.i	v8, 0
	vs1r.v	v8, (a2)
	vsetvli	a2, zero, e8, m4, ta, ma
	vmv.v.i	v12, 0
	vs4r.v	v12, (a0)
	addi	a2, a0, 64
	vs1r.v	v8, (a2)
	sw	a1, 84(a0)
	ret
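In this sequence the memset's three vector stores cover disjoint byte ranges, and only one of them touches the i32 at offset 84 that the lw/sw pair preserves. A small Python sketch of those ranges, assuming VLEN = 128 (the labels and helper names are illustrative only):

```python
# Sketch: byte ranges (relative to %p) written by the previous codegen for @foo,
# assuming VLEN = 128 so vs1r.v writes 16 bytes and vs4r.v writes 64 bytes.
from typing import List, Tuple

Range = Tuple[int, int]  # half-open byte range [start, end)

OLD_STORES: List[Tuple[str, Range]] = [
    ("vs1r.v at %p+80", (80, 96)),
    ("vs4r.v at %p+0", (0, 64)),
    ("vs1r.v at %p+64", (64, 80)),
]
I32_AT_Q: Range = (84, 88)  # the i32 loaded by lw and stored back by sw


def overlaps(a: Range, b: Range) -> bool:
    """True if two half-open byte ranges share at least one byte."""
    return a[0] < b[1] and b[0] < a[1]


def bytes_written(stores: List[Tuple[str, Range]]) -> set:
    """Set of byte offsets written by the given stores."""
    return set().union(*(range(start, end) for _, (start, end) in stores))


clobbers_q = [name for name, r in OLD_STORES if overlaps(r, I32_AT_Q)]
print(clobbers_q)                                   # ['vs1r.v at %p+80']
print(bytes_written(OLD_STORES) == set(range(96)))  # True: exactly the 96 memset bytes
```

Because the vs1r.v at %p+80 overwrites bytes 84..87, the final sw has to run after it to put %x back.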

After #70452, the lw and sw appear to be mistakenly treated as dead and are omitted:

foo:
	vsetvli	a1, zero, e8, m4, ta, ma
	vmv.v.i	v8, 0
	vs4r.v	v8, (a0)
	addi	a1, a0, 80
	vsetivli	zero, 16, e8, m1, ta, ma
	vmv.v.i	v8, 0
	vs1r.v	v8, (a1)
	addi	a0, a0, 64
	vs1r.v	v8, (a0)
	ret
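The effect of dropping the pair can be checked by replaying both sequences on a toy byte buffer. This is a hypothetical Python model, not a RISC-V simulator: each vector store is reduced to "zero N bytes at an offset", assuming VLEN = 128, and the reference result follows the IR (zero 96 bytes, then store %x back at %p+84).

```python
# Toy model of the two sequences for @foo; offsets are relative to %p (a0).
# Assumes VLEN = 128: vs1r.v zeroes 16 bytes, vs4r.v zeroes 64 bytes.
from typing import List, Tuple

Op = Tuple[str, int, int]  # (kind, byte offset, size in bytes)


def run(seq: List[Op], mem: bytearray) -> bytearray:
    """Replay a simplified instruction sequence on a copy of mem."""
    mem = bytearray(mem)
    saved = b""
    for kind, off, size in seq:
        if kind == "lw":        # lw a1, off(a0)
            saved = bytes(mem[off:off + size])
        elif kind == "sw":      # sw a1, off(a0)
            mem[off:off + size] = saved
        elif kind == "vzero":   # vmv.v.i v, 0 followed by a whole-register store
            mem[off:off + size] = bytes(size)
    return mem


OLD: List[Op] = [("lw", 84, 4), ("vzero", 80, 16), ("vzero", 0, 64),
                 ("vzero", 64, 16), ("sw", 84, 4)]
NEW: List[Op] = [("vzero", 0, 64), ("vzero", 80, 16), ("vzero", 64, 16)]

initial = bytearray(range(1, 97))     # nonzero pattern so clobbering is visible
expected = bytearray(96)              # memset(%p, 0, 96) ...
expected[84:88] = initial[84:88]      # ... then store i32 %x back to %q

print(run(OLD, initial) == expected)  # True
print(run(NEW, initial) == expected)  # False
print(list(run(NEW, initial)[84:88])) # [0, 0, 0, 0]: %x has been lost
```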

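The offending combine appears to happen in the post-legalize DAG combine. In the diff of the DAG dumps below, '-' lines are from before #70452 and '+' lines from after: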

 Legalized selection DAG: %bb.0 'foo:'
 SelectionDAG has 28 nodes:
   t0: ch,glue = EntryToken
@@ -9,32 +8,29 @@
     t58: v16i8 = extract_subvector t57, Constant:i64<0>
   t49: nxv8i8 = insert_subvector undef:nxv8i8, t58, Constant:i64<0>
           t26: i64 = add t2, Constant:i64<80>
-        t50: ch = store<(store (s128) into %ir.p + 80, align 1)> t44:1, t49, t26, undef:i64
+        t50: ch = store<(store (<vscale x 1 x s128>) into %ir.p + 80, align 1)> t44:1, t49, t26, undef:i64
       t46: ch = store<(store (s32) into %ir.q), trunc to i32> t50, t44, t4, undef:i64
             t60: nxv32i8 = RISCVISD::VMV_V_X_VL undef:nxv32i8, OpaqueConstant:i64<0>, Register:i64 $x0
           t61: v64i8 = extract_subvector t60, Constant:i64<0>
         t53: nxv32i8 = insert_subvector undef:nxv32i8, t61, Constant:i64<0>
-      t54: ch = store<(store (s512) into %ir.p, align 1)> t0, t53, t2, undef:i64
+      t54: ch = store<(store (<vscale x 1 x s512>) into %ir.p, align 1)> t0, t53, t2, undef:i64
         t23: i64 = add t2, Constant:i64<64>
-      t51: ch = store<(store (s128) into %ir.p + 64, align 1)> t0, t49, t23, undef:i64
+      t51: ch = store<(store (<vscale x 1 x s128>) into %ir.p + 64, align 1)> t0, t49, t23, undef:i64
     t43: ch = TokenFactor t46, t54, t51
   t30: ch = RISCVISD::RET_GLUE t43
 
 
 
 Optimized legalized selection DAG: %bb.0 'foo:'
-SelectionDAG has 23 nodes:
+SelectionDAG has 19 nodes:
   t0: ch,glue = EntryToken
   t2: i64,ch = CopyFromReg t0, Register:i64 %0
-  t4: i64 = add nuw t2, Constant:i64<84>
-  t44: i64,ch = load<(load (s32) from %ir.q), sext from i32> t0, t4, undef:i64
   t57: nxv8i8 = RISCVISD::VMV_V_X_VL undef:nxv8i8, OpaqueConstant:i64<0>, Register:i64 $x0
-          t26: i64 = add t2, Constant:i64<80>
-        t50: ch = store<(store (s128) into %ir.p + 80, align 1)> t44:1, t57, t26, undef:i64
-      t46: ch = store<(store (s32) into %ir.q), trunc to i32> t50, t44, t4, undef:i64
         t60: nxv32i8 = RISCVISD::VMV_V_X_VL undef:nxv32i8, OpaqueConstant:i64<0>, Register:i64 $x0
-      t54: ch = store<(store (s512) into %ir.p, align 1)> t0, t60, t2, undef:i64
+      t54: ch = store<(store (<vscale x 1 x s512>) into %ir.p, align 1)> t0, t60, t2, undef:i64
         t23: i64 = add t2, Constant:i64<64>
-      t51: ch = store<(store (s128) into %ir.p + 64, align 1)> t0, t57, t23, undef:i64
-    t43: ch = TokenFactor t46, t54, t51
-  t30: ch = RISCVISD::RET_GLUE t43
+      t51: ch = store<(store (<vscale x 1 x s128>) into %ir.p + 64, align 1)> t0, t57, t23, undef:i64
+        t26: i64 = add t2, Constant:i64<80>
+      t62: ch = store<(store (<vscale x 1 x s128>) into %ir.p + 80, align 1)> t0, t57, t26, undef:i64
+    t68: ch = TokenFactor t54, t51, t62
+  t30: ch = RISCVISD::RET_GLUE t68
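Reading the diff, t46 stores t44 (the value loaded from t4 = %p+84) back to t4, and its chain goes through t50, the zeroing store at %p+80. After #70452, t4, t44 and t46 are gone from the optimized DAG and the store at %p+80 is re-created as t62 chained directly on t0, which suggests the combiner concluded that this store cannot overwrite %p+84 and then removed t46 as storing back a just-loaded value. Yet both the old (s128) and new (<vscale x 1 x s128>) memory operands cover byte 84 for any vscale ≥ 1. The Python sketch below is a conservative same-base overlap test written for illustration only (it is not LLVM's alias-analysis code); it treats a scalable size with unknown vscale as unbounded above.

```python
# Conservative "may these two accesses overlap?" test for accesses that share
# a base pointer. Hypothetical helper for illustration, not LLVM's mayAlias.
import math
from typing import Optional, Tuple

Size = Tuple[int, bool]  # (minimum size in bytes, is_scalable)


def end_offset(off: int, size: Size, vscale: Optional[int]) -> float:
    """Exclusive end of the access; scalable sizes grow with vscale."""
    min_bytes, scalable = size
    if not scalable:
        return off + min_bytes
    if vscale is None:
        return math.inf  # vscale unknown: the access may extend arbitrarily far
    return off + min_bytes * vscale


def may_overlap(off_a: int, size_a: Size, off_b: int, size_b: Size,
                vscale: Optional[int] = None) -> bool:
    """False only if the two byte ranges are provably disjoint."""
    return (off_a < end_offset(off_b, size_b, vscale)
            and off_b < end_offset(off_a, size_a, vscale))


T46 = (84, (4, False))        # store (s32) into %ir.q == %p + 84
T50_OLD = (80, (16, False))   # store (s128) into %ir.p + 80
T50_NEW = (80, (16, True))    # store (<vscale x 1 x s128>) into %ir.p + 80

for vscale in (None, 1, 2):
    print(vscale, may_overlap(*T50_OLD, *T46, vscale), may_overlap(*T50_NEW, *T46, vscale))
# Every line ends in "True True": the store at %p+80 may clobber %p+84,
# so t46 is not a redundant store.
```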

As an aside, the generated code for the memset is a bit strange: we should be able to do it with a single LMUL=8 vse8.v with a VL of 96.
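A quick check of that suggestion, assuming VLEN = 128: at SEW=8 the VLMAX for LMUL=8 is 128 * 8 / 8 = 128 elements, enough for all 96 bytes, whereas LMUL=4 only reaches 64. Since vsetivli encodes its AVL as a 5-bit unsigned immediate (at most 31), a VL of 96 would have to come from a register via vsetvli (for example after li a1, 96). The Python helper below is hypothetical and only picks the smallest integer LMUL that fits.

```python
# Pick the smallest integer LMUL whose VLMAX covers a memset of `length` bytes
# with a single unit-stride vse8.v. Hypothetical helper; assumes VLEN = 128.
from typing import Optional, Tuple

VSETIVLI_MAX_AVL = 31  # vsetivli's AVL is a 5-bit unsigned immediate


def vlmax(vlen: int, sew: int, lmul: int) -> int:
    """Maximum number of SEW-bit elements for an integer LMUL."""
    return vlen * lmul // sew


def single_store_config(length: int, vlen: int = 128,
                        sew: int = 8) -> Optional[Tuple[int, bool]]:
    """Return (lmul, avl_needs_register), or None if one store cannot cover it."""
    for lmul in (1, 2, 4, 8):
        if vlmax(vlen, sew, lmul) >= length:
            return lmul, length > VSETIVLI_MAX_AVL
    return None


print(single_store_config(96))   # (8, True): LMUL=8, AVL 96 must be in a register
print(single_store_config(16))   # (1, False): fits vsetivli zero, 16, e8, m1
print(single_store_config(200))  # None: exceeds VLMAX even at LMUL=8
```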
