[RISCV] Handle scalable ops with < EEW / 2 narrow types in combineBinOp_VLToVWBinOp_VL #84158

Merged
8 changes: 2 additions & 6 deletions llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -13654,12 +13654,8 @@ struct NodeExtensionHelper {

 SDValue NarrowElt = OrigOperand.getOperand(0);
 MVT NarrowVT = NarrowElt.getSimpleValueType();
-
-unsigned ScalarBits = VT.getScalarSizeInBits();
-unsigned NarrowScalarBits = NarrowVT.getScalarSizeInBits();
-
-// Ensure the extension's semantic is equivalent to rvv vzext or vsext.
-if (ScalarBits != NarrowScalarBits * 2)
Collaborator:

If after the prior change which moves this transform after legalize types, the only case which needs this restriction to keep the transform between legalize types and legalize ops is the i1 vector case, why not simply check if the narrow vt is an i1 vector here? Wouldn't that be less disruptive than moving the combine after legalize ops?

Note that you should also be asserting that both narrow and wide are legal types.
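The guard being debated here can be sketched as a small standalone model (hypothetical names, not the LLVM code): the old check only accepted an exact 2x width ratio, while once the combine runs post-legalization the one case that must still be rejected is an i1 narrow element.

```cpp
#include <cassert>

// Toy model of the guard discussed above; names are illustrative, not LLVM's.
// Old rule: fold only when the wide element is exactly twice the narrow one.
bool foldableOld(unsigned WideBits, unsigned NarrowBits) {
  return WideBits == NarrowBits * 2;
}

// Sketch of the relaxed rule: any narrow element at least 2x smaller works,
// except i1, which vzext/vsext cannot take as a source element type.
bool foldableNew(unsigned WideBits, unsigned NarrowBits) {
  if (NarrowBits == 1)
    return false;
  return NarrowBits * 2 <= WideBits;
}
```

Under this model, the i8 to i32 case from the PR title (narrow type smaller than EEW / 2) is rejected by the old rule but accepted by the new one.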

Contributor Author:

> If after the prior change which moves this transform after legalize types, the only case which needs this restriction to keep the transform between legalize types and legalize ops is the i1 vector case, why not simply check if the narrow vt is an i1 vector here?

I moved it to after the legalize vector ops phase since we weren't checking for i1 vectors in any of the other _VL nodes. So I think there was already an implicit invariant here that the combine would only run after legalize ops, and it seemed safer to just be explicit about it.

> Wouldn't that be less disruptive than moving the combine after legalize ops?

Since the combine was already happening after legalize ops for the _VL nodes, this should only affect the ISD::ADD/SUB/MUL nodes that were added in #76785.

Contributor Author:

> Note that you should also be asserting that both narrow and wide are legal types.

I've moved the narrow type assert in 0ef61ed so that we now check the narrow type for all extend node types, and we have an assert that the wide type is legal here:

assert(DAG.getTargetLoweringInfo().isTypeLegal(Root->getValueType(0)));

+// i1 types are legal but we can't select V{S,Z}EXT_VLs with them.
+if (NarrowVT.getVectorElementType() == MVT::i1)
 break;

SupportsZExt = Opc == ISD::ZERO_EXTEND;
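As a hedged sketch of what the relaxed combine enables (names hypothetical, not the LLVM implementation): when a source is narrower than EEW / 2, it is first extended to half the destination width, and the widening vw* instruction performs the final doubling. This mirrors the codegen change in the tests that follow, e.g. i8 goes through vsext.vf2 to i16, then vwmul widens i16 to i32.

```cpp
#include <cassert>

// Illustrative plan for consuming a source narrower than the destination.
struct WidenPlan {
  unsigned PreExtendTo; // width the source is extended to up front
  bool UseWideningOp;   // whether a vw* instruction does the last 2x step
};

WidenPlan planWidenedOp(unsigned DstBits, unsigned SrcBits) {
  if (SrcBits * 2 == DstBits)
    return {SrcBits, true};     // feed the widening op directly
  if (SrcBits * 2 < DstBits)
    return {DstBits / 2, true}; // shared half-width extend, then a vw* op
  return {DstBits, false};      // same width: no widening op applies
}
```

One shared half-width extend per source value is what lets the multiple-user tests keep a single vsext.vf2 feeding vwmul, vwadd, and vwsub.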
38 changes: 20 additions & 18 deletions llvm/test/CodeGen/RISCV/rvv/vscale-vw-web-simplification.ll
@@ -283,18 +283,19 @@ define <vscale x 2 x i32> @vwop_vscale_sext_i8i32_multiple_users(ptr %x, ptr %y,
;
; FOLDING-LABEL: vwop_vscale_sext_i8i32_multiple_users:
; FOLDING: # %bb.0:
-; FOLDING-NEXT: vsetvli a3, zero, e32, m1, ta, ma
+; FOLDING-NEXT: vsetvli a3, zero, e16, mf2, ta, ma
 ; FOLDING-NEXT: vle8.v v8, (a0)
 ; FOLDING-NEXT: vle8.v v9, (a1)
 ; FOLDING-NEXT: vle8.v v10, (a2)
-; FOLDING-NEXT: vsext.vf4 v11, v8
-; FOLDING-NEXT: vsext.vf4 v8, v9
-; FOLDING-NEXT: vsext.vf4 v9, v10
-; FOLDING-NEXT: vmul.vv v8, v11, v8
-; FOLDING-NEXT: vadd.vv v10, v11, v9
-; FOLDING-NEXT: vsub.vv v9, v11, v9
-; FOLDING-NEXT: vor.vv v8, v8, v10
-; FOLDING-NEXT: vor.vv v8, v8, v9
+; FOLDING-NEXT: vsext.vf2 v11, v8
+; FOLDING-NEXT: vsext.vf2 v8, v9
+; FOLDING-NEXT: vsext.vf2 v9, v10
+; FOLDING-NEXT: vwmul.vv v10, v11, v8
+; FOLDING-NEXT: vwadd.vv v8, v11, v9
+; FOLDING-NEXT: vwsub.vv v12, v11, v9
+; FOLDING-NEXT: vsetvli zero, zero, e32, m1, ta, ma
+; FOLDING-NEXT: vor.vv v8, v10, v8
+; FOLDING-NEXT: vor.vv v8, v8, v12
; FOLDING-NEXT: ret
%a = load <vscale x 2 x i8>, ptr %x
%b = load <vscale x 2 x i8>, ptr %y
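The vsetvli settings in the new sequence follow directly from the element widths of the intermediate values. A minimal sketch, assuming the VLEN-relative SEW/LMUL pairing shown in the checked assembly for <vscale x 2 x iN> vectors: most of the work now runs at e16, mf2, and only the final vor.vv instructions need e32, m1.

```cpp
#include <cassert>
#include <string>

// Illustrative mapping from element width to the vtype used for
// <vscale x 2 x iN> values in this test; not a general RVV rule.
std::string vtypeFor(unsigned ElemBits) {
  switch (ElemBits) {
  case 16: return "e16, mf2"; // shared vsext.vf2 results and vw* inputs
  case 32: return "e32, m1";  // final vor.vv combining the wide results
  default: return "unsupported";
  }
}
```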
@@ -563,18 +564,19 @@ define <vscale x 2 x i32> @vwop_vscale_zext_i8i32_multiple_users(ptr %x, ptr %y,
;
; FOLDING-LABEL: vwop_vscale_zext_i8i32_multiple_users:
; FOLDING: # %bb.0:
-; FOLDING-NEXT: vsetvli a3, zero, e32, m1, ta, ma
+; FOLDING-NEXT: vsetvli a3, zero, e16, mf2, ta, ma
 ; FOLDING-NEXT: vle8.v v8, (a0)
 ; FOLDING-NEXT: vle8.v v9, (a1)
 ; FOLDING-NEXT: vle8.v v10, (a2)
-; FOLDING-NEXT: vzext.vf4 v11, v8
-; FOLDING-NEXT: vzext.vf4 v8, v9
-; FOLDING-NEXT: vzext.vf4 v9, v10
-; FOLDING-NEXT: vmul.vv v8, v11, v8
-; FOLDING-NEXT: vadd.vv v10, v11, v9
-; FOLDING-NEXT: vsub.vv v9, v11, v9
-; FOLDING-NEXT: vor.vv v8, v8, v10
-; FOLDING-NEXT: vor.vv v8, v8, v9
+; FOLDING-NEXT: vzext.vf2 v11, v8
+; FOLDING-NEXT: vzext.vf2 v8, v9
+; FOLDING-NEXT: vzext.vf2 v9, v10
+; FOLDING-NEXT: vwmulu.vv v10, v11, v8
+; FOLDING-NEXT: vwaddu.vv v8, v11, v9
+; FOLDING-NEXT: vwsubu.vv v12, v11, v9
+; FOLDING-NEXT: vsetvli zero, zero, e32, m1, ta, ma
+; FOLDING-NEXT: vor.vv v8, v10, v8
+; FOLDING-NEXT: vor.vv v8, v8, v12
; FOLDING-NEXT: ret
%a = load <vscale x 2 x i8>, ptr %x
%b = load <vscale x 2 x i8>, ptr %y