[SelectionDAG] Scalarize binary ops of splats before legal types #100749

Merged: 5 commits, Aug 14, 2024
Changes from 3 commits
17 changes: 14 additions & 3 deletions llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -26975,7 +26975,7 @@ SDValue DAGCombiner::XformToShuffleWithZero(SDNode *N) {
/// If a vector binop is performed on splat values, it may be profitable to
/// extract, scalarize, and insert/splat.
static SDValue scalarizeBinOpOfSplats(SDNode *N, SelectionDAG &DAG,
-const SDLoc &DL) {
+const SDLoc &DL, bool LegalTypes) {
SDValue N0 = N->getOperand(0);
SDValue N1 = N->getOperand(1);
unsigned Opcode = N->getOpcode();
@@ -26993,11 +26993,22 @@ static SDValue scalarizeBinOpOfSplats(SDNode *N, SelectionDAG &DAG,
// TODO: use DAG.isSplatValue instead?
bool IsBothSplatVector = N0.getOpcode() == ISD::SPLAT_VECTOR &&
N1.getOpcode() == ISD::SPLAT_VECTOR;

+// If binop is legal or custom on EltVT, scalarize should be profitable. The
+// check is the same as isOperationLegalOrCustom without isTypeLegal. We
+// can do this only before LegalTypes, because it may generate illegal `op
+// EltVT` from legal `op VT (splat EltVT)`, where EltVT is not legal type but
+// the result type of splat is legal.

[Review comment, Contributor] I think we can remove this comment now

if (!Src0 || !Src1 || Index0 != Index1 ||
Src0.getValueType().getVectorElementType() != EltVT ||
Src1.getValueType().getVectorElementType() != EltVT ||
!(IsBothSplatVector || TLI.isExtractVecEltCheap(VT, Index0)) ||
-!TLI.isOperationLegalOrCustom(Opcode, EltVT))
+// If before type legalization, allow scalar types that will eventually be
+// made legal.
+!TLI.isOperationLegalOrCustom(
+Opcode, LegalTypes
+? EltVT
+: TLI.getTypeToTransformTo(*DAG.getContext(), EltVT)))
return SDValue();

SDValue IndexC = DAG.getVectorIdxConstant(Index0, DL);
@@ -27163,7 +27174,7 @@ SDValue DAGCombiner::SimplifyVBinOp(SDNode *N, const SDLoc &DL) {
}
}

-if (SDValue V = scalarizeBinOpOfSplats(N, DAG, DL))
+if (SDValue V = scalarizeBinOpOfSplats(N, DAG, DL, LegalTypes))
return V;

return SDValue();
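
The legality check changed in this hunk can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `ToyVT`, `ToyOp`, `isTypeLegal`, `typeToTransformTo`, `isOperationLegalOrCustom`, and `canScalarize` below are hypothetical stand-ins written for this example, not LLVM APIs, and the toy target (only i32/i64 legal, narrow integers promoted to i32) is invented. It mirrors the shape of the new condition: after type legalization the element type is queried directly, while before it the query uses the type that the element type would be transformed to.

```cpp
#include <cstdint>

// Toy stand-ins for value types and opcodes; NOT LLVM's MVT/ISD enums.
enum class ToyVT : std::uint8_t { i1, i8, i16, i32, i64 };
enum class ToyOp : std::uint8_t { Add, Xor, Mul };

// Assumption: on this invented target only i32 and i64 are legal scalars.
constexpr bool isTypeLegal(ToyVT VT) {
  return VT == ToyVT::i32 || VT == ToyVT::i64;
}

// Assumption: every illegal (narrow) integer type is promoted to i32.
constexpr ToyVT typeToTransformTo(ToyVT VT) {
  return isTypeLegal(VT) ? VT : ToyVT::i32;
}

// Assumption: Add and Xor work on any legal type, Mul only on i32.
// As in the source comment, the query fails outright on an illegal type.
constexpr bool isOperationLegalOrCustom(ToyOp Op, ToyVT VT) {
  return isTypeLegal(VT) && (Op != ToyOp::Mul || VT == ToyVT::i32);
}

// Shape of the patched condition: before type legalization, ask about the
// type EltVT will become; afterwards, ask about EltVT itself.
constexpr bool canScalarize(ToyOp Op, ToyVT EltVT, bool LegalTypes) {
  return isOperationLegalOrCustom(
      Op, LegalTypes ? EltVT : typeToTransformTo(EltVT));
}

// An i1 xor is accepted before type legalization (i1 will become i32)...
static_assert(canScalarize(ToyOp::Xor, ToyVT::i1, /*LegalTypes=*/false),
              "pre-legalization query uses the promoted type");
// ...but rejected afterwards, since i1 itself is not a legal type here.
static_assert(!canScalarize(ToyOp::Xor, ToyVT::i1, /*LegalTypes=*/true),
              "post-legalization query uses EltVT directly");
```

In this model the only behavioral difference between the two modes is for element types that are not yet legal, which is the case the patch title ("before legal types") targets.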
13 changes: 6 additions & 7 deletions llvm/test/CodeGen/AArch64/dag-combine-concat-vectors.ll
@@ -16,30 +16,29 @@ define fastcc i8 @allocno_reload_assign() {
; CHECK-NEXT: uzp1 p0.h, p0.h, p0.h
; CHECK-NEXT: uzp1 p0.b, p0.b, p0.b
; CHECK-NEXT: mov z0.b, p0/z, #1 // =0x1
-; CHECK-NEXT: ptrue p0.b
; CHECK-NEXT: fmov w8, s0
; CHECK-NEXT: mov z0.b, #0 // =0x0
-; CHECK-NEXT: sbfx x8, x8, #0, #1
; CHECK-NEXT: uunpklo z1.h, z0.b
; CHECK-NEXT: uunpkhi z0.h, z0.b
-; CHECK-NEXT: whilelo p1.b, xzr, x8
-; CHECK-NEXT: not p0.b, p0/z, p1.b
+; CHECK-NEXT: mvn w8, w8
+; CHECK-NEXT: sbfx x8, x8, #0, #1
+; CHECK-NEXT: whilelo p0.b, xzr, x8
; CHECK-NEXT: uunpklo z2.s, z1.h
; CHECK-NEXT: uunpkhi z3.s, z1.h
; CHECK-NEXT: uunpklo z5.s, z0.h
; CHECK-NEXT: uunpkhi z7.s, z0.h
; CHECK-NEXT: punpklo p1.h, p0.b
; CHECK-NEXT: punpkhi p0.h, p0.b
; CHECK-NEXT: punpklo p2.h, p1.b
+; CHECK-NEXT: punpkhi p3.h, p1.b
; CHECK-NEXT: uunpklo z0.d, z2.s
; CHECK-NEXT: uunpkhi z1.d, z2.s
-; CHECK-NEXT: punpkhi p3.h, p1.b
+; CHECK-NEXT: punpklo p5.h, p0.b
; CHECK-NEXT: uunpklo z2.d, z3.s
; CHECK-NEXT: uunpkhi z3.d, z3.s
-; CHECK-NEXT: punpklo p5.h, p0.b
+; CHECK-NEXT: punpkhi p7.h, p0.b
; CHECK-NEXT: uunpklo z4.d, z5.s
; CHECK-NEXT: uunpkhi z5.d, z5.s
-; CHECK-NEXT: punpkhi p7.h, p0.b
; CHECK-NEXT: uunpklo z6.d, z7.s
; CHECK-NEXT: uunpkhi z7.d, z7.s
; CHECK-NEXT: punpklo p0.h, p2.b
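
The test change above, where a predicate-wide `not` gives way to a scalar `mvn` ahead of `whilelo`, is consistent with the identity the combine relies on: a lane-wise binary op on two splats equals a splat of the scalar op. The sketch below checks that identity on a plain `std::array<bool, N>`. The helper names `splat`, `vecXor`, and `scalarizedXor` are hypothetical and written only for this illustration; they model the idea, not the DAG nodes or the SVE instructions.

```cpp
#include <array>
#include <cstddef>

// Broadcast one scalar into every lane (toy model of a splat).
template <std::size_t N>
std::array<bool, N> splat(bool X) {
  std::array<bool, N> V{};
  V.fill(X);
  return V;
}

// Lane-wise xor: the "vector binop" form.
template <std::size_t N>
std::array<bool, N> vecXor(const std::array<bool, N> &A,
                           const std::array<bool, N> &B) {
  std::array<bool, N> R{};
  for (std::size_t I = 0; I < N; ++I)
    R[I] = A[I] != B[I];
  return R;
}

// Scalarized form: extract one lane from each operand, xor the scalars,
// then splat the result. Only valid when A and B are both splats.
template <std::size_t N>
std::array<bool, N> scalarizedXor(const std::array<bool, N> &A,
                                  const std::array<bool, N> &B) {
  return splat<N>(A[0] != B[0]);
}

// Returns true when both forms agree for splat(X) xor splat(true),
// i.e. a vector "not" of a splat.
template <std::size_t N>
bool formsAgree(bool X) {
  const auto A = splat<N>(X);
  const auto B = splat<N>(true);
  return vecXor(A, B) == scalarizedXor(A, B);
}
```

Calling `formsAgree<16>(false)` and `formsAgree<16>(true)` covers both possible inputs for a 16-lane boolean splat; the scalarized form does one scalar operation instead of N lane operations.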