-
Notifications
You must be signed in to change notification settings - Fork 13.5k
DAG: Improve fminimum/fmaximum vector expansion logic #93579
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAG: Improve fminimum/fmaximum vector expansion logic #93579
Conversation
@llvm/pr-subscribers-backend-x86 @llvm/pr-subscribers-llvm-selectiondag Author: Matt Arsenault (arsenm) ChangesFirst, expandFMINIMUM_FMAXIMUM should be a never-fail API. The client Prefer using the min/max opcodes, and unrolling if we don't have a vselect. Patch is 1.19 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/93579.diff 12 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h b/llvm/include/llvm/CodeGen/SelectionDAG.h
index 96a6270690468..0dc237301abb4 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -1241,11 +1241,11 @@ class SelectionDAG {
/// Helper function to make it easier to build Select's if you just have
/// operands and don't want to check for vector.
SDValue getSelect(const SDLoc &DL, EVT VT, SDValue Cond, SDValue LHS,
- SDValue RHS) {
+ SDValue RHS, SDNodeFlags Flags = SDNodeFlags()) {
assert(LHS.getValueType() == VT && RHS.getValueType() == VT &&
"Cannot use select on differing types");
auto Opcode = Cond.getValueType().isVector() ? ISD::VSELECT : ISD::SELECT;
- return getNode(Opcode, DL, VT, Cond, LHS, RHS);
+ return getNode(Opcode, DL, VT, Cond, LHS, RHS, Flags);
}
/// Helper function to make it easier to build SelectCC's if you just have an
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
index 6acbc044d6731..503e09dd2077f 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
@@ -990,11 +990,8 @@ void VectorLegalizer::Expand(SDNode *Node, SmallVectorImpl<SDValue> &Results) {
break;
case ISD::FMINIMUM:
case ISD::FMAXIMUM:
- if (SDValue Expanded = TLI.expandFMINIMUM_FMAXIMUM(Node, DAG)) {
- Results.push_back(Expanded);
- return;
- }
- break;
+ Results.push_back(TLI.expandFMINIMUM_FMAXIMUM(Node, DAG));
+ return;
case ISD::SMIN:
case ISD::SMAX:
case ISD::UMIN:
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 4e47f50ee4289..18ecd86c4739d 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -8428,10 +8428,7 @@ SDValue TargetLowering::expandFMINIMUM_FMAXIMUM(SDNode *N,
EVT VT = N->getValueType(0);
EVT CCVT = getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT);
bool IsMax = Opc == ISD::FMAXIMUM;
-
- if (VT.isVector() &&
- isOperationLegalOrCustomOrPromote(Opc, VT.getScalarType()))
- return SDValue();
+ SDNodeFlags Flags = N->getFlags();
// First, implement comparison not propagating NaN. If no native fmin or fmax
// available, use plain select with setcc instead.
@@ -8444,15 +8441,18 @@ SDValue TargetLowering::expandFMINIMUM_FMAXIMUM(SDNode *N,
bool MinMaxMustRespectOrderedZero = false;
if (isOperationLegalOrCustom(CompOpcIeee, VT)) {
- MinMax = DAG.getNode(CompOpcIeee, DL, VT, LHS, RHS);
+ MinMax = DAG.getNode(CompOpcIeee, DL, VT, LHS, RHS, Flags);
MinMaxMustRespectOrderedZero = true;
} else if (isOperationLegalOrCustom(CompOpc, VT)) {
- MinMax = DAG.getNode(CompOpc, DL, VT, LHS, RHS);
+ MinMax = DAG.getNode(CompOpc, DL, VT, LHS, RHS, Flags);
} else {
+ if (VT.isVector() && !isOperationLegalOrCustom(ISD::VSELECT, VT))
+ return DAG.UnrollVectorOp(N);
+
// NaN (if exists) will be propagated later, so orderness doesn't matter.
SDValue Compare =
DAG.getSetCC(DL, CCVT, LHS, RHS, IsMax ? ISD::SETGT : ISD::SETLT);
- MinMax = DAG.getSelect(DL, VT, Compare, LHS, RHS);
+ MinMax = DAG.getSelect(DL, VT, Compare, LHS, RHS, Flags);
}
// Propagate any NaN of both operands
@@ -8461,7 +8461,7 @@ SDValue TargetLowering::expandFMINIMUM_FMAXIMUM(SDNode *N,
ConstantFP *FPNaN = ConstantFP::get(
*DAG.getContext(), APFloat::getNaN(DAG.EVTToAPFloatSemantics(VT)));
MinMax = DAG.getSelect(DL, VT, DAG.getSetCC(DL, CCVT, LHS, RHS, ISD::SETUO),
- DAG.getConstantFP(*FPNaN, DL, VT), MinMax);
+ DAG.getConstantFP(*FPNaN, DL, VT), MinMax, Flags);
}
// fminimum/fmaximum requires -0.0 less than +0.0
@@ -8473,11 +8473,11 @@ SDValue TargetLowering::expandFMINIMUM_FMAXIMUM(SDNode *N,
DAG.getTargetConstant(IsMax ? fcPosZero : fcNegZero, DL, MVT::i32);
SDValue LCmp = DAG.getSelect(
DL, VT, DAG.getNode(ISD::IS_FPCLASS, DL, CCVT, LHS, TestZero), LHS,
- MinMax);
+ MinMax, Flags);
SDValue RCmp = DAG.getSelect(
DL, VT, DAG.getNode(ISD::IS_FPCLASS, DL, CCVT, RHS, TestZero), RHS,
- LCmp);
- MinMax = DAG.getSelect(DL, VT, IsZero, RCmp, MinMax);
+ LCmp, Flags);
+ MinMax = DAG.getSelect(DL, VT, IsZero, RCmp, MinMax, Flags);
}
return MinMax;
diff --git a/llvm/test/CodeGen/AMDGPU/fmaximum3.ll b/llvm/test/CodeGen/AMDGPU/fmaximum3.ll
index 3ec36f03a48aa..9ce1ba3316dd5 100644
--- a/llvm/test/CodeGen/AMDGPU/fmaximum3.ll
+++ b/llvm/test/CodeGen/AMDGPU/fmaximum3.ll
@@ -497,47 +497,19 @@ define <2 x float> @v_fmaximum3_v2f32(<2 x float> %a, <2 x float> %b, <2 x float
; GFX9-LABEL: v_fmaximum3_v2f32:
; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT: v_cmp_gt_f32_e32 vcc, v1, v3
-; GFX9-NEXT: v_cndmask_b32_e32 v6, v3, v1, vcc
+; GFX9-NEXT: v_max_f32_e32 v6, v1, v3
; GFX9-NEXT: v_mov_b32_e32 v7, 0x7fc00000
; GFX9-NEXT: v_cmp_o_f32_e32 vcc, v1, v3
-; GFX9-NEXT: v_cndmask_b32_e32 v6, v7, v6, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 vcc, v1, 64
-; GFX9-NEXT: v_cndmask_b32_e32 v1, v6, v1, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 vcc, v3, 64
-; GFX9-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc
-; GFX9-NEXT: v_cmp_eq_f32_e32 vcc, 0, v6
-; GFX9-NEXT: v_cndmask_b32_e32 v1, v6, v1, vcc
-; GFX9-NEXT: v_cmp_gt_f32_e32 vcc, v0, v2
-; GFX9-NEXT: v_cndmask_b32_e32 v3, v2, v0, vcc
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v7, v6, vcc
+; GFX9-NEXT: v_max_f32_e32 v3, v0, v2
; GFX9-NEXT: v_cmp_o_f32_e32 vcc, v0, v2
-; GFX9-NEXT: v_cndmask_b32_e32 v3, v7, v3, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 vcc, v0, 64
-; GFX9-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 vcc, v2, 64
-; GFX9-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc
-; GFX9-NEXT: v_cmp_eq_f32_e32 vcc, 0, v3
-; GFX9-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc
-; GFX9-NEXT: v_cmp_gt_f32_e32 vcc, v4, v0
-; GFX9-NEXT: v_cndmask_b32_e32 v2, v0, v4, vcc
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v7, v3, vcc
+; GFX9-NEXT: v_max_f32_e32 v2, v4, v0
; GFX9-NEXT: v_cmp_o_f32_e32 vcc, v4, v0
-; GFX9-NEXT: v_cndmask_b32_e32 v2, v7, v2, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 vcc, v4, 64
-; GFX9-NEXT: v_cndmask_b32_e32 v3, v2, v4, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 vcc, v0, 64
-; GFX9-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc
-; GFX9-NEXT: v_cmp_eq_f32_e32 vcc, 0, v2
-; GFX9-NEXT: v_cndmask_b32_e32 v0, v2, v0, vcc
-; GFX9-NEXT: v_cmp_gt_f32_e32 vcc, v5, v1
-; GFX9-NEXT: v_cndmask_b32_e32 v2, v1, v5, vcc
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v7, v2, vcc
+; GFX9-NEXT: v_max_f32_e32 v2, v5, v1
; GFX9-NEXT: v_cmp_o_f32_e32 vcc, v5, v1
-; GFX9-NEXT: v_cndmask_b32_e32 v2, v7, v2, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 vcc, v5, 64
-; GFX9-NEXT: v_cndmask_b32_e32 v3, v2, v5, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 vcc, v1, 64
-; GFX9-NEXT: v_cndmask_b32_e32 v1, v3, v1, vcc
-; GFX9-NEXT: v_cmp_eq_f32_e32 vcc, 0, v2
-; GFX9-NEXT: v_cndmask_b32_e32 v1, v2, v1, vcc
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v7, v2, vcc
; GFX9-NEXT: s_setpc_b64 s[30:31]
%max0 = call <2 x float> @llvm.maximum.v2f32(<2 x float> %a, <2 x float> %b)
%max1 = call <2 x float> @llvm.maximum.v2f32(<2 x float> %c, <2 x float> %max0)
@@ -559,47 +531,19 @@ define <2 x float> @v_fmaximum3_v2f32_commute(<2 x float> %a, <2 x float> %b, <2
; GFX9-LABEL: v_fmaximum3_v2f32_commute:
; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT: v_cmp_gt_f32_e32 vcc, v1, v3
-; GFX9-NEXT: v_cndmask_b32_e32 v6, v3, v1, vcc
+; GFX9-NEXT: v_max_f32_e32 v6, v1, v3
; GFX9-NEXT: v_mov_b32_e32 v7, 0x7fc00000
; GFX9-NEXT: v_cmp_o_f32_e32 vcc, v1, v3
-; GFX9-NEXT: v_cndmask_b32_e32 v6, v7, v6, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 vcc, v1, 64
-; GFX9-NEXT: v_cndmask_b32_e32 v1, v6, v1, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 vcc, v3, 64
-; GFX9-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc
-; GFX9-NEXT: v_cmp_eq_f32_e32 vcc, 0, v6
-; GFX9-NEXT: v_cndmask_b32_e32 v1, v6, v1, vcc
-; GFX9-NEXT: v_cmp_gt_f32_e32 vcc, v0, v2
-; GFX9-NEXT: v_cndmask_b32_e32 v3, v2, v0, vcc
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v7, v6, vcc
+; GFX9-NEXT: v_max_f32_e32 v3, v0, v2
; GFX9-NEXT: v_cmp_o_f32_e32 vcc, v0, v2
-; GFX9-NEXT: v_cndmask_b32_e32 v3, v7, v3, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 vcc, v0, 64
-; GFX9-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 vcc, v2, 64
-; GFX9-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc
-; GFX9-NEXT: v_cmp_eq_f32_e32 vcc, 0, v3
-; GFX9-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc
-; GFX9-NEXT: v_cmp_gt_f32_e32 vcc, v0, v4
-; GFX9-NEXT: v_cndmask_b32_e32 v2, v4, v0, vcc
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v7, v3, vcc
+; GFX9-NEXT: v_max_f32_e32 v2, v0, v4
; GFX9-NEXT: v_cmp_o_f32_e32 vcc, v0, v4
-; GFX9-NEXT: v_cndmask_b32_e32 v2, v7, v2, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 vcc, v0, 64
-; GFX9-NEXT: v_cndmask_b32_e32 v0, v2, v0, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 vcc, v4, 64
-; GFX9-NEXT: v_cndmask_b32_e32 v0, v0, v4, vcc
-; GFX9-NEXT: v_cmp_eq_f32_e32 vcc, 0, v2
-; GFX9-NEXT: v_cndmask_b32_e32 v0, v2, v0, vcc
-; GFX9-NEXT: v_cmp_gt_f32_e32 vcc, v1, v5
-; GFX9-NEXT: v_cndmask_b32_e32 v2, v5, v1, vcc
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v7, v2, vcc
+; GFX9-NEXT: v_max_f32_e32 v2, v1, v5
; GFX9-NEXT: v_cmp_o_f32_e32 vcc, v1, v5
-; GFX9-NEXT: v_cndmask_b32_e32 v2, v7, v2, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 vcc, v1, 64
-; GFX9-NEXT: v_cndmask_b32_e32 v1, v2, v1, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 vcc, v5, 64
-; GFX9-NEXT: v_cndmask_b32_e32 v1, v1, v5, vcc
-; GFX9-NEXT: v_cmp_eq_f32_e32 vcc, 0, v2
-; GFX9-NEXT: v_cndmask_b32_e32 v1, v2, v1, vcc
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v7, v2, vcc
; GFX9-NEXT: s_setpc_b64 s[30:31]
%max0 = call <2 x float> @llvm.maximum.v2f32(<2 x float> %a, <2 x float> %b)
%max1 = call <2 x float> @llvm.maximum.v2f32(<2 x float> %max0, <2 x float> %c)
@@ -621,47 +565,19 @@ define <2 x float> @v_fmaximum3_v2f32__fabs_all(<2 x float> %a, <2 x float> %b,
; GFX9-LABEL: v_fmaximum3_v2f32__fabs_all:
; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT: v_cmp_gt_f32_e64 vcc, |v1|, |v3|
-; GFX9-NEXT: v_cndmask_b32_e32 v6, v3, v1, vcc
+; GFX9-NEXT: v_max_f32_e64 v6, |v1|, |v3|
; GFX9-NEXT: v_mov_b32_e32 v7, 0x7fc00000
-; GFX9-NEXT: v_cmp_o_f32_e64 s[4:5], |v1|, |v3|
-; GFX9-NEXT: v_cndmask_b32_e64 v6, v7, |v6|, s[4:5]
-; GFX9-NEXT: v_cmp_class_f32_e64 s[4:5], |v1|, 64
-; GFX9-NEXT: v_cndmask_b32_e64 v1, v6, |v1|, s[4:5]
-; GFX9-NEXT: v_cmp_class_f32_e64 s[4:5], |v3|, 64
-; GFX9-NEXT: v_cndmask_b32_e64 v1, v1, |v3|, s[4:5]
-; GFX9-NEXT: v_cmp_eq_f32_e32 vcc, 0, v6
-; GFX9-NEXT: v_cndmask_b32_e32 v1, v6, v1, vcc
-; GFX9-NEXT: v_cmp_gt_f32_e64 vcc, |v0|, |v2|
-; GFX9-NEXT: v_cndmask_b32_e32 v3, v2, v0, vcc
-; GFX9-NEXT: v_cmp_o_f32_e64 s[4:5], |v0|, |v2|
-; GFX9-NEXT: v_cndmask_b32_e64 v3, v7, |v3|, s[4:5]
-; GFX9-NEXT: v_cmp_class_f32_e64 s[4:5], |v0|, 64
-; GFX9-NEXT: v_cndmask_b32_e64 v0, v3, |v0|, s[4:5]
-; GFX9-NEXT: v_cmp_class_f32_e64 s[4:5], |v2|, 64
-; GFX9-NEXT: v_cndmask_b32_e64 v0, v0, |v2|, s[4:5]
-; GFX9-NEXT: v_cmp_eq_f32_e32 vcc, 0, v3
-; GFX9-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc
-; GFX9-NEXT: v_cmp_gt_f32_e64 s[4:5], v0, |v4|
-; GFX9-NEXT: v_cndmask_b32_e64 v2, |v4|, v0, s[4:5]
+; GFX9-NEXT: v_cmp_o_f32_e64 vcc, |v1|, |v3|
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v7, v6, vcc
+; GFX9-NEXT: v_max_f32_e64 v3, |v0|, |v2|
+; GFX9-NEXT: v_cmp_o_f32_e64 vcc, |v0|, |v2|
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v7, v3, vcc
+; GFX9-NEXT: v_max_f32_e64 v2, v0, |v4|
; GFX9-NEXT: v_cmp_o_f32_e64 vcc, v0, |v4|
-; GFX9-NEXT: v_cndmask_b32_e32 v2, v7, v2, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 vcc, v0, 64
-; GFX9-NEXT: v_cndmask_b32_e32 v0, v2, v0, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 s[4:5], |v4|, 64
-; GFX9-NEXT: v_cndmask_b32_e64 v0, v0, |v4|, s[4:5]
-; GFX9-NEXT: v_cmp_eq_f32_e32 vcc, 0, v2
-; GFX9-NEXT: v_cmp_gt_f32_e64 s[4:5], v1, |v5|
-; GFX9-NEXT: v_cndmask_b32_e32 v0, v2, v0, vcc
-; GFX9-NEXT: v_cndmask_b32_e64 v2, |v5|, v1, s[4:5]
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v7, v2, vcc
+; GFX9-NEXT: v_max_f32_e64 v2, v1, |v5|
; GFX9-NEXT: v_cmp_o_f32_e64 vcc, v1, |v5|
-; GFX9-NEXT: v_cndmask_b32_e32 v2, v7, v2, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 vcc, v1, 64
-; GFX9-NEXT: v_cndmask_b32_e32 v1, v2, v1, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 s[4:5], |v5|, 64
-; GFX9-NEXT: v_cndmask_b32_e64 v1, v1, |v5|, s[4:5]
-; GFX9-NEXT: v_cmp_eq_f32_e32 vcc, 0, v2
-; GFX9-NEXT: v_cndmask_b32_e32 v1, v2, v1, vcc
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v7, v2, vcc
; GFX9-NEXT: s_setpc_b64 s[30:31]
%a.fabs = call <2 x float> @llvm.fabs.v2f32(<2 x float> %a)
%b.fabs = call <2 x float> @llvm.fabs.v2f32(<2 x float> %b)
@@ -686,47 +602,19 @@ define <2 x float> @v_fmaximum3_v2f32__fneg_all(<2 x float> %a, <2 x float> %b,
; GFX9-LABEL: v_fmaximum3_v2f32__fneg_all:
; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT: v_cmp_gt_f32_e64 vcc, -v1, -v3
-; GFX9-NEXT: v_cndmask_b32_e32 v6, v3, v1, vcc
+; GFX9-NEXT: v_max_f32_e64 v6, -v1, -v3
; GFX9-NEXT: v_mov_b32_e32 v7, 0x7fc00000
-; GFX9-NEXT: v_cmp_o_f32_e64 s[4:5], -v1, -v3
-; GFX9-NEXT: v_cndmask_b32_e64 v6, v7, -v6, s[4:5]
-; GFX9-NEXT: v_cmp_class_f32_e64 s[4:5], -v1, 64
-; GFX9-NEXT: v_cndmask_b32_e64 v1, v6, -v1, s[4:5]
-; GFX9-NEXT: v_cmp_class_f32_e64 s[4:5], -v3, 64
-; GFX9-NEXT: v_cndmask_b32_e64 v1, v1, -v3, s[4:5]
-; GFX9-NEXT: v_cmp_eq_f32_e32 vcc, 0, v6
-; GFX9-NEXT: v_cndmask_b32_e32 v1, v6, v1, vcc
-; GFX9-NEXT: v_cmp_gt_f32_e64 vcc, -v0, -v2
-; GFX9-NEXT: v_cndmask_b32_e32 v3, v2, v0, vcc
-; GFX9-NEXT: v_cmp_o_f32_e64 s[4:5], -v0, -v2
-; GFX9-NEXT: v_cndmask_b32_e64 v3, v7, -v3, s[4:5]
-; GFX9-NEXT: v_cmp_class_f32_e64 s[4:5], -v0, 64
-; GFX9-NEXT: v_cndmask_b32_e64 v0, v3, -v0, s[4:5]
-; GFX9-NEXT: v_cmp_class_f32_e64 s[4:5], -v2, 64
-; GFX9-NEXT: v_cndmask_b32_e64 v0, v0, -v2, s[4:5]
-; GFX9-NEXT: v_cmp_eq_f32_e32 vcc, 0, v3
-; GFX9-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc
-; GFX9-NEXT: v_cmp_gt_f32_e64 s[4:5], v0, -v4
-; GFX9-NEXT: v_cndmask_b32_e64 v2, -v4, v0, s[4:5]
+; GFX9-NEXT: v_cmp_o_f32_e64 vcc, -v1, -v3
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v7, v6, vcc
+; GFX9-NEXT: v_max_f32_e64 v3, -v0, -v2
+; GFX9-NEXT: v_cmp_o_f32_e64 vcc, -v0, -v2
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v7, v3, vcc
+; GFX9-NEXT: v_max_f32_e64 v2, v0, -v4
; GFX9-NEXT: v_cmp_o_f32_e64 vcc, v0, -v4
-; GFX9-NEXT: v_cndmask_b32_e32 v2, v7, v2, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 vcc, v0, 64
-; GFX9-NEXT: v_cndmask_b32_e32 v0, v2, v0, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 s[4:5], -v4, 64
-; GFX9-NEXT: v_cndmask_b32_e64 v0, v0, -v4, s[4:5]
-; GFX9-NEXT: v_cmp_eq_f32_e32 vcc, 0, v2
-; GFX9-NEXT: v_cmp_gt_f32_e64 s[4:5], v1, -v5
-; GFX9-NEXT: v_cndmask_b32_e32 v0, v2, v0, vcc
-; GFX9-NEXT: v_cndmask_b32_e64 v2, -v5, v1, s[4:5]
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v7, v2, vcc
+; GFX9-NEXT: v_max_f32_e64 v2, v1, -v5
; GFX9-NEXT: v_cmp_o_f32_e64 vcc, v1, -v5
-; GFX9-NEXT: v_cndmask_b32_e32 v2, v7, v2, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 vcc, v1, 64
-; GFX9-NEXT: v_cndmask_b32_e32 v1, v2, v1, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 s[4:5], -v5, 64
-; GFX9-NEXT: v_cndmask_b32_e64 v1, v1, -v5, s[4:5]
-; GFX9-NEXT: v_cmp_eq_f32_e32 vcc, 0, v2
-; GFX9-NEXT: v_cndmask_b32_e32 v1, v2, v1, vcc
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v7, v2, vcc
; GFX9-NEXT: s_setpc_b64 s[30:31]
%a.fneg = fneg <2 x float> %a
%b.fneg = fneg <2 x float> %b
@@ -751,35 +639,19 @@ define <2 x float> @v_fmaximum3_v2f32__inlineimm1(<2 x float> %a, <2 x float> %c
; GFX9-LABEL: v_fmaximum3_v2f32__inlineimm1:
; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT: v_cmp_lt_f32_e32 vcc, 2.0, v1
-; GFX9-NEXT: v_cndmask_b32_e32 v4, 2.0, v1, vcc
+; GFX9-NEXT: v_max_f32_e32 v4, 2.0, v1
; GFX9-NEXT: v_mov_b32_e32 v5, 0x7fc00000
; GFX9-NEXT: v_cmp_o_f32_e32 vcc, v1, v1
; GFX9-NEXT: v_cndmask_b32_e32 v1, v5, v4, vcc
-; GFX9-NEXT: v_cmp_lt_f32_e32 vcc, 2.0, v0
-; GFX9-NEXT: v_cndmask_b32_e32 v4, 2.0, v0, vcc
+; GFX9-NEXT: v_max_f32_e32 v4, 2.0, v0
; GFX9-NEXT: v_cmp_o_f32_e32 vcc, v0, v0
; GFX9-NEXT: v_cndmask_b32_e32 v0, v5, v4, vcc
-; GFX9-NEXT: v_cmp_gt_f32_e32 vcc, v0, v2
-; GFX9-NEXT: v_cndmask_b32_e32 v4, v2, v0, vcc
+; GFX9-NEXT: v_max_f32_e32 v4, v0, v2
; GFX9-NEXT: v_cmp_o_f32_e32 vcc, v0, v2
-; GFX9-NEXT: v_cndmask_b32_e32 v4, v5, v4, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 vcc, v0, 64
-; GFX9-NEXT: v_cndmask_b32_e32 v0, v4, v0, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 vcc, v2, 64
-; GFX9-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc
-; GFX9-NEXT: v_cmp_eq_f32_e32 vcc, 0, v4
-; GFX9-NEXT: v_cndmask_b32_e32 v0, v4, v0, vcc
-; GFX9-NEXT: v_cmp_gt_f32_e32 vcc, v1, v3
-; GFX9-NEXT: v_cndmask_b32_e32 v2, v3, v1, vcc
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v5, v4, vcc
+; GFX9-NEXT: v_max_f32_e32 v2, v1, v3
; GFX9-NEXT: v_cmp_o_f32_e32 vcc, v1, v3
-; GFX9-NEXT: v_cndmask_b32_e32 v2, v5, v2, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 vcc, v1, 64
-; GFX9-NEXT: v_cndmask_b32_e32 v1, v2, v1, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 vcc, v3, 64
-; GFX9-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc
-; GFX9-NEXT: v_cmp_eq_f32_e32 vcc, 0, v2
-; GFX9-NEXT: v_cndmask_b32_e32 v1, v2, v1, vcc
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v5, v2, vcc
; GFX9-NEXT: s_setpc_b64 s[30:31]
%max0 = call <2 x float> @llvm.maximum.v2f32(<2 x float> %a, <2 x float> <float 2.0, float 2.0>)
%max1 = call <2 x float> @llvm.maximum.v2f32(<2 x float> %max0, <2 x float> %c)
@@ -801,33 +673,17 @@ define <2 x float> @v_fmaximum3_v2f32__inlineimm2(<2 x float> %a, <2 x float> %b
; GFX9-LABEL: v_fmaximum3_v2f32__inlineimm2:
; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT: v_cmp_gt_f32_e32 vcc, v1, v3
-; GFX9-NEXT: v_cndmask_b32_e32 v4, v3, v1, vcc
+; GFX9-NEXT: v_max_f32_e32 v4, v1, v3
; GFX9-NEXT: v_mov_b32_e32 v5, 0x7fc00000
; GFX9-NEXT: v_cmp_o_f32_e32 vcc, v1, v3
-; GFX9-NEXT: v_cndmask_b32_e32 v4, v5, v4, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 vcc, v1, 64
-; GFX9-NEXT: v_cndmask_b32_e32 v1, v4, v1, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 vcc, v3, 64
-; GFX9-NEXT: v_cndmask_b32_e32 v1, v1, v3, vcc
-; GFX9-NEXT: v_cmp_eq_f32_e32 vcc, 0, v4
-; GFX9-NEXT: v_cndmask_b32_e32 v1, v4, v1, vcc
-; GFX9-NEXT: v_cmp_gt_f32_e32 vcc, v0, v2
-; GFX9-NEXT: v_cndmask_b32_e32 v3, v2, v0, vcc
+; GFX9-NEXT: v_cndmask_b32_e32 v1, v5, v4, vcc
+; GFX9-NEXT: v_max_f32_e32 v3, v0, v2
; GFX9-NEXT: v_cmp_o_f32_e32 vcc, v0, v2
-; GFX9-NEXT: v_cndmask_b32_e32 v3, v5, v3, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 vcc, v0, 64
-; GFX9-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc
-; GFX9-NEXT: v_cmp_class_f32_e64 vcc, v2, 64
-; GFX9-NEXT: v_cndmask_b32_e32 v0, v0, v2, vcc
-; GFX9-NEXT: v_cmp_eq_f32_e32 vcc, 0, v3
-; GFX9-NEXT: v_cndmask_b32_e32 v0, v3, v0, vcc
-; GFX9-NEXT: v_cmp_lt_f32_e32 vcc, 4.0, v0
-; GFX9-NEXT: v_cndmask_b32_e32 v2, 4.0, v0, vcc
+; GFX9-NEXT: v_cndmask_b32_e32 v0, v5, v3, vcc
+; GFX9-NEXT: v_max_f32_e32 v2, 4.0, v0
; GFX9-NEXT: v_cmp_o_f32_e32 vcc, v0, v0
; GFX9-NEXT: v_cndmask_b32_e32 v0, v5, v2, vcc
-; GFX9-NEXT: v_cmp_lt_f32_e32 vcc, 4.0, v1
-; GFX9-NEXT: v_cndmask_b32_e32 v2, 4.0, v1, vcc
+; GFX9-NEXT: v_max_f32_e32 v2, 4.0, v1
; GFX9-NEXT: v_cmp_o_f32_e32 vcc, v1, v1
; GFX9-NEXT: v_cndmask_b32_e32 v1, v5, v2, vcc
; GFX9-NEXT: s_setpc_b64 s[30:31]
@@ -852,67 +708,25 @@ define <3 x float> @v_fmaximum3_v3f32(<3 x float> %a, <3 x float> %b, <3 x float
; GFX9-L...
[truncated]
|
On top of #93550 |
7499e03
to
0ec84b9
Compare
@@ -28,35 +28,17 @@ define half @test_fminimum(half %x, half %y) { | |||
define <8 x half> @test_fminimum_scalarize(<8 x half> %x, <8 x half> %y) "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _scalarize
should be removed now.
@e-kud What's these tests testing for? If in a NNan, Nsz situation, we can generate VMIN/MAXPH directly. Or you actually wanted "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false"
? But they are the default value, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you've said we test that only VMIN/VMAX is enough. Otherwise we need to deal with +/-0
using fpclass
and NaN
s with vcmpunord
. It's called scalarize because already scalarized version of FMINIMUM comes to lowering. Probably this is the reason why it has no much support. Let me see what's going wrong and why we stop vectorizing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean should be removed? Based on the test name, the attributes should not be here. Also, should migrate to using the instruction flags. In any case this patch shouldn't do anything with this function, other than update it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean the _scalarize
in the function name. The codegen was scalarized for some reason, and it's not ture with this patch. So keeping this name is confusing. It's not important since @e-kud kindly help investigating it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the situation is the following: FMINIMUM
and FMAXIMUM
are set to Expand
through setF16Action
lambda. And I didn't set v8f16
separately to Custom
because I saw SELECT
inside setF16Action
. And since FMINIMUM
lowering requires support of SELECT
I decided that we need to support SELECT
for v8f16
first. Now I see, that despite of SELECT
inside setF16Action
, we overwrite it later with Custom
for v8f16
.
When I've set Custom
for FMINIMUM
I've seen the desired result
test_fminimum_scalarize: # @test_fminimum_scalarize
.cfi_startproc
# %bb.0:
vminph %xmm1, %xmm0, %xmm0
retq
Let me support it with fixing test naming to illustrate the attributes
Changes to cases are from SDFlags, not related to vector unroll? |
Only look at the top commit |
First, expandFMINIMUM_FMAXIMUM should be a never-fail API. The client wanted it expanded, and it can always be expanded. This logic was tied up with what the VectorLegalizer wanted. Prefer using the min/max opcodes, and unrolling if we don't have a vselect. This seems to produce better code in all the changed tests.
0ec84b9
to
f1296cd
Compare
ping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
return; | ||
} | ||
break; | ||
Results.push_back(TLI.expandFMINIMUM_FMAXIMUM(Node, DAG)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add an assert in case expandFMINIMUM_FMAXIMUM may change to return SDValue() someday?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will hit an assert anyway as it is later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay
First, expandFMINIMUM_FMAXIMUM should be a never-fail API. The client
wanted it expanded, and it can always be expanded. This logic was tied
up with what the VectorLegalizer wanted.
Prefer using the min/max opcodes, and unrolling if we don't have a vselect.
This seems to produce better code in all the changed tests.