-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[SelectionDAG] Fix isKnownNeverZeroFloat for vectors #78308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-selectiondag @llvm/pr-subscribers-backend-x86 Author: Danila Malyutin (danilaml) ChangesReturn true iff all of vector elements are constant AND not zero Fixes #77805 Previously, it'd return Full diff: https://github.com/llvm/llvm-project/pull/78308.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 01d31806c8442f..8bd39db82e1b0b 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -5239,14 +5239,11 @@ bool SelectionDAG::isKnownNeverZeroFloat(SDValue Op) const {
// Return false if we find any zero in a vector.
if (Op->getOpcode() == ISD::BUILD_VECTOR ||
Op->getOpcode() == ISD::SPLAT_VECTOR) {
- for (const SDValue &OpVal : Op->op_values()) {
- if (OpVal.isUndef())
- return false;
+ return llvm::all_of(Op->op_values(), [](const SDValue &OpVal){
if (auto *C = dyn_cast<ConstantFPSDNode>(OpVal))
- if (C->isZero())
- return false;
- }
- return true;
+ return !C->isZero();
+ return false;
+ });
}
return false;
}
diff --git a/llvm/test/CodeGen/X86/fminimum-fmaximum.ll b/llvm/test/CodeGen/X86/fminimum-fmaximum.ll
index 8905d2bce5e92a..442d1d29590807 100644
--- a/llvm/test/CodeGen/X86/fminimum-fmaximum.ll
+++ b/llvm/test/CodeGen/X86/fminimum-fmaximum.ll
@@ -1322,3 +1322,63 @@ define <4 x float> @test_fmaximum_vector_zero(<4 x float> %x) {
%r = call <4 x float> @llvm.maximum.v4f32(<4 x float> %x, <4 x float> <float 0., float 0., float 0., float 0.>)
ret <4 x float> %r
}
+
+; Check that signed zeroes are handled correctly in this case
+define <4 x float> @test_fmaximum_v4f32_splat(<4 x float> %x, float %y) {
+; SSE2-LABEL: test_fmaximum_v4f32_splat:
+; SSE2: # %bb.0:
+; SSE2-NEXT: shufps {{.*#+}} xmm1 = xmm1[0,0,0,0]
+; SSE2-NEXT: pxor %xmm2, %xmm2
+; SSE2-NEXT: pcmpgtd %xmm0, %xmm2
+; SSE2-NEXT: movdqa %xmm2, %xmm3
+; SSE2-NEXT: pandn %xmm0, %xmm3
+; SSE2-NEXT: movaps %xmm1, %xmm4
+; SSE2-NEXT: andps %xmm2, %xmm4
+; SSE2-NEXT: orps %xmm3, %xmm4
+; SSE2-NEXT: pand %xmm2, %xmm0
+; SSE2-NEXT: andnps %xmm1, %xmm2
+; SSE2-NEXT: por %xmm2, %xmm0
+; SSE2-NEXT: movdqa %xmm0, %xmm1
+; SSE2-NEXT: maxps %xmm4, %xmm1
+; SSE2-NEXT: movdqa %xmm0, %xmm2
+; SSE2-NEXT: cmpunordps %xmm0, %xmm2
+; SSE2-NEXT: andps %xmm2, %xmm0
+; SSE2-NEXT: andnps %xmm1, %xmm2
+; SSE2-NEXT: orps %xmm2, %xmm0
+; SSE2-NEXT: retq
+;
+; AVX1-LABEL: test_fmaximum_v4f32_splat:
+; AVX1: # %bb.0:
+; AVX1-NEXT: vshufps {{.*#+}} xmm1 = xmm1[0,0,0,0]
+; AVX1-NEXT: vblendvps %xmm0, %xmm1, %xmm0, %xmm2
+; AVX1-NEXT: vblendvps %xmm0, %xmm0, %xmm1, %xmm0
+; AVX1-NEXT: vmaxps %xmm2, %xmm0, %xmm1
+; AVX1-NEXT: vcmpunordps %xmm0, %xmm0, %xmm2
+; AVX1-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
+; AVX1-NEXT: retq
+;
+; AVX512-LABEL: test_fmaximum_v4f32_splat:
+; AVX512: # %bb.0:
+; AVX512-NEXT: vbroadcastss %xmm1, %xmm1
+; AVX512-NEXT: vblendvps %xmm0, %xmm1, %xmm0, %xmm2
+; AVX512-NEXT: vblendvps %xmm0, %xmm0, %xmm1, %xmm0
+; AVX512-NEXT: vmaxps %xmm2, %xmm0, %xmm1
+; AVX512-NEXT: vcmpunordps %xmm0, %xmm0, %xmm2
+; AVX512-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
+; AVX512-NEXT: retq
+;
+; X86-LABEL: test_fmaximum_v4f32_splat:
+; X86: # %bb.0:
+; X86-NEXT: vbroadcastss {{[0-9]+}}(%esp), %xmm1
+; X86-NEXT: vblendvps %xmm0, %xmm1, %xmm0, %xmm2
+; X86-NEXT: vblendvps %xmm0, %xmm0, %xmm1, %xmm0
+; X86-NEXT: vmaxps %xmm2, %xmm0, %xmm1
+; X86-NEXT: vcmpunordps %xmm0, %xmm0, %xmm2
+; X86-NEXT: vblendvps %xmm2, %xmm0, %xmm1, %xmm0
+; X86-NEXT: retl
+ %splatinsert = insertelement <4 x float> poison, float %y, i64 0
+ %vec = shufflevector <4 x float> %splatinsert, <4 x float> poison, <4 x i32> zeroinitializer
+ %r = call <4 x float> @llvm.maximum.v4f32(<4 x float> %x, <4 x float> %vec) readnone
+ ret <4 x float> %r
+}
+
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please can you rebase on d6ee91b to show the codegen diff
} | ||
return true; | ||
return !C->isZero(); | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might be able to do this with matchUnaryFpPredicate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Didn't know about it. I hope I used it correctly.
3874940
to
687cc43
Compare
SSE2 codegen is terrible (is this all to avoid a simple branch?), but otherwise looks fine to me. |
Return true iff all of vector elements are constant AND not zero Fixes llvm#77805
687cc43
to
067bce5
Compare
Return true iff all of vector elements are constant AND not zero
Fixes #77805
Previously, it'd return
true
(as in - the value is known to be never zero) for any build_vector/splat_vector with non-constant elements.