AMDGPU: Replace unused permlane inputs with poison instead of undef #131288
Conversation
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/131288.diff

2 Files Affected:
- llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
- llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp b/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
index 26c48c19ebfd8..7cd97e95b0189 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
@@ -1115,7 +1115,7 @@ GCNTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
case Intrinsic::amdgcn_permlanex16_var: {
// Discard vdst_in if it's not going to be read.
Value *VDstIn = II.getArgOperand(0);
- if (isa<UndefValue>(VDstIn))
+ if (isa<PoisonValue>(VDstIn))
break;
// FetchInvalid operand idx.
@@ -1134,7 +1134,7 @@ GCNTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
if (!FetchInvalid->getZExtValue() && !BoundCtrl->getZExtValue())
break;
- return IC.replaceOperand(II, 0, UndefValue::get(VDstIn->getType()));
+ return IC.replaceOperand(II, 0, PoisonValue::get(VDstIn->getType()));
}
case Intrinsic::amdgcn_permlane64:
case Intrinsic::amdgcn_readfirstlane:
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll b/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
index 3e30b2a5b0911..99e6fc98588ae 100644
--- a/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
@@ -3272,7 +3272,7 @@ define amdgpu_kernel void @permlane16(ptr addrspace(1) %out, i32 %src0, i32 %src
define amdgpu_kernel void @permlane16_bound_ctrl(ptr addrspace(1) %out, i32 %src0, i32 %src1, i32 %src2) {
; CHECK-LABEL: @permlane16_bound_ctrl(
-; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.amdgcn.permlane16.i32(i32 undef, i32 [[SRC0:%.*]], i32 [[SRC1:%.*]], i32 [[SRC2:%.*]], i1 false, i1 true)
+; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.amdgcn.permlane16.i32(i32 poison, i32 [[SRC0:%.*]], i32 [[SRC1:%.*]], i32 [[SRC2:%.*]], i1 false, i1 true)
; CHECK-NEXT: store i32 [[RES]], ptr addrspace(1) [[OUT:%.*]], align 4
; CHECK-NEXT: ret void
;
@@ -3283,7 +3283,7 @@ define amdgpu_kernel void @permlane16_bound_ctrl(ptr addrspace(1) %out, i32 %src
define amdgpu_kernel void @permlane16_fetch_invalid_bound_ctrl(ptr addrspace(1) %out, i32 %src0, i32 %src1, i32 %src2) {
; CHECK-LABEL: @permlane16_fetch_invalid_bound_ctrl(
-; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.amdgcn.permlane16.i32(i32 undef, i32 [[SRC0:%.*]], i32 [[SRC1:%.*]], i32 [[SRC2:%.*]], i1 true, i1 true)
+; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.amdgcn.permlane16.i32(i32 poison, i32 [[SRC0:%.*]], i32 [[SRC1:%.*]], i32 [[SRC2:%.*]], i1 true, i1 true)
; CHECK-NEXT: store i32 [[RES]], ptr addrspace(1) [[OUT:%.*]], align 4
; CHECK-NEXT: ret void
;
@@ -3311,7 +3311,7 @@ define amdgpu_kernel void @permlanex16(ptr addrspace(1) %out, i32 %src0, i32 %sr
define amdgpu_kernel void @permlanex16_bound_ctrl(ptr addrspace(1) %out, i32 %src0, i32 %src1, i32 %src2) {
; CHECK-LABEL: @permlanex16_bound_ctrl(
-; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.amdgcn.permlanex16.i32(i32 undef, i32 [[SRC0:%.*]], i32 [[SRC1:%.*]], i32 [[SRC2:%.*]], i1 false, i1 true)
+; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.amdgcn.permlanex16.i32(i32 poison, i32 [[SRC0:%.*]], i32 [[SRC1:%.*]], i32 [[SRC2:%.*]], i1 false, i1 true)
; CHECK-NEXT: store i32 [[RES]], ptr addrspace(1) [[OUT:%.*]], align 4
; CHECK-NEXT: ret void
;
@@ -3322,7 +3322,7 @@ define amdgpu_kernel void @permlanex16_bound_ctrl(ptr addrspace(1) %out, i32 %sr
define amdgpu_kernel void @permlanex16_fetch_invalid_bound_ctrl(ptr addrspace(1) %out, i32 %src0, i32 %src1, i32 %src2) {
; CHECK-LABEL: @permlanex16_fetch_invalid_bound_ctrl(
-; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.amdgcn.permlanex16.i32(i32 undef, i32 [[SRC0:%.*]], i32 [[SRC1:%.*]], i32 [[SRC2:%.*]], i1 true, i1 true)
+; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.amdgcn.permlanex16.i32(i32 poison, i32 [[SRC0:%.*]], i32 [[SRC1:%.*]], i32 [[SRC2:%.*]], i1 true, i1 true)
; CHECK-NEXT: store i32 [[RES]], ptr addrspace(1) [[OUT:%.*]], align 4
; CHECK-NEXT: ret void
;
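For illustration, this is the fold in isolation as it applies after the patch; a minimal IR sketch, assuming the i32 variant of the intrinsic with bound_ctrl set (the %old, %src0, %src1, %src2 names are placeholders, not taken from the tests above):

; Before: the vdst_in operand %old can never be read, because bound_ctrl is true.
%res = call i32 @llvm.amdgcn.permlane16.i32(i32 %old, i32 %src0, i32 %src1, i32 %src2, i1 false, i1 true)

; After the fold: the unread operand is replaced with poison (previously undef).
%res = call i32 @llvm.amdgcn.permlane16.i32(i32 poison, i32 %src0, i32 %src1, i32 %src2, i1 false, i1 true)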
@@ -1115,7 +1115,7 @@ GCNTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
case Intrinsic::amdgcn_permlanex16_var: {
// Discard vdst_in if it's not going to be read.
Value *VDstIn = II.getArgOperand(0);
- if (isa<UndefValue>(VDstIn))
+ if (isa<PoisonValue>(VDstIn))
I don't know how prevalent this intrinsic is, but I would maybe leave this one as-is to preserve compatibility with older IR. We can always do a search & replace later when we get rid of undef.
The rest LGTM, thanks!
This is mostly to stop the combiner from infinite looping.
Same kind of objection as #131287: as a general strategy, "replace unused inputs with poison" seems incompatible with "propagate poison from arguments to result". @nunoplopes any thoughts on this?
Repeating from the other review, but this is not the case. Poison does not unconditionally fold through intrinsics; this is specific to one operand of the intrinsic. It just happens that, in the common case, normal arithmetic-like intrinsics propagate poison from any of their inputs.
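To make the distinction concrete, a small IR sketch (illustrative only, with placeholder values %x, %y, %z; this patch introduces no new folds of this kind):

; Poison in an operand of an ordinary arithmetic instruction propagates,
; so this add simplifies to poison.
%a = add i32 poison, %x

; A poison vdst_in does not make the permlane result poison: with bound_ctrl
; set, that operand is simply never read, so the call is left alone.
%res = call i32 @llvm.amdgcn.permlane16.i32(i32 poison, i32 %x, i32 %y, i32 %z, i1 false, i1 true)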
Then I think the poison-propagating rules for this intrinsic should be documented. They're not obvious, and "it just does what the underlying instruction does" is no longer sufficient.
We're currently not propagating poison for these intrinsics, and this patch isn't changing any behavior. It only changes an implementation detail of the discard-the-input fold that is already performed.
This patch is relying on an undocumented assumption that poison is not propagated from all arguments of the intrinsic. But I won't insist if you really want to commit it as-is.
The only folding of any kind for target intrinsics is done explicitly here, in this function.